8 Replies Latest reply on Feb 3, 2019 12:27 PM by Tom Szelagowski

    Multiple primary keys

    Tom Szelagowski

      Hi Everyone

       

      My first post here.

       

      I wonder how tableau can handle multiple primary keys. I am getting regular updates to a native database of service users. The primary key is the person ID - and it is unique except that in some cases more than one entry is created per user. We have yet to find out how to avoid that but till then, I would hope to find out how to minimise the resulting issues.

       

      Given that the database is at a high level of 'entropy', on a number of occasions we end up counting the cases that belong to a category, only later to find out we have under- or over-counted.

       

      Thanks

      Tom

        • 1. Re: Multiple primary keys
          Joe Oppelt

          If you put PersonID on your sheet and then drag a measure to display, Tableau does SUM([that measure]) by default.  So if you have multiple rows for that PersonID, you'll get a sum of the values for that measure for that person.


          Sometimes you want this.  Sometimes not.  If the values in the rows for that multi-row PersonID are all the same and you only want one "count" of it, you can change the aggregation to AVG([that measure]) or MIN or MAX.

           

          And if you are counting PersonIDs. you can use the COUNTD function rather tan COUNT.  The "D" stands for DISTINCT, and will give a count of 1 for multiple occurrences of the item you are counting.

          • 2. Re: Multiple primary keys
            Tom Szelagowski

            Thank you for your answer.

             

            indeed, when the data is numerical, that's the way to go. Forgot  to add we are using dates a lot to keep track of in- and outflow of patients - and this is where problems start.


            For example, if a date of completion of a health review does show in one entry but does not in the other, when compiling waiting lists, we end up over-counting. I am not sure if this gives you an idea of the range of possible problems, but combined with commonplace redefining events, with deep data cleansing and with tight deadlines, it boils down to quite a variety of errors.

             

            An obvious solution to the above example of over-counting is to list cases rather than to aggregate them, and to cross-check the outputs. At the end of the day, you also need to keep special lists to be joined, and to remember the names just in case.

             

            Far from the chance to make full use of tableau and how it automates data handling, let alone producing consistent packs for the national data warehouse.

             

            Naturally, we are working hard at many levels to get this rectified, however, until we have succeeded, which is probably still a matter of months, I would greatly appreciate some tips on how to tame such a data set.

            • 3. Re: Multiple primary keys
              Joe Oppelt

              I think it's time to provide an anonymized workbook with anonymized data so that I can stop guessing and answer specifically.

               

              Follow the steps in the video at this link:

               

              Video demonstrates how to anonymize your workbook/data

               

              I did an example in the video, and it took about 10 minutes.

               

              I don't need ALL your data.  Just enough to demonstrate the problem you are facing.  Maybe just two patients with records representative of the problem.  And if you anonymize names and other personal info, you should be able to share a sample workbook here.  In fact, you can toss out most of the columns that don't really pertain to the issue you're having.  Just fake IDs, rows with dates (and nulls), and any other columns that are needed to demo the issue.

              • 4. Re: Multiple primary keys
                Tom Szelagowski

                Fair enough, please see attached file. I hope I have not overanonymised it

                 

                If you look at the data  source

                 

                there are three patients but one of them has two records.

                 

                As you can see in the tableau, it properly counts

                the total number: 3

                the number of cases started: 2

                the number of completed: 1

                 

                but it overcounts the started but not completed.

                 

                Clearly, it disregards the fact that patient 1 has another record that confirms the completion.

                 

                This is the root cause of the problems we are getting.

                 

                Thank you for your help. 

                • 5. Re: Multiple primary keys
                  Joe Oppelt

                  (V 10.5 here)

                   

                  When you have multiple rows per ID, you can yank out a single value from among all the rows using LOD calcs.

                   

                  In the attached on Sheet 5 we see the duplicate PID for "1".

                   

                  On Sheet 5(2), I have made two FIXED LODs.  The syntax says to get a single value for each PID for the specified date.  In this case, for PID = 1, even though one of the rows does not have an actual eligibility date in the data source, this calc sets the eligibility date for all rows for that PID.  Using this calc field instead of the actual data source field will let you do better counting.

                   

                  LODs were introduced in version 9.0, and it's something that intimidates a lot of users.  But it's a powerful feature that is worth the effort to master.

                  • 6. Re: Multiple primary keys
                    Tom Szelagowski

                    That looks really amazing, thank you.

                     

                    Off topic, I think I have today tested a brute-force equivalent method on a different data set.

                     

                    window_max ((

                    if ATTR(right ([Target], 1))='%' then attr([Target]) else (str(sum(

                    if (right ([Target], 1))='%' then 0 else (int([Target])) END

                    )))END), -2, 0)

                    In essence after having made target figures coming as  text get aggregated at LOD quarter-of-the-year up, I found out that quarters  consisting of  one month's value are put in a separate row, which produced two values rather  than one. The wrapping of the above code applies max on aggregation, an action which tableau normally rejects. This would not be replaced by a fixed LOD command, would it?

                     

                    Thanks a lot for your priceless help. I will test the method before approving.

                    • 7. Re: Multiple primary keys
                      Joe Oppelt

                      ...The wrapping of the above code applies max on aggregation, an action which tableau normally rejects. This would not be replaced by a fixed LOD command, would it?

                       

                       

                      The code you gave uses WINDOW_max.  WINDOW_xxx functions (which are table calcs) are what you use to further aggregate aggregates.

                       

                      In some ways, (in many ways), LODs replace table calcs, but they can't always do so.  And once you have table calcs in the mix, you preclude the use of LODs.

                       

                      But I think you can do what you've done above with LODs.  I'd have to dig into it, and I'm short on time today.

                      • 8. Re: Multiple primary keys
                        Tom Szelagowski

                        OK, I will test fixed LOD on that as well.

                         

                        On the multiple rows per ID it worked perfectly. Many thanks for your help.