5 Replies Latest reply on May 15, 2011 6:48 PM by Joe Mako

    Counting unique values

    Laura Macfarlane

      Hi.

       

      I'm working with a relational database where there often multiple rows for a single entry in the database. I can filter the data using a unique ID number (Count Distinct IDNumber) but have run into difficulty with the following:

       

      I've created a measure calculating the difference between two dates (DATEDIFF('day',[Date1],[Date2]). I want to determine the average difference between those dates for several different categories. What I've done calculates the average for the difference in the two dates for all rows for each category. Some entries are counted more than once as they appear as multple rows - thus my average is not correct.

       

      I can't work out how to determine the average of my calculation (average of DATEDIFF('day',[Date1],[Date2])) for unique ID values only.

       

      I hope this makes sense.

       

      Thanks

        • 1. Re: Counting unique values
          Joe Mako

          If you have data like this:

           

           

          Category,ID Number,Date 1,Date 2

          Category A,1,1/1/2010,1/12/2010

          Category A,1,1/1/2010,1/12/2010

          Category A,2,2/1/2010,2/2/2010

          Category A,3,3/1/2010,3/2/2010

          Category B,4,4/1/2010,4/15/2010

          Category B,4,4/1/2010,4/15/2010

          Category B,4,4/1/2010,4/15/2010

          Category B,5,5/1/2010,5/2/2010



           

          Where rows are essescentially duplicated. There could be other other columns that make each row (combination of all field) unique, but these are the fields you want to look at and an example of the situation.

           

          I know of two options:

           

          Option 1 - Custom SQL

           

          If you connect to the data again, but use custom SQL for the connection type, you could change the Tableau made SQL that looks something like this:

           

           

          SELECT [data#tab].[Category] AS [Category],

            [data#tab].[Date 1] AS [Date 1],

            [data#tab].[Date 2] AS [Date 2],

            [data#tab].[ID Number] AS [ID Number],

            [data#tab].[Other Field 1] AS [Other Field 1],

            [data#tab].[Other Field 2] AS [Other Field 2],

            [data#tab].[Other Field 3] AS [Other Field 3],

            [data#tab].[Other Field 4] AS [Other Field 4]

          FROM [data#tab]



           

          to

           

           

          SELECT DISTINCT [data#tab].[Category] AS [Category],

            [data#tab].[Date 1] AS [Date 1],

            [data#tab].[Date 2] AS [Date 2],

            [data#tab].[ID Number] AS [ID Number]

          FROM [data#tab]



           

          Notice That I removed the other fields not pertinent to your question, and added the word "DISTINCT". This sets Tableau's viewpoint of the data to only unique rows for that combination of fields.

           

          Option 2 - Reference Line

           

          You can also display the correct value you are looking with a reference line.

           

          By placing ID Number on the level of Detail shelf and setting the calculated field as a dimension, instead of an average, you will get a mark for each ID Number. Adding a reference line that averages the values of these marks will give you the value you are looking for.

           

          Take a look at at the workbook at http://public.tableausoftware.com/views/distinctIDNumbers/Dashboard1 that shows these two options and the incorrect view. You can download the workbook from Tableau Public and take a look at the details.

          • 3. Re: Counting unique values
            Joe Mako

            I updated the workbook at http://public.tableausoftware.com/views/distinctIDNumbers/Dashboard1 to also demonstrate the use of a custom table calculation to achieve the same result.

            • 4. Re: Counting unique values
              Laura Macfarlane

              Thanks very much - the custom table calculation is really helpful.

               

              I have another question along a similar vein I am hoping you can help me with. My data looks something like this:

               

              ID    Sign

              12    Wound

              12    Abscess

              12    Pruritis

              13    Pneumonia

              13    Pruritis

              13    Pustules

              14    Pneumonia

               

              I've created a calculated field where I've grouped signs of the skin:

               

              ID    Sign            Skin Sign

              12    Wound                Yes

              12    Abscess                Yes

              12    Pruritis                Yes

              13    Pneumonia        No

              13    Pruritis                Yes

              13    Pustules                Yes

              14    Pneumonia        No

               

              Leaving me with something like this:

               

              ID    Skin Sign

              12    Yes

              13    No

              13    Yes

              14    No

               

              I want to graph IDs which contain at least one skin sign compared to those that do not (so excluding all the rows for a given ID if the ID has any rows with Skin Sign 'Yes' in them. At the moment I'm ending up with a table that looks like this because ID 13 is being counted twice:

               

              Sign sign    Number of entries

              Yes                      2

              No                      2

               

              I would like something like this, where ID 13 is not counted in the 'No' column because it has already been included in the 'Yes' column.

               

              Sign sign    Number of entries

              Yes                            2

              No                            1

               

              I hope you can help! Thanks in advance.

              • 5. Re: Counting unique values
                Joe Mako

                Thank you for sharing this situation, I learned quite a bit from attempting to solve your question.

                 

                How about something like that attached?

                 

                With the "ID" field on the Level of Detail, and combined with the aggregation pill "MAX(Skin Sign)" on the Rows, will tell the data source to return only one row per ID. If an ID has both a "Yes" and a "No" value, the MAX() aggregation will return "Yes" over "No". you can see this on the first tab attached.

                 

                The table calculation is performed on these results, and I use the "WINDOW_SUM()" function because we want an aggregation of an aggregation. To count 1 for each ID in each grouping of the "Skin Sign" values returned.

                 

                There are a few different possible methods to work in this situation, in this table calc, you could replace "COUNTD([ID])" with "AVG(1)" and get the same results.

                 

                 

                IF FIRST()==0 THEN
                
                 WINDOW_SUM(COUNTD([ID]),0,IIF(FIRST()==0,LAST(),0))
                END
                


                 

                The "IF FIRST()==0" is telling Tableau to only return a value for the first mark, so you get only one mark/bar drawn for each "Skin Sign" value. The ",0,IIF(FIRST()==0,LAST(),0)" is likely not needed, but I put it in to ensure efficiency in calculation of the WINDOW_ function, at this point I add it in out of habit.

                 

                 

                IF FIRST()==0 THEN
                
                 WINDOW_SUM(COUNTD([ID]))
                END
                


                 

                Would work just a well in this situation.

                 

                Additionally, I set the Compute using for the table calculation's pill to be "ID", and change the setting on the "MAX(Skin Sign)" pill to Not be ignored in table calcs. This is telling the table calc to use "MAX(Skin Sign)" for partitioning, and "ID" for addressing. I also made some formatting changes, setting the formatting so that special values (NULL) are hidden, and then turning off stacked marks so the label will default to end of the bar instead of the center. I felt this made the display of the bars nicer, but different setter would be applicable for a different mark type, or display purposes.

                 

                There are many other options to be aware of depending on how you want to use this calculation. Depending on what you want for a final result, if you wanted to add additional dimensions, some things may need to be changed.

                 

                Side note: I had attempted to use "TOTAL(COUNTD([ID]))" at first, and I learned that you cannot partitioning a TOTAL table calc on an aggregate. I found that interesting, and explainable since a TOTAL is a separate query to the data source, whereas a WINDOW_ function is performed on the results of the main query, so it has access to the aggregations performed.