1 2 Previous Next 18 Replies Latest reply on Oct 15, 2015 11:28 AM by Joe Oppelt

    Count Distinct is over counting

    Michael Lewis

      Hi,

       

      The Count Distinct table calculation is returning strange results. For example, I have a list of customers who purchase a product over a given time range. I am trying to count the distinct number of customers who purchased this product during that time frame; however, depending on whether I filter by year, quarter, month, etc., the calculation returns different results.

       

      Calculation:

      RUNNING_SUM(COUNTD([EndUserName]))

      Result when putting Year(BillingDate) field in Columns shelf = 1,403

      Result when putting Quarter(BillingDate) field in Columns shelf = 1,684

      Result with no pill on the Columns shelf = 928


      What is going on here?

        • 1. Re: Count Distinct is over counting
          Wesley Magee

          Michael,

          We'll need a little more information to figure this one out. Can you post a workbook? You are correct that you shouldn't be getting different results based on the logic. Are you using continuous or discrete time periods? Are you filtering anything out? I assume the numbers you are show are based on Total row calculation instead of them being related to an individual time period.

           

          -Wesley

          • 2. Re: Count Distinct is over counting
            Shawn Wallwork

            Michael, a few notes:

            • COUNTD() is not a table calculation, it's just an aggregation
            • RUNNING_SUM() is the table calculation
            • In Tableau Dimensions (in your case BillingDate) are what slices the measures
            • Tableau is probably doing exactly what you are asking it to do

             

            In all those values you listed after 'Calculation:' what were your expected result (when you changed pills)? Of course it is always best to post a packaged workbook with a bit of sample data for us to play with. Cheers.

            • 3. Re: Count Distinct is over counting
              Ville Tyrväinen

              So you have 928 customers, but if you split time to years or quarters it will count customers for each period and then sum those values.

              • 4. Re: Count Distinct is over counting
                Rody Zakovich

                Hello Michael,

                 

                Just wanted to provide two valuable threads, that I believe are related to your situation.

                 

                The first one has to do with a COUNTD along a running period (So similar to how you use WINDOW_SUM(SUM([Field]), -6, 0) where you are specifying the start and end partitions.

                 

                How to count distinct users on a running period

                 

                There is still no easy way to do this.

                 

                And here is a thread related to doing a RUNNING COUNTD, which is slightly different

                 

                Running Distinct Sum or Total using COUNTD

                 

                Going back to the previous thread, there is a way of doing a "Running COUTD" based on either the dataset as a whole OR the viz as a whole, utilizing LoDs.

                 

                For the whole dataset

                 

                Re: How to count distinct users on a running period

                 

                Which leverages FIXED to address the first interaction of a customer.

                 

                Or the alternative that uses the vizLoD

                 

                RUNNING_SUM(SUM(IF [DATE] = {INCLUDE [Customer] : MIN([DATE]) } THEN 1 END))

                 

                Hope this helps.

                 

                Regards,

                Rody

                • 5. Re: Count Distinct is over counting
                  Michael Lewis

                  Hi Ville,

                   

                  Yes, that is correct and I figured that's what it was doing. But, I don't understand why. Why does it count more customers when time periods are involved? The customer count shouldn't be different. For example, if I have 10 ***** and then put them in two different boxes, I should still be able to count 10 ***** total and not, for example 15 *****.

                  • 6. Re: Count Distinct is over counting
                    Rody Zakovich

                    Michael

                     

                    This has to do with the table calc you are using. Table Calculations are computed based on the results of your regular aggregates in context of partition (the vizLoD).

                     

                    So, as Ville mentioned, Tableau is first calculating the countd in each partiton (defined by your date field) then performing a running sum on those values.

                     

                    What you are propsing would be a calc like Running_countd which curently doesn't exist. But we can work around that using LoD, as I described above.

                     

                    Regards

                    Rody

                    • 7. Re: Count Distinct is over counting
                      Michael Lewis

                      Thanks, Rody - I am not sure what you mean by LoD. Are you saying that I need to create a table calculation similar to the following and it will work: PS - My company is still using Tableau 8, so I can't do anything that requires Tableau 9. I'd love for IT to upgrade to version 9, but right now they won't.

                       

                      RUNNING_SUM(SUM(IF [DATE] = {INCLUDE [Customer] : MIN([DATE]) } THEN 1 END))

                      • 8. Re: Count Distinct is over counting
                        Joe Oppelt

                        This is essentially what I wrote earlier.  If a user shows up in 3 querters and you are counting by quarter, he'll count 3 times, not 1.

                        • 9. Re: Count Distinct is over counting
                          Michael Lewis

                          Thanks, Joe. Any ideas on how I can fix this? I am using Tableau 8.

                          • 10. Re: Count Distinct is over counting
                            Joe Oppelt

                            For the record, I SORT OF got around this -- just today -- in a project I'm doing.  I have to get a countD of accounts, and my data has a fiscal month and fiscal year field in it.  I can determine which fiscal months/years I need to address in the count, and in a copy of the data source I do the following:

                             

                            Unique Account Calc:

                             

                            if (convoluted logic to grab the right fiscal months/year) then [Account ID] end

                             

                            What this does is sets the calc to the value of the account ID if the row is in the right time span.

                             

                            Then I drag this calc from the copy source (secondary source) onto the viz.  It yells at me that I don't have a blend field.  I ignore that.  I change the pill to measure-> countd.

                             

                            And I get the right value.  It just aggs up EVERYTHING since there is no blend relationship, and sinc the calc only gets set if the row is in the time period (driver by parameters, BTW), the countD is correct.  And I like it!  If the user changes from FYTD to last 12 months, it recalcs the secondary data, for example.

                             

                            This may or may not work for you.

                            • 11. Re: Count Distinct is over counting
                              Joe Oppelt

                              And yes.  LOD is a 9.0 thing.  But LOD is just another way to do table calcs (essentially).

                               

                              Maybe if you package up a workbook that demonstrates your issue, I can take a look and see if we can't get this to work for you.

                              • 12. Re: Count Distinct is over counting
                                Rody Zakovich

                                Apologies, LoD means Level of Detail, to be specific I was talking about an LoD calculation, which was introduced in V9.

                                 

                                Here is a method that might work for you on V8, that has a wonderful explanation of what is going I'm under the hood.

                                 

                                Re: Running Distinct Sum or Total using COUNTD

                                 

                                Regards,

                                Rody

                                • 13. Re: Count Distinct is over counting
                                  Michael Lewis

                                  Thank you, Rody. This has been very helpful; however, I am still running into issues. I can't upload my workbook because of the sensitive data. However, I tried to put all relevant info below. I used the link you posted and copied all the formula's, etc. Unlike the workbook that was part of this Re: Running Distinct Sum or Total using COUNTD

                                  my counting doesn't appear to take into account the window partitioning. For example, British Telecom shows up as 3 in the 2007 Q1 pane instead of 2. Also, the values under New WM WC Count are incorrect.

                                   

                                  Calculations I created:

                                  WC Count is below - Also see screen shot for how I sorted:

                                  WINDOW_COUNT(COUNT([EndUserName]))

                                  WC Count - Sorting Image.PNG

                                   

                                  Running Distinct Count is below - Also see screen shot for how I sorted:

                                  IF (ATTR([EndUserName]) == LOOKUP(ATTR([EndUserName]), -1)) THEN

                                    PREVIOUS_VALUE(0)

                                  ELSE

                                    PREVIOUS_VALUE(0) + 1

                                  END

                                  Running Distinct Count - Sorting Image.PNG

                                  New WM WC Count is below - Also see screen shot for how I sorted:

                                  IF FIRST()==0 THEN

                                    WINDOW_MAX([Running Distinct Count])

                                  END

                                  New WM WC Count.PNG

                                  Output:

                                  Output.PNG

                                  • 14. Re: Count Distinct is over counting
                                    Joe Oppelt

                                    Michael Lewis wrote:

                                     

                                    Thank you, Rody. This has been very helpful; however, I am still running into issues. I can't upload my workbook because of the sensitive data.

                                     

                                    Create an anonymized example as shown in the video linked here:

                                     

                                    Anonymize your Tableau Package Data for Sharing

                                    1 2 Previous Next