1 2 Previous Next 18 Replies Latest reply on Oct 15, 2015 11:28 AM by Joe Oppelt

# Count Distinct is over counting

Hi,

The Count Distinct table calculation is returning strange results. For example, I have a list of customers who purchase a product over a given time range. I am trying to count the distinct number of customers who purchased this product during that time frame; however, depending on whether I filter by year, quarter, month, etc., the calculation returns different results.

Calculation:

Result when putting Year(BillingDate) field in Columns shelf = 1,403

Result when putting Quarter(BillingDate) field in Columns shelf = 1,684

Result with no pill on the Columns shelf = 928

What is going on here?

• ###### 1. Re: Count Distinct is over counting

Michael,

We'll need a little more information to figure this one out. Can you post a workbook? You are correct that you shouldn't be getting different results based on the logic. Are you using continuous or discrete time periods? Are you filtering anything out? I assume the numbers you are show are based on Total row calculation instead of them being related to an individual time period.

-Wesley

• ###### 2. Re: Count Distinct is over counting

Michael, a few notes:

• COUNTD() is not a table calculation, it's just an aggregation
• RUNNING_SUM() is the table calculation
• In Tableau Dimensions (in your case BillingDate) are what slices the measures
• Tableau is probably doing exactly what you are asking it to do

In all those values you listed after 'Calculation:' what were your expected result (when you changed pills)? Of course it is always best to post a packaged workbook with a bit of sample data for us to play with. Cheers.

• ###### 3. Re: Count Distinct is over counting

So you have 928 customers, but if you split time to years or quarters it will count customers for each period and then sum those values.

• ###### 4. Re: Count Distinct is over counting

Hello Michael,

Just wanted to provide two valuable threads, that I believe are related to your situation.

The first one has to do with a COUNTD along a running period (So similar to how you use WINDOW_SUM(SUM([Field]), -6, 0) where you are specifying the start and end partitions.

How to count distinct users on a running period

There is still no easy way to do this.

And here is a thread related to doing a RUNNING COUNTD, which is slightly different

Running Distinct Sum or Total using COUNTD

Going back to the previous thread, there is a way of doing a "Running COUTD" based on either the dataset as a whole OR the viz as a whole, utilizing LoDs.

For the whole dataset

Re: How to count distinct users on a running period

Which leverages FIXED to address the first interaction of a customer.

Or the alternative that uses the vizLoD

RUNNING_SUM(SUM(IF [DATE] = {INCLUDE [Customer] : MIN([DATE]) } THEN 1 END))

Hope this helps.

Regards,

Rody

• ###### 5. Re: Count Distinct is over counting

Hi Ville,

Yes, that is correct and I figured that's what it was doing. But, I don't understand why. Why does it count more customers when time periods are involved? The customer count shouldn't be different. For example, if I have 10 ***** and then put them in two different boxes, I should still be able to count 10 ***** total and not, for example 15 *****.

• ###### 6. Re: Count Distinct is over counting

Michael

This has to do with the table calc you are using. Table Calculations are computed based on the results of your regular aggregates in context of partition (the vizLoD).

So, as Ville mentioned, Tableau is first calculating the countd in each partiton (defined by your date field) then performing a running sum on those values.

What you are propsing would be a calc like Running_countd which curently doesn't exist. But we can work around that using LoD, as I described above.

Regards

Rody

• ###### 7. Re: Count Distinct is over counting

Thanks, Rody - I am not sure what you mean by LoD. Are you saying that I need to create a table calculation similar to the following and it will work: PS - My company is still using Tableau 8, so I can't do anything that requires Tableau 9. I'd love for IT to upgrade to version 9, but right now they won't.

RUNNING_SUM(SUM(IF [DATE] = {INCLUDE [Customer] : MIN([DATE]) } THEN 1 END))

• ###### 8. Re: Count Distinct is over counting

This is essentially what I wrote earlier.  If a user shows up in 3 querters and you are counting by quarter, he'll count 3 times, not 1.

• ###### 9. Re: Count Distinct is over counting

Thanks, Joe. Any ideas on how I can fix this? I am using Tableau 8.

• ###### 10. Re: Count Distinct is over counting

For the record, I SORT OF got around this -- just today -- in a project I'm doing.  I have to get a countD of accounts, and my data has a fiscal month and fiscal year field in it.  I can determine which fiscal months/years I need to address in the count, and in a copy of the data source I do the following:

Unique Account Calc:

if (convoluted logic to grab the right fiscal months/year) then [Account ID] end

What this does is sets the calc to the value of the account ID if the row is in the right time span.

Then I drag this calc from the copy source (secondary source) onto the viz.  It yells at me that I don't have a blend field.  I ignore that.  I change the pill to measure-> countd.

And I get the right value.  It just aggs up EVERYTHING since there is no blend relationship, and sinc the calc only gets set if the row is in the time period (driver by parameters, BTW), the countD is correct.  And I like it!  If the user changes from FYTD to last 12 months, it recalcs the secondary data, for example.

This may or may not work for you.

• ###### 11. Re: Count Distinct is over counting

And yes.  LOD is a 9.0 thing.  But LOD is just another way to do table calcs (essentially).

Maybe if you package up a workbook that demonstrates your issue, I can take a look and see if we can't get this to work for you.

• ###### 12. Re: Count Distinct is over counting

Apologies, LoD means Level of Detail, to be specific I was talking about an LoD calculation, which was introduced in V9.

Here is a method that might work for you on V8, that has a wonderful explanation of what is going I'm under the hood.

Re: Running Distinct Sum or Total using COUNTD

Regards,

Rody

• ###### 13. Re: Count Distinct is over counting

Thank you, Rody. This has been very helpful; however, I am still running into issues. I can't upload my workbook because of the sensitive data. However, I tried to put all relevant info below. I used the link you posted and copied all the formula's, etc. Unlike the workbook that was part of this Re: Running Distinct Sum or Total using COUNTD

my counting doesn't appear to take into account the window partitioning. For example, British Telecom shows up as 3 in the 2007 Q1 pane instead of 2. Also, the values under New WM WC Count are incorrect.

Calculations I created:

WC Count is below - Also see screen shot for how I sorted:

Running Distinct Count is below - Also see screen shot for how I sorted:

PREVIOUS_VALUE(0)

ELSE

PREVIOUS_VALUE(0) + 1

END

New WM WC Count is below - Also see screen shot for how I sorted:

IF FIRST()==0 THEN

WINDOW_MAX([Running Distinct Count])

END

Output:

• ###### 14. Re: Count Distinct is over counting

Michael Lewis wrote:

Thank you, Rody. This has been very helpful; however, I am still running into issues. I can't upload my workbook because of the sensitive data.

Create an anonymized example as shown in the video linked here:

Anonymize your Tableau Package Data for Sharing

1 2 Previous Next