1 2 Previous Next 20 Replies Latest reply on May 17, 2016 11:31 AM by Simon Runc Branched to a new discussion.

# Cohort Calculation help

Hi all,

I need some help creating a calculated field. I have customer ID, first ship month, ship month, and cohort as my fields.

Cohort is nothing more than a 1-12 value based on the difference between first ship month - ship month

example:

 Customer ID First Ship Month Ship Month Cohort 23141 9/1/2015 9/1/2015 0 23141 9/1/2015 11/1/2015 2 23141 9/1/2015 12/1/2015 3 23141 9/1/2015 1/1/2016 4 23141 9/1/2015 2/1/2016 5 23141 9/1/2015 3/1/2016 6 23141 9/1/2015 4/1/2016 7

The chart I am trying to create will show 0-12 across the x axis. Y axis will show numerator/ denominator (%)

numerator: add up # of distinct customers in each cohort

denominator: count # of distinct customers where first ship month - today >= 'cohort #'

I created a calculated field called denominator which doesn't give me what i want. However, I have individual calculated fields (0,1,2, etc) which gives me the value I want.

How do i create one calculated field to give me the denominators across my chart?

• ###### 1. Re: Cohort Calculation help

Anybody? I'm stuck... Pooja Gandhi?

• ###### 2. Re: Cohort Calculation help

hi Dipesh,

So I've had a look, and despite the apparent simplicty of the problem, the solution (well I hope the solution) is a little bit complicated, and employs a couple of fairly advanced techniques...Having said that, this was a fun little puzzle!

So I'm able to do all this using RUNNING_SUM, which takes the place of your >=1, >=2...etc...

So first up I create a calc to determine the number of months from today

[Months from 1st Ship to Today]

DATEDIFF('month',[First Ship Month],TODAY())

as an aside...the fact you've 'hard-coded' the [First Ship Month] is great Tableau practice...in PoC mode I'll use the LoD {FIXED [Customer]: MIN([Ship Date])}...but due to the performance cost of running FIXED LoD's (although that has massively improved in 9.3), I will then  hard code this into Database view, as you've done

So this gives me a dimension to use, and I get something like this

with the measure just [customer count]...so in order to know how many were >= to each one I can use RUNNING_SUM, on the [Customer Count]...like so

So closer, but the Running_Sum is going in the wrong direction!...so we can fix that in the advanced table calc set up, but using the sort, on the [Months from 1st Ship to Today]

So now the Running_Sum is performed in that direction, and we get the result you want!

...as an extra bit of security, in the attached I actually created a Bin for the [Months from 1st Ship to Today], with a Bin Size of 1. The reason is that as a Bin, I can now use the 'Show Missing Values' (so 'pad' the data out, in case there is no customers with a particular  [Months from 1st Ship to Today] (say if no-one shopped 5 months ago) this would still create a 5 in the column, and the Running_Sum will continue over that value...might be overkill!!

Hope that solves the problem, and makes sense

• ###### 3. Re: Cohort Calculation help

Um, looking at the complexity of this, I am so glad you answered this one, Simon!

As always, you always get things working!

Dipesh - hope this answers your question, sorry for not getting to you earlier!

Pooja.

• ###### 4. Re: Cohort Calculation help

thanks Pooja...To be honest it was a bit of a happy accident!. The screen shots actually show the 'accident', I tried Running_Sum (just because of the >=) and then just noticed (2nd screen) that the numbers I was aiming for we close, but in the wrong order!!...also the 'manual' >=1, >=2 calculations Dipesh included...so I had some numbers to aim for, were a big help!

• ###### 5. Re: Cohort Calculation help

I saw that you viewed the thread, and of course I knew you will get back!

I figured it required a bit of thinking, and today being Friday - my brain doesn't want to work!

• ###### 6. Re: Cohort Calculation help

Simon / Pooja -

Thank you both for your contributions.

The method you described works to derive the denominator values which I manually calculated in the example, however, any ideas on how to get percentages? Values on the numerator tab / values we just derived.

I'm looking to present a chart showing % by cohort #.

 cohort numerator denom % 0 76 76 100% 1 44 74 59% 2 41 66 62% 3 34 60 57% 4 29 51 57% 5 18 46 39% 6 16 36 44% 7 15 30 50% 8 12 28 43% 9 10 25 40% 10 10 22 45% 11 10 21 48%
• ###### 7. Re: Cohort Calculation help

Have you seen this awesome LOD example: Workbook: Cohort Analysis

• ###### 8. Re: Cohort Calculation help

hi Dipesh,

Ah!...so this does confuse things a bit. Traditionally a customer is in a single Cohort, but your Cohort calculation is actually just the Months between each purchase and the first purchase, so customers appear in multiple cohorts...

If I stack your bar-chart by [customer ID], and select a [Customer ID] I can see they are counted multiple times, in multiple cohorts

So this gives the issue, where for the denominator they have a single classification, but have multiple ones for the numerator, and everything is then grouped on the numerator (for the %age calculation)...The below shows this for a couple of IDs

I'm out with clients for the next couple of days, but I'll have a think about how best to calculate the 2 together

• ###### 9. Re: Cohort Calculation help

Thanks Simon. Looking forward to see if you can help solve this one!

• ###### 10. Re: Cohort Calculation help

hi Dipesh,

So I've had a bit of a play with this one, and it is a very tricky little problem...

The problem really lies in the fact that we are trying to create (and show) values based on 2 different dimensions. This made more tricky by the fact that a customer belongs to a single dimension (for all rows), our [Months from 1st Ship to Today] but for [Cohort] a customer has multiple different values...as the below shows for 2 customers and there is not necessarily a relationship between the 2 (if you look at 23142 they are counted in Cohort group 0, 1, 2 and then Group 9 for [Months from 1st Ship to Today]

btw rather than use today(), I've introduced a parameter...this is just because since starting this question we have moved months!...so want the results to be consistent with what you've posted

This is a classic Excel question, where in Excel you could calculate the 2, and then VLOOKUP one against the other (even though they are different dimensions) to get your calc. As such I have ended up with a solution using a 'hacky' data-blend! I have tried a few ways to get it to work from a single source, but can't find a way (and this might well be my limitations!) to get 23142 to get counted in [Months from 1st Ship to Today] <=9, and Cohort groups 0,1,&2 at the same time (with a single dimension column)...I'll keep the gray cells at it (but they generally let me down!!).

So the hacky way, is to create a duplicate of the data-source, and then Blend the [Months from 1st Ship to Today] with the [Cohort]...which acts (in this case) a bit like a VLOOKUP would (well actually more like a COUNTIFS).

So I duplicate the datasource, go to 'Edit Relationships' (in the Data menu) remove all the 'default' blending fields and set it up to blend [Months from 1st Ship to Today] with the [Cohort]

When I then bring in the COUNTD of customerID, it does what we want, and we can also create the %age figure

[%age]

COUNTD([Sheet1 (persistency sample) Duplicate].[Customer ID])/[Denominator Count - RunningSum]

As I said apologies this is such a horrible solution, but will let you know if anything better emerges!

2 of 2 people found this helpful
• ###### 11. Re: Cohort Calculation help

Hi Simon,

Apologies for the delay in response, I was out last week.

Thank you for following up. This is a great 'hack' and seems to work!

However...(one small issue). In the real data, I have a set of global filters which throw off the percentages when changed. By first glance, my observation is that the filters are applying to the Denominator only and not the numerator (Customer count) since it's pulled from the duplicate data source. Is there a way for me to tie the filters to both data sources?

Thanks again for all the help!

• ###### 12. Re: Cohort Calculation help

hi Dipesh,

So as of Tableau 10 (currently in Beta) we will be able to use a 'quick filter' across data sources (it's just a new option in the filter set up 'apply to all datasources'). Until this is released the main way to achieve this is through a parameter

There are issues with this method. 1) You can't select multiple values 2) Parameters won't update automatically (so if you populate a parameter with, say, customerID if new customers appear in the data, after a refresh, you'll need to go back in and re-populate the parameter.

So if this will work for you, will depend on what/how you are filtering.

• ###### 13. Re: Cohort Calculation help

Hi Simon,

Thanks again for the response.

I can live with the parameter limitations, however, there still seems to be an issue.

If I use the dimension as a parameter 'filter' from the primary data source, it only updates the denominator since we pull that field from the primary source. If I use the parameter from the secondary data source, I get the following error: "Cannot blend the secondary data source because one or more fields use an unsupported aggregation". I think it may be because we are using the CountD function to calculate the customer count from the secondary data source.

• ###### 14. Re: Cohort Calculation help

hi Dipesh,

So yes using the parameter filter here is a little more complicated...usually you are blending on the same field, say, state (as per the KB example), so when you filter one datasource this restricts the list of states, and so it only blends (from the secondary) on the states in that list. However we are doing something a little 'hacky' with the blend, so have to do it a slightly different way.

In the attached I've set up a 'date from' filter (so only include data after a certain [Ship Month]), where the 'date from' is a parameter. btw this isn't a great example as we don't get 100% for the first row if we filter down (this is because the 2 groups, the 0,1,2,3.. in both data sources aren't really likes to each date in a consistent way)...but this should give you the method.

So first I set up my parameter

I then add the filter into our COUNTD formula [Customer count]

COUNTD(IIF([Ship Month]>=[Parameter Filter: Date From],[Customer ID],NULL))

and do the same in the 'duplicate' data-source.

I then use this field in all counts and %age calculations.

Hope this makes sense, but please let me know if not.

for some reason it's not letting me attach the workbook! so I've published to public

1 2 Previous Next