1 2 Previous Next 16 Replies Latest reply on Aug 25, 2015 2:17 PM by sumesh.kumar Branched to a new discussion.

    Subscription Customers Retention Analysis Help

    jonathan.pang

      Hi,

       

      I am fairly new to Tableau and have been trying to create a cohort (by month/year) analysis for subscription-based customers. I am having trouble displaying the correct data however...

      The columns that I have imported into Tableau and am using for this are:

      • Cohort Date: Date which the customer began purchasing services
      • Cohort End Date: Date which customer terminated services. Is null if still an active customer
      • Account ID: customer ID number (was planning on using this countd() as the marks. Countd() because each row is a product, customers have multiple rows in the dataset)

       

      I followed the article on Cohort Analysis (Tips for Cohort Analysis | Tableau Software) and created several calculated fields:

       

      Number of Months Active

      datediff('month',[Cohort_Date],[Cohort_End_Date])
      
      

       

      Cohort Month

      DATETRUNC('month',[Cohort_Date])
      
      

       

       

      I have put "Num of Months Active" in Columns and "Cohort Month" in Rows. When I show countd(Account ID) in Marks it shows the number of accounts that cancelled that month under the respective column header and zeroes if there were none. I am trying to Show number of active accounts across minus cancelled accounts. I'd like to have another table that shows the percent change from month to month (current month number of active accounts minus prior month number of active accounts)

       

      Something that looks like this:

      CohortMonth 1Month 2Month 3Month 4
      Jan 2013100908479
      Feb 2013200195175
      Mar 2013250200
      Apr 2013325

       

      CohortMonth 1Month 2Month 3Month 4
      Jan 2013100%90%84%79%
      Feb 2013100%.......
      Mar 2013100%........
      Apr 2013100%.........

      (with corresponding % in ...)

       

      Any advice? After I get this formatted correctly, I will be able to further filter the data based on other columns such as product types, etc. correct?

       

      Thank you in advance!

        • 1. Re: Subscription Customers Retention Analysis Help
          Bruce Segal

          Jonathan:

           

          I'm going to take a stab in the dark here and approach this from a diff't direction b/c I don't fully understand the problem. If this approach won't work for you not a problem just say so.

           

          It sounds like you want a way to identify if an account is active in a given month. I've used calcs like the ones you list to create 3 difft calc'd dimensions:

           

          1. One gives the current status of an account as active or ended. From that I can display active and past accounts and can color code them or change the size.
          2. The second determines the status of the account in ANY month. I use this to show a client as Inactive in months before the current date.
          3. The last shows the number of months the client stayed as a customer.

           

          I use them in various combinations depending on the analysis I need to do. See the attached .png files for examples of how I use them. The first graph shows over time we're retaining more clients than we loose, and that we have a fair number of new clients in the pipeline so we need to make sure we work to retain them.

           

          The second graphs shows that most of the clients that left us did so in the first 3 to 4 months. So we either need to do a better job on customer service to meet their needs and expectations, or revisit the marketing lead gen and sales qualification process to make sure were adding qualified leads and prospects with a high potential to have a high customer lifetime. The number at the end of the row is the total months as a customer, and the color reflects the same.

           

          I could use these calcs in tables or graphs that should be able to replicate what you describe you want to do. Here are the formulae. I think you can convert the dimension names them to reflect your field names.

           

          1. Client Current Status

           

          IF ( IsNull([Client Start Date]) AND IsNull( [Client Termination Date]) ) THEN "9 Check Data"

          ElseIf  IsNull( [Client Termination Date])  THEN "1 Active"

          ElseIf Not IsNull ( [Client Termination Date])  THEN "2 Former"

          Else "9 Check Data"

          End

           

          2. Client Status by Month

           

          IF ( IsNull([Client Launch Date]) AND IsNull( [Client Termination Date]) ) THEN "Check Data"

          ElseIf  ([Date]-[Client Launch Date]) < 0 THEN "Pre Start - Check Data"

          ElseIf Not  IsNull([Client Launch Date]) AND IsNull ( [Client Termination Date]) THEN "Active"

          // The above business logic worked for our data set up. If we had a client launch date that wasn't null and the client termination date was null then that meant we had an active client. I don't know if this logic applies to your data structure.

           

          ElseIf Not IsNull ([Client Termination Date]) AND  ([Date]-[Client Termination Date]) < 0 THEN "Active"

          ElseIf Not IsNull ([Client Termination Date]) AND  ([Date]-[Client Termination Date]) >= 0 THEN "Former"

          Else "Check Data"

          End

           

          3. Number of Months Active. I call it client Maturity. And here for simplicity sake keep it to  1 Month Increments up to the 6th Month and then aggregate everything else as 7 months. You can extend the range as you like.

           

          IF ( IsNull([Pub Launch Date]) AND IsNull( [Pub Termination Date]) ) THEN "Check Data"

          ElseIf ([Date]-[Pub Termination Date]) >= 0 THEN "Former"

          ElseIf ([Date]-[Pub Launch Date]) < 0 THEN "Pre Launch"

          ElseIf ([Date]-[Pub Launch Date]) < 31 THEN "1"

          ElseIf ([Date]-[Pub Launch Date]) < 61 THEN "2"

          ElseIf ([Date]-[Pub Launch Date]) < 91 THEN "3"

          ElseIf ([Date]-[Pub Launch Date]) < 121 THEN "4"

          ElseIf ([Date]-[Pub Launch Date]) < 151 THEN "5"

          ElseIf ([Date]-[Pub Launch Date]) < 181 THEN "6"

          Else "7"

          End

           

          Attrition and retention over time.png

           

          Open 6a Pub Status Current and By Month.png

          • 2. Re: Subscription Customers Retention Analysis Help
            Jonathan Drummey

            You're going to need to pad out your data in some way so it can be counted, there are several different techniques covered in this thread:

             

            http://community.tableau.com/thread/121505

            • 3. Re: Subscription Customers Retention Analysis Help
              jonathan.pang

              Bruce Segal Thank you for the help! This looks very useful, I have begun to develop the charts you screenshot. A couple of questions about your calculated fields...

              • Are [Client Termination Date] & [Pub Termination Date] the same field?
              • Can you explain the [Date] field? Did you use Today()?

               

              What I am really trying to display is something that looks like this:

              Capture.PNG.png

              With the rows as [Cohort Month] (such as Dec 2008, Jan 2009, etc.) and [Cohort Month #] as the columns. The numbers displayed in the body would be number of active accounts during that month.  0 for [Cohort Month #] would be the first month of that customer cohort. Under [Cohort Month #] of 0, Cohort Dec 2008 would show # of active accounts that joined dec 2008 during dec 2008. Under [Cohort Month #] of 0, Cohort Jan 2009 would show # of active accounts during Jan 2009 that joined Jan 2009 . Under [Cohort Month #] of 1, Cohort Jan 2009 would show # of active accounts during Feb 2009 that joined Jan 2009 . Under [Cohort Month #] of 6, Cohort Jan 2009 would show # of active accounts during July 2009 that joined Jan 2009 .

               

              I was able to properly format to display # of accounts that cancelled during their respective [Cohort Month #] but am unable to figure out how to display # of accounts active (# of active accounts for prior month minus # of cancelled accounts that month]. I would also like to show retention % as # of active accounts divided by starting # of total accounts.

               

              Any advice on how to achieve this? Sorry if my explanation doesn't make sense, let me know if you need any clarification.

               

              Thanks again for all your help!

              • 4. Re: Subscription Customers Retention Analysis Help
                jonathan.pang

                Jonathan Drummey Thank you for sharing that post. Unfortunately I have zero experience with SQL and am completely lost reading those posts. Could you get me started in the right direction on how to approach this? Is there a way to do so without SQL? I am importing data from a large txt file, almost 2GB.

                • 5. Re: Subscription Customers Retention Analysis Help
                  Jonathan Drummey

                  The problem here is that you want to count each data point multiple times, so the data needs to be padded, either in the SQL or in Tableau via table calculations. The SQL might not be that complex, if you post a packaged workbook with some sample data I can take a lok at this over the weekend.

                   

                  One question about that data file, how many rows are in it?

                   

                  Jonathan

                  • 6. Re: Re: Subscription Customers Retention Analysis Help
                    jonathan.pang

                    Jonathan Drummey Thank you for the help Jonathan, I appreciate it!

                     

                    I attached a small sample of the data (330 rows). I eliminated some of the irrelevant columns and changed some of the sensitive data. Each row is at a product level. A single account will have multiple rows for current and past products. Because its a small sample, every cohort month isn't represented. The column Cohort_Date is the starting date of the account and Cohort_End_Date is the termination day of the account (blank if still active).

                     

                    The second tab in the workbook is the desired end result. I would like to be able to filter by columns: Area, Prdct_Group, etc. once I have the formatting complete.

                     

                    The trouble I had when trying to create this was having an account be counted under each month (month 1, month 2, etc.) it was active.

                     

                    The master data file I am using is 1,272,589 rows.

                     

                    Please let me know if there is any other info you need. Thanks again!

                    • 7. Re: Re: Subscription Customers Retention Analysis Help
                      Jonathan Drummey

                      Hi Jonathan,

                       

                      Are you still needing help with this?

                      • 8. Re: Re: Subscription Customers Retention Analysis Help
                        jonathan.pang

                        Hi Jonathan,

                         

                        Yes any help would be greatly appreciated. Thanks again and happy holidays!

                        • 9. Re: Re: Re: Subscription Customers Retention Analysis Help
                          Jonathan Drummey

                          There are multiple ways to go about this, I used a variation on the duplicated data source from Queues and Utilization in Tableau: Part 1 – Queues « Data Driven: Data Analytics, Dashboard Design with the stepped line approach from http://community.tableau.com/message/135833#135833.

                           

                          Here's what I did, it uses some Custom SQL to enable us to do the rest in Tableau, an alternative would be to fully pad the data in SQL but that would create a lot more rows.

                           

                          1. The raw data has many rows per account with the same cohort start date and end date for each account, and I'm guessing that Account = Customer. Rather than bring in all that raw data, I adjusted the Custom SQL code to pre-aggregate the data by Account #. This took 330 rows down to 28, then it was duplicated to create 56 rows. If we didn't do the pre-aggregation, there would be more work to do this in Tableau to only count Accounts. You can see the Custom SQL by right-clicking on the data source and using Edit Connection..., here's the code:

                           

                           

                          SELECT ['Data Sample$'].[Account #] AS [Account #],

                            MIN(['Data Sample$'].[Cohort_Date]) AS [Cohort_Date],

                            MIN(['Data Sample$'].[Cohort_End_Date]) AS [Cohort_End_Date],

                            1 AS [Source]

                          FROM ['Data Sample$']

                          GROUP BY ['Data Sample$'].[Account #]

                          UNION ALL

                          SELECT ['Data Sample$'].[Account #] AS [Account #],

                            MIN(['Data Sample$'].[Cohort_Date]) AS [Cohort_Date],

                            MIN(['Data Sample$'].[Cohort_End_Date]) AS [Cohort_End_Date],

                            2 AS [Source]

                          FROM ['Data Sample$']

                          GROUP BY ['Data Sample$'].[Account #]

                           

                          2. You can see that I added a "Source" dimension to the data to indicate which copy it came from, this will be used in a couple of calculated fields in Tableau. If you wanted to include other fields from the data for filtering or aggregating, then you could add those to the Custom SQL as well.


                          3. The raw data also had Nulls for the cohort end date, I presume that means that the account is still active. In Tableau, I created a Month for Padding field using the following formula:


                          CASE [Source]

                              WHEN 1 THEN [Cohort_Date]

                              WHEN 2 THEN IFNULL([Cohort_End_Date],TODAY())

                          END

                           

                          4. Then I could bring the Cohort_Date and the Month for Padding into aview with the Month/Year aggregation. Then I right-clicked on the Month for Padding and turned on show Missing Values, so the view looks like this (you can see it in the workout worksheet in the attached):

                           

                          2013-12-24 13_53_19-Tableau - triangular chart of subscriptions.png

                           

                          The data has now been padded out. The next steps are just getting to the right output, but they get a little complicated because we're working with dates that that doesn't exist outside of Tableau.

                           

                          If you put SUM(Number of Records) out, you'll just see two records for each account, one for the cohort start date and one for the adjusted cohort end date. That's because Tableau is only returning that where there is data in the underlying data source.

                           

                          5. To properly count each account, I created a Counter field with the following formula:

                           

                          CASE [Source]

                              WHEN 1 THEN 1

                              WHEN 2 THEN IIF(ISNULL([Cohort_End_Date]),0,-1)

                          END

                           

                          This counts up when there's a new account, and down for the cohort end date, and ignoring it if the cohort end date is Null.

                           

                          6. Then I used the Tableau Running Total Quick Table Calculation with a Compute Using of the Month for Padding, so it partitions (restarts) on each Cohort Date.. This gets us accurate results, but when the last month of any records for a cohort is before today, shows a zero instead of an empty value. So I created this calc as the Running Sum of Counter Feeder:

                           

                          RUNNING_SUM(SUM([Counter]))

                           

                          And then created the Running Sum of Counter:

                           

                          IIF([Running Sum of Counter Feeder]=0,Null,[Running Sum of Counter Feeder])

                           

                          This is what will get used in the display.

                           

                          7. Now for the Month #. For that I used the formula "M" + STR(INDEX()-1), again with a Compute Using of the Month for Padding.

                           

                          8. And to get an accurate Starting Total for the first month, I used a PREVIOUS_VALUE(SUM(IF [Source] == 1 THEN [Number of Records] END)), again with a Compute Using of the Month for Padding dimension. The inner IF returns only records for the given cohort date that are start records, those get summed up, and the PREVIOUS_VALUE() calc is an optimization to return that to every row (Month for Padding) in the partition (the cohort month). This deals with the situation where a customer may have started and stopped in the same month, I altered the data so one of the customers for June 2010 did that.

                           

                          This can all be seen in the workout view.

                           

                          9. Once the workout view is ready, then I duplicated that worksheet to create the triangle view. This was dragging pills around and moving the unneeded ones off the view:

                          2013-12-24 14_17_00-Tableau - triangular chart of subscriptions.png

                           

                          10. Tables with lots of numbers are harder to interpret, and Tableau is a visual program, so I created a couple of alternative views. Here's one where I moved the Running Sum of Counter to the Rows Shelf, and put a copy of the Color Shelf, so we can use the pre-attentive attributes of position & color to identify outliers:

                           

                          2013-12-24 14_20_41-Tableau - triangular chart of subscriptions.png

                           

                          11. If you'd rather have stepped lines than diagonal ones, that's also possible, using a modified Month # calc that returns a continuous number instead of a discrete string: Like all the other calcs, it's using the same Compute Using on the Month for Padding:

                           

                          2013-12-24 14_21_41-Tableau - triangular chart of subscriptions.png

                           

                          I hope this is helpful, and happy holidays!

                           

                          Jonathan

                          • 10. Re: Re: Re: Subscription Customers Retention Analysis Help
                            jonathan.pang

                            This is great! Thank you so much for your help. I am going to try and replicate this today. I'll let you know if I have any questions. Thanks again!

                            • 11. Re: Re: Re: Subscription Customers Retention Analysis Help
                              Bruce Segal

                              Jonathan Drummey thanks so much for posting this. It's great! I can use it to improve other cohort analysis I do.

                              • 12. Re: Subscription Customers Retention Analysis Help
                                Muazma Zahid

                                I know it is an old post, but I am trying to do something very similar. The only problem is in my data month 1,2,3 start and end with a custom date. For example march 2014 cohort is defined as everyone who signs up between Feb 15th to March 14th and month 1 is march 15th to April 14th, month 2 will be April 15th-March 14th and so on.

                                I am able to calculate the cohort x-axis based on a sign-up date but y-axis seems impossible because the datepart is only my calendar month.

                                 

                                End result should be something like this:

                                  

                                jan 14 feb 14mar 14apr 14mar 14
                                sales between 15th jan 2014- 14th feb 2014Month 165065065020001500
                                sales between 15th feb 2014- 14th mar 2014Month 27507507504,500
                                sales between 15th mar 2014 - 14th apr 2014Month 31,5501,5501,550
                                sales between 15th apr 2014- 14th may 2014Month 4900900
                                sales between 15th may 2014- 14th jun 2014 Month 51,312

                                 

                                So far I have the month 1,2,3 based on calendar month and not the custom month.

                                Thank you in advance!

                                • 13. Re: Subscription Customers Retention Analysis Help
                                  Jonathan Drummey

                                  I'm sorry, I'm not able to help you without at least some sample data, ideally a packaged workbook with that sample data and your work so far, and a mockup of your desired results.

                                   

                                  Jonathan

                                  • 14. Re: Subscription Customers Retention Analysis Help
                                    sumesh.kumar

                                    Jonathan, Appreciate this detailed triangle view explanation. I am having a small issue. The 2nd view where #Months are showing as columns I do not see all my months as I have missing values. I do not have this issue in the first view since I select Show Missing Values. But this option does not exist when I drop Padding on Marks.

                                     

                                    Any help is appreciated. Thanks.

                                    1 2 Previous Next