1 Reply Latest reply on May 1, 2012 11:05 AM by Jonathan Drummey

    Attrition and Overlapping Date Ranges

      Hi Everyone,

       

      I've been having some trouble with getting at an "attrition" or "churn" type of metric with Tableau connected to a large (~3 million records a day) table on Vertica. My table contains one record per user per day with various statistics about the user's activity that day.

       

      Ideally, I'd like to some kind of subquery that lets me to choose a date and output what I'd describe as, "Out of all the users active 7-13 days before that day, how many had any activity 0-6 days before that day." Unfortunately, Vertica doesn't seem to allow any kind of custom sql or rawsql calculated field that has MAX(date) in it so I don't know if this is possible.

       

      As a consolation, I've been trying to create a few duplicate data connections to output a bar graph showing the number of unique users active in different time periods (0-6 days before the selected date, 7-13 days before the selected date, etc). The problem I'm having with that I can't get this data to display in the same window or on the same axis.

       

      If anyone can help or even point me in the right direction, it would be greatly appreciated. Let me know if any of this isn't clear. I'm trying to figure out how to generate some fake data and package it as an example to make this a little clearer.

       

      Thanks

        • 1. Re: Attrition and Overlapping Date Ranges
          Jonathan Drummey

          Hi Bobby,

           

          I've attached a solution that makes use of nested table calculations and the Superstore Sales data set to count customers who have had sales in the current quarter and all those same customers who had sales in the next quarter. From this, you can hopefully create something that will work with your data, I've got some notes at the bottom.

           

          The view uses one parameter to select the date.

           

          - Selected Quarter - this just lets users pick the quarter in the Bar Chart view.

           

          The view has the following calculations:

           

          - Quarter of Order Date: DATETRUNC('quarter',[Order Date]).

           

          I used quarter to get me enough data to be interesting. In general, when working with table calcs and dates it's helpful to use your own calculated date so you don't have to keep on choosing the right level when setting the table calcs. This field and the Customer field (user in your case) must be in the level of detail for the view for it to work.

           

          - # of Active in Current Quarter:

          IF FIRST()==0 THEN

              WINDOW_COUNT(COUNTD([Customer]), 0, IIF(FIRST()==0,LAST(),0))

          END

           

          This would return lots of duplicate rows, so the IF and IIF statements use a technique developed by Richard Leeke to return only one row per quarter. When this calc is put on the view, set the Compute Using to Customer.

           

          - Active in Next Quarter

          IF LOOKUP(ATTR([Quarter of Order Date]),1) = DATEADD('quarter',1,ATTR([Quarter of Order Date])) THEN

              1

          ELSE

              0

          END

           

          In the Superstore sales data, if a customer has a row in the data then it must be a sale, so what I did here was to use Tableau's LOOKUP() function to find out if there are any rows for the next quarter, and if so return 1.

           

          Once in the view, the table calculation needs to have the following configuration: Compute using Advanced... with Customer and Quarter of Order Date, order along Quarter of Order Date/Min/Ascending, at the level Deepest, restarting every Customer. What all that does is tell the calculation to calculate along every combination of Customer and Quarter of Order Date (with the correct sort).

           

          - # of Active in Next Quarter

          IF FIRST()==0 THEN

              WINDOW_SUM([Active in Prior Quarter],0, IIF(FIRST()==0,LAST(),0))

          END

           

          Here we're summing up the values of Active in Next Quarter. Once this calculation is in the view, go to the Edit Table Calculation... context menu and set the partitioning to Customer. This lets the nested partitioning work for Active in Prior Quarter. If you just use the shorthand Compute Using, you will break the partitioning for Active in Prior Quarter.

           

          - Selected # of Active for Next Quarter

          IF ATTR([Quarter of Order Date]) = [Selected Quarter] THEN

              [# of Active in Next Quarter]

          END

           

          This uses the Selected Quarter Parameter to return the # of Active in Next Quarter. When bringing this into a view, if # of Active in Next Quarter is already in the view then you won't need to set the compute using for this calc, otherwise you'd have to go to the Edit Table Calculation... context menu to set the compute using for both # of Active in Prior Quarter and Active in Prior Quarter.

           

          - Selected # of Active for Quarter

          IF ATTR([Quarter of Order Date]) = [Selected Quarter] THEN

              [# of Active in Quarter]

          END

           

          This returns the # of Active for the selected quarter, to use in the bar chart.

           

          I created two charts, a bar chart that shows the data for the selected date and a line chart that shows the results across dates.

           

          Here's what I can come up with for ways you might/will need to change this.

           

          - I set this up so that the Selected Quarter pulls results from the chosen date and chosen date + 1. For your view, you could change the last two calculations to return the values for days further back.

          - Make a date calculation that you can use for partitioning the view, it will make life a lot easier.

          - You probably will want to use a filters filters to reduce the data that Tableau is bringing into the view, for example creating a filter that only returns records between the chosen date and chosen date + 1.

           

          I'm not familiar with Vertica and haven't worked with big datasets like yours, so I don't know what other performance improvements might be needed.

           

          Let me know if this works for you,

           

          Jonathan