2 Replies Latest reply on Nov 1, 2018 3:39 AM by Ciara Brennan

    Deciphering Tableau Level of Detail churn calculations

    David Taylor

      Tableau does an awesome job of computing churn! I learned some techniques recently by copying and pasting some other folks’ wonderful work, specifically that of Yuriy Fal, at Cohort Analysis : Reactivation, Retention, Churn, but I’m personally never truly content just copying TWBX’s until I understand the details fully. And I expect I'm not alone, so I’m authoring this to explain it all. I’m going to focus on year-over-year churn, just to keep things simple. Even with this focus, though, you should know that this is a pretty advanced technique. Be proud once you understand it!

      I’ll frame this essay as a step-by-step “how-to” guide. So first, fire up Desktop. The version doesn’t matter, as long as it supports LoD calculations which came out a very long time ago.

        

      Let’s start by understanding churn. In all honesty, this is possibly the most important aspect of this exercise, so I’ll break it down in detail: start a new Superstore workbook, put Customer Name in the Rows, Order Date in the Columns, and Number of Records on the Text shelf.

      This crosstab shows which customers purchased anything in each of the data’s four years. Aaron Bergman purchased in 2015 and 2017, for example. We’ll define “Churn Status,” then, for each customer, for each year of purchase. We’ll say that Aaron’s “Churn Status” in 2015 was “churned” because he didn’t purchase in 2016 and he “churned” again in 2017 because he didn’t purchase in 2018. Aaron Hawkins, on the other hand, “retained” in 2015, 2016 and 2017, because in each subsequent year he also purchased something. In 2018, the most recent year in our data set, nobody can be said to have “churned” nor “retained”, because 2019 hasn’t happened yet, so those statuses are all “Last Period”. This crosstab below shows the statuses I’m defining:

      There are a number of intersections here with no data, like Aaron Bergman in 2016 or 2018. For years where the customer purchased nothing, our definition of churn defines nothing. That’s convenient, because it would be very difficult for us to define any sort of status for Aaron in 2016, because there’s no data for that customer for that year.

      OK, now let’s study some calculations that break this down. Let’s start with the easiest one. Create this calculation:

       

      “Most recent order date”: {fixed: max([Order Date])}

       

      This LoD (Level of Detail calculation) always gives us the most recent order date in the entire data set, regardless of what customers, products, or even dates we have filtered out of our view. (Context, data source and extract filters notwithstanding.) To understand exactly what that looks like, drag that calculation to the tooltip of your first crosstab. You can see that the most recent date is 12-30-2018 for every single mark:

      Now create two calculations critical to our analysis, but seemingly unhelpful at first glance: create “Year frame even” as such:

      // This year or the prior year, whichever is EVEN

      IF

      YEAR([Order Date])%2=0

      THEN

      year([Order Date])

      ELSE

      year([Order Date])-1

      END

      This, calculation, seems simple enough: for every record, this will be the year of the order date, if that year is even. Otherwise, it’ll be the prior year (which will be even). Its sister calculation does exactly the opposite: records the order year, or the prior year, whichever is odd. Here are Aaron Smayling’s seven order records:

      Here’s the Odd calculation. Create “Year frame odd”:

      // This year or the prior year, whichever is ODD

      IF

      YEAR([Order Date])%2=1

      THEN

      year([Order Date])

      ELSE

      year([Order Date])-1

      END

       

      The key here is that we now have two columns that will group a customer’s orders for any year with that same customer’s orders for the subsequent year. We will use each in turn. Let’s study Aaron Smayling, in 2017: what we want to do is identify first that he placed two orders in that year (highlighted):

      Now we need to see if he purchased anything in 2018. Because 2017 is ODD, we will look at the “year frame – odd” column, and see that it’s 2017 for that year, and expand our search to all Aaron’s rows with that same value of 2017. In this case, there are six matching rows.

        

      Now we take a quick scan through those order dates, and count the distinct years therein. 2017 appears twice, and 2018 appears four times. Therefore we can conclude that Aaron was retained in 2017. This table illustrates the logic we need to replicate:

      What’s the order year?

      Is it odd or even?

      What’s the year in the relevant “Frame” column?

      How many rows share that “frame” year?

      In those rows, how many order years appear?

      1? That means that Aaron bought in only one year, so CHURN. 2? That means Aaron bought in both years, so RETAIN.

      2015

      Odd

      2015

      1

      1 (2015)

      CHURN

      2017

      Odd

      2017

      6

      2 (2017, 2018)

      RETAIN

      2018

      Not relevant, because 2018 is the data source’s most current year.

       

      Let’s do it again for Anna:

       

      What’s the order year?

      Is it odd or even?

      What’s the year in the relevant “Frame” column?

      How many rows share that “frame” year?

      In those rows, how many order years appear?

      1? That means that Anna bought in only one year, so CHURN. 2? That means Anna bought in both years, so RETAIN.

      2015

      Odd

      2015

      6

      2 (2015, 2016)

      RETAIN

      2016

      Even

      2016

      11

      2 (2016, 2017)

      RETAIN

      2017

      Odd

      2017

      6

      1 (2017)

      CHURN

       

      Articulating this logic requires a Level of Detail calculation. If those are new to you, this is perhaps a great way to begin to understand them. The syntax is as follows.

       

      Create “Churn Status”:

      IF year([Order Date])=year([Most recent order date]) then 0

       

      ELSEIF year([Order Date])%2 = 1 THEN

      {fixed [Customer Name],[Year frame - odd]: countd(year([Order Date]))}

       

      ELSE

      {fixed [Customer Name],[Year frame - even]: countd(year([Order Date]))}

       

      END

       

      Let’s break this down line-by-line:

       

      IF year([Order Date])=year([Most recent order date]) then 0

      In this first “IF” clause, we identify if that particular row is an order from the current most year. If so, then we’ll assign a ‘0’, which will indicate that there’s no churn status for that particular row, because we have no data for 2019 yet.

       

      ELSEIF year([Order Date])%2 = 1 THEN

      In the second clause, ELSEIF, we discern if the order year is even or odd. We’ll use the “%” operator, which is modulo arithmetic. It tells you the remainder after you divide a number by another number. In our case, the number is the order year (2017, for example), and we divide by 2 by using the “%2”. After dividing by 2, we get a 1008, with a remainder of 1. (1008 + 1008 + 1 = 2017). That 1 tells us the order year is odd.

       

      {fixed [Customer Name],[Year frame - odd]: countd(year([Order Date]))}

      This line is the trickiest to understand because it’s a Level of Detail calculation. This calculation focuses on all the rows in the data source (“fixed”) that share the same customer name and “Year frame – odd” (“[Customer Name],[Year frame - odd]”) as this row. For all those rows; it counts the distinct (“countd”) order years (“year([Order Date])”). It’s a tremendously powerful expression, and it tells us whether this customer also bought anything in the year subsequent to the year in this row.

      The rest of the calculation repeats the above logic for even years, using the “even” frame instead of the “odd” frame.

      Drag your new “Churn status” column to be a dimension, not a measure. Then drag it to the Text shelf, replacing SUM(Number of Records). This shows pretty clearly who churned (1), who retained (2), and where it’s not determinable (0).

      Finally, create some aliases for “Churn status” as such, which will make your crosstab a bit more readable.

       

      You’ve done it! From here, there’s so much more you can do: calculate percentages of customers who churned versus did not churn, visualize it however you like, etc. Those LoD expressions should be very durable compared to using Table calculations to accomplish this analysis. Hopefully you learned a few things along the way and can apply this logic to your own data readily. Happy vizzing!