13 Replies Latest reply on Jul 20, 2015 7:18 AM by Keith Conner

    Cohort Analysis - How to sum and average only valid values in table?

    Patrick Negri

      Hey guys.


      Take a look at this table:

      Captura de tela 2014-07-09 14.20.53.png


      How to make the total sum only the valid values? Like in the model from Andrew Chen?





      Thanks guys!

        • 1. Re: Cohort Analysis - How to sum and average only valid values in table?
          Keshia Rose

          Hi Patrick,


          Could you attach a sample packaged workbook? That way it will be easier for people to be able to help.


          Take care,


          • 2. Re: Cohort Analysis - How to sum and average only valid values in table?
            Patrick Negri

            I am trying to generate sample data based on our data so we can work on a packaged workbook example.

            • 6. Re: Cohort Analysis - How to sum and average only valid values in table?
              Jim Wahl

              Hi Patrick,


              Tableau calculates sub totals and grand total by removing dimension "pills" to the right of the dimension you're sub totaling. For grand total, all dimension pills on the rows or column shelf are removed.


              In your worksheet, for example, column grand total is calculated by removing the discrete, blue MONTH(Date of Signup) pill from the rows shelf.


              This of course causes problems for your cohort calculation, which has a month-based denominator.


              The solution, which is not at all obvious, is to add back the detail by putting a copy of the Date of Signup on the Detail shelf / button. This doesn't affect your normal calculations at all, since you haven't increased the level of detail for all of the regular calculations, but when Tableau calculates the grand total and removes the Date of Signup pill on the rows shelf, the Date of Signup (Copy) on the Detail shelf keeps the months in the calc.


              Here are the basic steps.


              1. Duplicate the dimension Date of Signup. Just right-click the dimension in the left data pane > Duplicate.


              2. Drag this new Date of Signup (Copy) to the Detail button. Set it to Month, same as the pill on the Rows shelf.


              Again, this won't affect the regular rows, but you can now see the grand total row has multiple values, one for each month. (I've hidden April-to-Dec below.)



              Since MONTH(Date of Signup) is on the Detail button and not on the Rows/Columns shelf, Tableau "stacks" the extra marks, which is why you see multiple values in the GT row. And why you see extra space on the other rows.


              I'm going to walk through this step-by-step, although you could probably combine a couple of these.



              3. Create calculated field for Start Val

              One good way to learn the table calc formulas is to dissect the Tableau-generated quick table calcs. For example, you can see how Tableau created % of Total in your worksheet by clicking the pill > Edit Table Calculation > Customize where you'll see:


              COUNTD([Client ID]) / TOTAL(COUNTD([Client ID]))


              I copied this last part TOTAL(COUNTD(...)) to a new calculated field called Start Val. This is a table calculation, because of TOTAL(), which allows you to specify a different level of aggregation for COUNTD(...).


              Double-click this measure to add it to the worksheet. By default Tableau aggregates table calculations by "Table Across", which in this case is retention period. While that is what we want, I'll often make the partitioning/addressing explicit by clicking the pill > Compute Using > Retention. This helps me keep things straight and also allows me to move the pills around without affecting the addressing.


              In the GT row, you can see we're getting closer. The total row gives you the numerator and the Start Val row has the denominator.




              4. Sum values on GT row

              The final part is to sum all of the values. That's done with another table calc function WINDOW_SUM(). Click the Start Val measure in the left Data pane > Create Calculated Field. And wrap Start Val in a WINDOW_SUM().


              Start Val Total = WINDOW_SUM([Start Val])


              Double click the measure to add it to the worksheet. Again, Tableau aggregates this as Table Across, which adds all the start vals over the retention period for each month----so you get your start val multiplied by the number of retention periods.


              This is of course not what we want, so click the pill > Edit Table Calculation.




              Nested Table Calc.  In the Table Calc dialog box, you'll see a pull-down menu for Calculated Field. Because this calculated field refers to another field, Start Val, which has table calc functions, Tableau allows you to set the partitioning / addressing (aka compute using) independently for each calculated field with a table calc. This is also called a "nested table calc."  Here we do want to set them differently. Select Start Val and set Compute Using > Retention. Select Start Val Total and set Compute Using > Date of Signup (copy)


              Note that we're using the (Copy) version, because grand total removes Date of Signup from the view and the multiple values we're seeing are a result of (Copy), so to we want to WINDOW_SUM() over all the (Copy) row.


              Also note that if we didn't use the Table Calc dial box, but set the Compute Using by right clicking the pill in the worksheet and using the Compute Using selection, Tableau would apply that addressing across all table calcs. In general, I like setting the addressing / partitioning explicitly and I like using the Table Calc dialog box.



              Now we're getting the right value, but multiple, duplicate copies of it. One for each row in the (Copy) partition. Sometimes, such as when your graphing, you want this. In a table, you can remove the duplicates with another table calc. Click Start Val Total > Create Calculated Field >


              Start Val Total First = IF FIRST() == 0 THEN [Start Val Total] END


              Double-click to add to the view. Click the pill > Edit Table Calculation. We now have three table calcs: Start Val, Start Val Total, Start Val Total First, and each one can have its own addressing. Double-check that Start Val and Start Val Total are set as above to Retention and Date of Signup (Copy), respectively. Then set the Start Val Total First to Date of Signup (Copy) as well, since again we want just the first value in the GT box.


              Now, finally, we have our denominator.



              4. Calculate the numerator

              On the Total row in the grand total pane, you have your numerator values, we just need to sum them over the Date of Signup (Copy) dimension. Create a new calculated field


              Total v2 =

              IF FIRST() == 0 THEN

                  WINDOW_SUM(COUNTD([Client ID]))



              This is the same logic as above. I combined the FIRST and WINDOW_SUM() into one, since both are computed over Date of Signup (Copy).


              Double-click to add this to the view. Set compute using to Date of Signup (copy).


              Now you should have an accurate total for each period (Total V2) and a total starting value.



              5. Calculate the % Retention

              Relative V2 = [Total V2] / [Start Val Total First]


              Since this uses table calcs that are already in the view, the partitioning/addressing will be inherited. But it's still a good practice to click the pill after adding it > Edit Table calculation and verify everything is setup properly.



              6. Cleanup.

              To get back to your original view. Just remove all but Total V2 and Relative V2 from the view.


              To remove the white space, click from the top menu Analysis > Stack Marks > Off.


              Edit the aliases to get clean Total and Relative names. I also hide the dimension column names (I don't need a column head called Month to tell men Jan, Feb, March are months, ...). Click the Header > Hide Field Labels for Rows.



              See attached workbook. Let me know if you have any questions. Neat little problem. Thanks for posting.



              • 10. Re: Cohort Analysis - How to sum and average only valid values in table?
                Jonathan Drummey

                Hi Jim,


                I just re-read this, and one thing jumped out at me. You'd written: "Tableau calculates sub totals and grand total by removing dimension "pills" to the left of what you're totaling." Isn't it it dimension pills to the right?

                For example, if I have Department, Category, and Item (in that order) on Rows, and I put a subtotal on Category, then that subtotal is effectively removing Item.


                • 11. Re: Cohort Analysis - How to sum and average only valid values in table?
                  Jim Wahl

                  Yes, you're right. I'll edit the above for posterity.


                  I was looking the rows shelf and thinking, "the dimension pills to the left of the measures." But dimensions are always to the left of measures, so that doesn't really need to be said.

                  • 12. Re: Cohort Analysis - How to sum and average only valid values in table?
                    Jonathan Drummey

                    Hi Jim,


                    I acknowledge that I'm getting very precise about language here, and I'm trying to reduce & prevent confusion. You wrote, "But dimensions are always to the left of measures, so that doesn't really need to be said." This is not the case, here's a demo using Superstore Sales:


                    Screen Shot 2014-07-21 at 6.24.51 AM.PNG


                    On Rows and Columns, blue pills (discrete fields) are always to the left of green pills (continuous fields). In the above screenshot, the discrete SUM(Sales) measure is to the left of the green Sales dimension, so measures can be to the left of dimensions. This is kind of a tangent from the original topic, it's worth noting that Tableau won't generate a row Grand Total when there is a discrete measure on Rows, and won't generate a column Grand Total when there is a discrete measure on Columns.


                    Also, Tableau can generate totals when a continuous dimension is used, here's an example from Superstore Sales:


                    Screen Shot 2014-07-21 at 6.35.40 AM.PNG


                    So, on further reflection, a more accurate statement would be "Tableau calculates subtotals for a given Shelf (Rows or Columns) by effectively removing discrete dimension pills to the right of the discrete dimension that you are subtotaling. Grand totals are calculated by effectively removing all discrete dimensions from that Shelf."


                    Does that make sense?



                    • 13. Re: Cohort Analysis - How to sum and average only valid values in table?
                      Keith Conner

                      Jim, This is awesome.


                      Thank you!

                      Keith Conner