That's a very good and valid question. The best way to explain this step by step is to show indexing.
Your calculation is set up as follows:
meaning it is partitioned by [State], so we may consider each State independent* (not really, as it turned out...)
If you enter "edit in shelf" mode you will see that your calculation is in fact the following:
ZN(SUM([Profit])) - LOOKUP(ZN(SUM([Profit])), -1)
and I added the LOOKUP part alongside an INDEX() calculation, addressed and sorted in exactly the same way, so that it is easy to trace what is going on.
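To make the mechanics concrete, here is a conceptual sketch in Python (not Tableau code) of how the difference calculation walks along a partition; the profit values are hypothetical, and `zn`/`lookup` are simplified stand-ins for Tableau's ZN() and LOOKUP():

```python
# Conceptual sketch (not Tableau code): how
# ZN(SUM([Profit])) - LOOKUP(ZN(SUM([Profit])), -1)
# is evaluated cell by cell along one partition.

def zn(value):
    """Simplified ZN(): replace NULL (None) with 0."""
    return 0 if value is None else value

def lookup(values, index, offset):
    """Simplified LOOKUP(): value at a relative offset, NULL if out of range."""
    target = index + offset
    if 0 <= target < len(values):
        return values[target]
    return None

def profit_difference(partition):
    """Current ZN'd value minus the previous one, for each cell."""
    znd = [zn(v) for v in partition]
    result = []
    for i in range(len(znd)):
        prev = lookup(znd, i, -1)
        result.append(znd[i] - prev if prev is not None else None)
    return result

# Hypothetical SUM([Profit]) values for one state's partition:
profits = [120, None, 80, 150]
print(profit_difference(profits))  # [None, -120, 80, 70]
```

The first cell has no predecessor, so LOOKUP(..., -1) returns NULL and the whole difference is NULL; note that the ZN() is applied to each operand, not to the result.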
In case of Alabama we have:
The indexing order of the table calculation is determined primarily by SUM([Number of Records]), then by [Segment], and lastly by [Year of Order Date]; thus the first (blue) column is indexed at position 9, and its previous value happens to be the Corporate/2014 column; indexes 11 and 12 are neighboring columns in your sheet. Now why is Consumer/2013 indexed lower than Consumer/2014 when it has 10 records and the latter only 3? Well - that surprised me! - it turns out that the sort order is determined ignoring partitions (?), as if for the grand total. Notice that the indexes are uniform across all the States, and the calculations change completely when you filter out all states but Alabama:
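The surprising behavior described above can be sketched in Python (all data hypothetical): the custom sort ranks the Segment/Year combinations by their totals across *all* states, and that single grand-total ranking is then reused inside every state partition:

```python
# Conceptual sketch (hypothetical data): the custom sort on
# SUM([Number of Records]) ranks Segment/Year combinations using
# totals across ALL states, as if for a grand total, and the same
# index is then applied inside each state partition.

from collections import defaultdict

rows = [
    # (state, segment, year, number_of_records)
    ("Alabama", "Consumer",  2013, 10),
    ("Alabama", "Consumer",  2014, 3),
    ("Arizona", "Consumer",  2013, 2),
    ("Arizona", "Consumer",  2014, 40),
    ("Alabama", "Corporate", 2014, 5),
    ("Arizona", "Corporate", 2014, 1),
]

# Grand-total record counts per (segment, year), ignoring state:
totals = defaultdict(int)
for state, segment, year, n in rows:
    totals[(segment, year)] += n

# One global ascending order shared by every state partition:
global_order = sorted(totals, key=lambda combo: totals[combo])
index_of = {combo: i + 1 for i, combo in enumerate(global_order)}

for state, segment, year, n in rows:
    print(state, segment, year, "-> index", index_of[(segment, year)])
```

With these made-up numbers, Consumer/2013 totals 12 records across both states while Consumer/2014 totals 43, so Consumer/2013 is indexed lower even inside Alabama, where it has 10 records against Consumer/2014's 3 - the same effect as in the screenshots.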
Here you can see another factor: with a crosstab containing all the states, the domain was padded for all potential (4) years. To preserve this layout we can add a table calculation - e.g. SIZE() - on the Detail shelf just for this purpose:
Now we have indexes 11 & 12 retained although they should be numbered 1 & 2 - I will come back to this later on...
It is good to check Grand Totals so that it becomes clear at which level Number of Records is considered when all the states are in view:
In order to have this calculation sorted in each state independently, we can create a calculated field [Δ Profit] - as quick table calculations are missing the "Restarting every" option - with addressing like this:
I have added another INDEX() addressed as above to show the difference:
Now each state has its own sort order, but we ended up with the columns where Alabama has no records being indexed with the highest numbers, regardless of ascending or descending order. Even when I introduced a calculated aggregation ZN(SUM([Number of Records])) and used that for the sort instead, it did not affect the indexing - as if the value were still NULL, which always sorts this way because it cannot be compared with existing values.
Jonathan Drummey is an expert in table calculation nuances and maybe would like to unveil the mystery here, namely: why does this custom sort not seem to use the ZN-ed values?
Advanced_TableCalcs_9.3.twbx 1.2 MB
The easiest way I know of to think about having multiple dimensions in addressing and using a Custom Sort is that we get a *nested* sort of the combination of addressing dimensions. This is a separate sort from whatever sort may be on the pills in the View. So sorting Segment and Order Date by SUM(Number of Records) is going to sort the combination of distinct values of those two dimensions. (It's a little different from what Lukasz described).
Also part of the challenge with table calcs is the number of factors involved - the relative sparseness of our data, Tableau's 8-ish different types of data densification, Tableau's order of operations, the calculations themselves, their addressing and partitioning settings, etc.
Usually the way I start out views for table calcs is to create a view that has all my dimensions as discrete pills on Rows, with the partitioning dimensions on the left and the addressing dimensions on the right. Then I can add an INDEX() calculation with the same Compute Using as the problematic pill and see *exactly* how it is addressing:
Note that Alabama in this view only has 10 combinations of Segment & Year, while in the crosstab view Alabama has 12. Also this view has 1419 marks, the crosstab view has 1764. This is because of Tableau's data densification behavior which is triggered by a number of different factors, in this case a crosstab arrangement of discrete dimensions on Columns & Rows and a table calculation with the right addressing. In this view I've added that same INDEX() calculation with the same compute using as the Profit quick table calculation as a discrete pill on Columns to show exactly how the state partitions are being sorted:
There are now 12 combinations of Segment & Year for Alabama, Tableau has padded in Home Office/2011 and 2012.
This makes a nice segue into talking about custom sorts again: If I do a custom sort on SUM(Number of Records) ascending with the Addressing on Segment & Year (and changed the Profit difference quick table calculation to match) the view looks a little crazy:
There are two things going on here: one is that the custom sort is sorting Segment & Year across *all* states, not within any state. (This is where Lukasz's tip of using Grand Totals is helpful; unfortunately we can't turn on Grand Totals and see the INDEX() as a discrete pill on Columns or Rows at the same time due to a limitation of Tableau.)
The second is that with this pill layout we've triggered data densification.
Lukasz's question around ZN(SUM([Number of Records])) with a custom sort is an effect of Tableau's order of operations. The custom sort in an Advanced Table Calculation uses regular aggregates coming from the data source; this is *prior* to data densification, so those missing values such as Home Office/2011 and 2012 don't exist (yet) and therefore aren't dimensions that get sorted. What makes this confusing is that the ZN() applies to the NULL values for SUM([Number of Records]) for Home Office/2011 and 2012 that have been padded in by data densification, so even though we write the calculation as ZN(SUM([Number of Records])), the inner SUM() is computed in the data source and the outer ZN() is applied after densification. So the ZN() looks like it should work, but it won't. Depending on exactly what the goal is for the view we can sometimes work around this, and sometimes we can't.
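This order of operations can be sketched as a short Python simulation (hypothetical data, not Tableau internals): the data source aggregates first, the custom sort runs on those pre-densification aggregates, densification pads in NULLs only afterwards, and ZN() is applied last of all:

```python
# Conceptual sketch (hypothetical data): why ZN(SUM([Number of Records]))
# does not influence the custom sort.

# Step 1: aggregates as returned by the data source. Home Office/2011
# and Home Office/2012 simply do not exist yet at this stage:
source_aggregates = {
    ("Consumer", 2013): 10,
    ("Consumer", 2014): 3,
    ("Home Office", 2013): 2,
}

# Step 2: the custom sort only ever sees the combinations above:
sort_order = sorted(source_aggregates, key=lambda c: source_aggregates[c])

# Step 3: densification pads the full Segment x Year domain with NULLs:
domain = [("Consumer", y) for y in (2011, 2012, 2013, 2014)] + \
         [("Home Office", y) for y in (2011, 2012, 2013, 2014)]
densified = {c: source_aggregates.get(c) for c in domain}

# Step 4: ZN() runs only now, after the sort has already happened:
zn_values = {c: (0 if v is None else v) for c, v in densified.items()}

print("sorted combos:", sort_order)  # padded combos never appear here
print("Home Office/2011 after ZN:", zn_values[("Home Office", 2011)])  # 0
```

The padded combinations end up with a ZN'd value of 0, but they were never part of the list the sort operated on - which is exactly why they fall outside any ascending/descending ordering.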
And now let's circle back around to the filtering issue, where if we just filter for Alabama then those two values for Home Office/2011 and 2012 go away. This is because data densification only works with the dimension values that are returned in the query from the data source, so if we get rid of those combinations by filtering then Tableau won't pad them in. We can work around this somewhat with the Show Empty Rows/Columns settings, or by using a table calculation filter like LOOKUP(MIN([State]),0), but both have limitations.
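A small sketch of that filtering effect (hypothetical data): the domain available for padding is the cross product of the dimension values present in the query result, so a regular filter that removes the only rows carrying a value shrinks the padding domain too:

```python
# Conceptual sketch (hypothetical data): densification can only pad with
# dimension values that appear somewhere in the query result. A regular
# filter runs before densification, so removing the only rows that carry
# a value also removes it from the padding domain.

rows = [
    {"state": "Alabama", "segment": "Consumer",    "year": 2013},
    {"state": "Alabama", "segment": "Consumer",    "year": 2014},
    {"state": "Arizona", "segment": "Home Office", "year": 2011},
]

def padded_domain(rows):
    """Cross product of the segment and year values present in the rows."""
    segments = {r["segment"] for r in rows}
    years = {r["year"] for r in rows}
    return {(s, y) for s in segments for y in years}

# Unfiltered: 2 segments x 3 years = 6 combinations available for padding.
full = padded_domain(rows)

# Regular filter for Alabama: Arizona's row is gone BEFORE densification,
# so Home Office and 2011 vanish from the domain entirely (1 x 2 = 2).
filtered = padded_domain([r for r in rows if r["state"] == "Alabama"])

print(len(full), len(filtered))  # 6 2
```

A table-calc filter such as LOOKUP(MIN([State]),0) hides rows only after densification, which is why the padded combinations survive it - at the cost of the limitations Jonathan mentions.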
With all these complexities I'm very careful about when I'll use data densification in production views, much more often I'll pad out the data in advance using cross product queries or the "join on 1" technique or a data prep tool such as Alteryx. Then I have record-level values for all combinations in the data that I need and I don't have to worry (too much) about Tableau's data densification behaviors, issues around sorting partitions, etc.
I've attached the workbook I used to make these screenshots.
Jonathan - as always - you provided a clear and in-depth explanation. I learned two things today:
- custom sort is sorting Segment & Year across *all* states, not within any state
- the inner SUM() is computed in the data source and the outer ZN() is applied after densification. So the ZN() looks like it should work but it won't.
You are not going to find this anywhere in the Tableau Online Help; for me it is always "try and see" knowledge.
Any chance you will share/publish real documentation of these complex and vague topics (I know some are covered on your blog)? I think you and Joe Mako might be working on something...?
Thanks a lot,
You’re right, it is “try and see”…even for folks like me who know it!
Joe and I were working on a book, and between our perfectionism, other pursuits and job changes, the topic being a moving target as Tableau builds out new functionality, plus one more factor I'll talk about in the next paragraph, we never got it to the point of release.
I wrote in the prior post about some of the factors that affect table calculations, Joe and I at one point had a conversation where we tried to identify the critical bits of knowledge to fully understand table calculations and what we came up with was “basically everything”: The 4 pill types, the 4 levels of calculation, data structure & relationships between fields & sparseness of data, grain of data vs. vizLOD, order of operations, writing logical calculations, the 13+ different kinds of sorting in Tableau views (yes, I’m that much of a #tableaugeek), idiosyncrasies of each mark type, the 8 types of data densification, data blending, what version of Tableau we’re using, etc. So I’ve (slowly) been building out Tableau trainings that go into those foundations in more detail with the long-term goal of then being able to cover table calculations in a way that people can have the foundational knowledge that supports building out table calculations that (mostly) work the first time, every time.
Joe and I (and others) have told the Tableau devs that even as we find table calculations are extraordinarily powerful we also believe that table calculations are too complicated, there are just too many factors to keep track of (even for folks like Joe and me who can reasonably claim to understand them). We are sure there's got to be a "better way", and Tableau thinks so as well. A few examples: Level of Detail (LOD) expressions are one outcome of Tableau's research and development in this area, they let us do a number of things that were previously very difficult or impossible for table calculations. Another result of that R&D is the "drag a reference line to the Filters Shelf" feature that was demoed at #data15 (though not yet released), and a third that's in the 10.1 beta is the way Tableau automatically generates LOD expressions for measures in JSON sources. Tableau R&D really believes in "no feature before its time" in the sense that they want to make sure they've thought deeply about a feature and its impact on the user experience across the product, and that a better solution will solve multiple problems at the same time. Even a feature like cross data source joins can be viewed as part of Tableau's efforts to make calculations easier, because the record-level join lets us avoid the complexities of data blending (which could require table calculations to solve some questions, since data blends are a kind of post-aggregate join) while giving us the full functionality of record-level values (and then we can use LOD expressions to deal with multiple grains of data if we have replicated values).
Lukasz / Jonathan,
Thank you both for the thorough explanation and the documentation to explain what's happening in this example. This has been extremely helpful, and it's made me realize that I've just been scratching the surface when it comes to Table Calcs.
The primary lesson I've taken away is that custom sorting in a Table Calc is independent of the pills on the view - it sorts on the possible combinations of the addressing dimensions... and that helps me 'locate' the previous profit being picked up by the profit difference Table Calc. I also found it very helpful to see how domain padding can impact the view. And I never realized that Quick Table Calcs did not allow for the 'Restarting Every' option - I guess it's obvious I don't use that option very often.
Your detailed responses have also offered another level of teaching... that is, a best practice for creating and debugging Table Calcs. Lukasz's method of editing in shelf to add the different Table Calc components to the view was a great idea, and Jonathan's method of creating a view with the 'dimensions as discrete pills on Rows with partitioning dimensions on the left and addressing dimensions on the right' with an INDEX() function is very instructive for seeing how the Table Calc is working. These are practices I will incorporate into my Table Calc work going forward.
Again, thank you both for your assistance and taking the time to provide such a thorough explanation of the problem.