1 of 1 people found this helpful
You've run into something I've been thinking about writing a blog post on, the title of which is something to the effect of "when reference lines go bad."
What's happening is that by default, Tableau computes the reference lines based on the displayed marks in the view, so the Mean and Median are computed based on the bars of the histogram, not the underlying data. If you change your reference lines to use the Total aggregation instead of Average, you will get the correct results. Total (and it's counterpart table calc TOTAL) is special table calc that calculates the given sub-calc across the entire partition of data. So, given the Scope setting of "Entire Table", the Total aggregation of the Median of the Selected Measure is operating across the entire table, whereas the Avg aggregation (which is like the WINDOW_AVG calc) is averaging out all the Median values.
I've set these up on the Total and Window Avg worksheet in the attached, and a revised histogram in the Histogram using Total. BTW, this still trips me up enough that I'll often create a view like your Calculation Check and see what the expected results are, then mess around with the aggregations for the reference line until those match.
I like your workbook! A couple of comments:
- On the profit histogram, make the color range red to black instead of red to green. This way people who are red-green color blind will be able to see the difference, since there are no other indicators in the view.
- For the Cohort Sales worksheet, as an alternative to creating two calcs specifically for East (and potentially more calcs for other members of the Region Dimension), you can create two generic calcs. One is a LOOKUP() on Region, the other is a TOTAL(SUM([Sales])). Then you can put Region in the view, use a Sum Sales and % of Total along with the Total Sales, put the LOOKUP calc on the Filters shelf, set the Compute Using for the calcs to Region and you now have a dynamic selection of region with totals. The disadvantage to this method is that it requires more computation on Tableau's end and could introduce performance issues, but in a case like this should be quite fast. I set this up in the Region Sales for Cohort view.
Firstly - I liked your "I got a Rock" blog.
That works great - do you know if it works with standard deviation as well? I guess I would need to do a calculated field off it for the +1, -1 etc standard devs to use for the reference lines -as the distribution references will return incorrect data.
Does the TOTAL ignore what is displayed and look at the underlying data - because I've changed it so that they group together in bins (especially at the ends) - the calc's seem to stack up so I'm assuming it does.
Since posting I've been trying to unpick the Maestro Mako's workbook from this thread http://community.tableau.com/message/180484 to get a better way of calculating the bins.
Thanks for your help
1 of 1 people found this helpful
Thanks to Jonathan for pointing me in the correct direction for the Reference Lines.
I'm still working on trying to improve the "clunky" calculation for binning the measures.
To close off this thread attached is the corrected workbook - with Mean, Median and Standard Deviations overlaid on the Histogram.
In Addition, I've taken Jonathan's comments onboard regarding weaning me off of my preference for "Red / Green" on my Report practice - and added in an additional indicator for good measure (and excellent practice) as per this old thread http://community.tableau.com/thread/111652
Modification on the Cohort Caculation gratefully received - I did try and create an action filter on the dashboard when selecting a region however this filters the data that the TOTAL calc is referencing so a region then has 100% of the sales - which made me understand/appreciate the LOOKUP(ATTR([Region]),0) used as a filter.
The next thing I want to do is to overlay a normal distribution curve - but I guess that will be best handled under another question.
Screenshots of results :
There is no Total aggregation available for SD reference lines, as it seems you found out.
I'm still figuring out how to describe what TOTAL does. It's a table calc that doesn't care about sorting or addressing. You just give it set of partitions and it evaluates the inner calculation across that entire partition. So it will ignore what is displayed if you tell it to via the Compute Using, etc. settings.