5 Replies Latest reply on Dec 14, 2012 6:42 AM by Jonathan Drummey

# Creating a Dynamic Histogram

Hi All

I've always struggled with the Histogram in Tableau.

I prefer to be able to see the shape of the data that I am trying to analyse -  and the "Create Bins" method is not dynamic enough to be able to change the number of bins, or group outliers together.  In addition, because the bins are dimensions I've never found a satisfactory way of putting on the median / mean / standard deviation in a graphical format.

So I have attempted to create a dynamic histogram. See workbook attached.

• You can select the measure you are looking at from a parameter (set for Sales or Profit at the moment).
• Select the max and min to group outlying data - to stop long tails. (I would quite like to get this to be selectable to be a % or value)
• Select number of bins to view the data at different intervals.

1.     The Calculations are very "clunky" - and with large datasets I'm sure will grind to a halt -  is there a better way?

2.     How can  I calculate the median and mean (of the underlying data) to display it as reference lines on the distribution? Table Calcs ?  I've put an example of what I am looking for :

All the best

Ian

ps: I've been creating this workbook to have some working examples of different graph types (as an aide memoir) rather than perfect examples .  I've left the Report worksheet attached in case anyone else is interested in looking at or using it. I would welcome any comments or suggestions.

• ###### 1. Re: Creating a Dynamic Histogram

Hi Ian,

You've run into something I've been thinking about writing a blog post on, the title of which is something to the effect of "when reference lines go bad."

What's happening is that by default, Tableau computes the reference lines based on the displayed marks in the view, so the Mean and Median are computed based on the bars of the histogram, not the underlying data. If you change your reference lines to use the Total aggregation instead of Average, you will get the correct results. Total (and it's counterpart table calc TOTAL) is special table calc that calculates the given sub-calc across the entire partition of data. So, given the Scope setting of "Entire Table", the Total aggregation of the Median of the Selected Measure is operating across the entire table, whereas the Avg aggregation (which is like the WINDOW_AVG calc) is averaging out all the Median values.

I've set these up on the Total and Window Avg worksheet in the attached, and a revised histogram in the Histogram using Total. BTW, this still trips me up enough that I'll often create a view like your Calculation Check and see what the expected results are, then mess around with the aggregations for the reference line until those match.

- On the profit histogram, make the color range red to black instead of red to green. This way people who are red-green color blind will be able to see the difference, since there are no other indicators in the view.

- For the Cohort Sales worksheet, as an alternative to creating two calcs specifically for East (and potentially more calcs for other members of the Region Dimension), you can create two generic calcs. One is a LOOKUP() on Region, the other is a TOTAL(SUM([Sales])). Then you can put Region in the view, use a Sum Sales and % of Total along with the Total Sales, put the LOOKUP calc on the Filters shelf, set the Compute Using for the calcs to Region and you now have a dynamic selection of region with totals. The disadvantage to this method is that it requires more computation on Tableau's end and could introduce performance issues, but in a case like this should be quite fast. I set this up in the Region Sales for Cohort view.

Jonathan

1 of 1 people found this helpful
• ###### 2. Re: Creating a Dynamic Histogram

As for the binning calcs, I think there's probably a way to optimize them, I don't have the time to go into that right now. I know Joe Mako has done some work on this.

• ###### 3. Re: Creating a Dynamic Histogram

Hi Jonathan

Firstly - I liked your "I got a Rock" blog.

That works great  - do you know if it works with standard deviation as well? I guess I would need to do a calculated field off it for the +1, -1 etc standard devs to use for the reference lines -as the distribution references will return incorrect data.

Does the TOTAL ignore what is displayed and look at the underlying data - because I've changed it so that they group together in bins (especially at the ends) - the calc's seem to stack up so I'm assuming it does.

Since posting I've been trying to unpick the Maestro Mako's workbook from this thread http://community.tableau.com/message/180484  to get a better way of calculating the bins.

Best regards

Ian

• ###### 4. Re: Creating a Dynamic Histogram

Thanks to Jonathan for pointing me in the correct direction for the Reference Lines.

I'm still working on trying to improve the "clunky" calculation for binning the measures.

To close off this thread attached is the corrected workbook - with Mean, Median and Standard Deviations overlaid on the Histogram.

In Addition, I've taken Jonathan's comments onboard regarding weaning me off of my preference for "Red / Green" on my Report practice - and added in an additional indicator for good measure (and excellent practice)  as per this old thread http://community.tableau.com/thread/111652

Modification on the Cohort Caculation gratefully received - I did try and create an action filter on the dashboard when selecting a region however this filters the data that the TOTAL calc is referencing so a region then has 100% of the sales - which made me understand/appreciate the LOOKUP(ATTR([Region]),0) used as a filter.

The next thing I want to do is to overlay a normal distribution curve  - but I guess that will be best handled under another question.

Screenshots of results :

1 of 1 people found this helpful
• ###### 5. Re: Creating a Dynamic Histogram

There is no Total aggregation available for SD reference lines, as it seems you found out.

I'm still figuring out how to describe what TOTAL does. It's a table calc that doesn't care about sorting or addressing. You just give it set of partitions and it evaluates the inner calculation across that entire partition. So it will ignore what is displayed if you tell it to via the Compute Using, etc. settings.

Jonathan