Well I can't say it was easy, but I managed to show the outlier points in the attached.
Having played with bandlines a bit, I'm not sure the outlier points are critical (with some datasets, anyway).
Few doesn't use them in his example, albeit the shaded bands in the grade book dashboard are not based on box-plot-style quartiles. And the example by XLCubed he recently cited uses points for min/max, not outliers.
And, in your Tableau workbook, you calculated the reference bands based on the panes, in which case the midrange is always present and provides context for the line's distance into Q1 or Q4.
If you instead calculated the reference bands for the table, it's possible that---with a different dataset---some of the panes would have values only in one or two quartiles. In this case the outlier points help distinguish otherwise identical bandlines.
For example if you had data with quartiles at 45, 50, and 55, then these two, five-point datasets would look identical:
- 60, 61, 62, 63, 64 and
- 60, 65, 70, 75, 80.
The outlier points on 70, 75, and 80 would highlight the difference in y-axis scale.
Of course, you could just add high / low text labels.
I showed a few different examples bandlines, bandlines with sparkstrips (Few's horizontal box plots, next to the bandlines), and a more traditional sparkline. Everything was easy, except for the outlier points.
Bandlines_jimw.zip 532.9 KB
1 of 1 people found this helpful
Thanks so much Jim. Your explanations and calculations were very clear and helped me past the logic blocks I was having. In particular, I would never have thought to use the index and size functions to determine where the 75% and 25% positions were and then just take the average of the two numbers that border that position. Really elegant.
In reviewing your logic, though, a few more questions emerged, primarily around the syntax you used. Some of it seems extraneous to me (i.e. I was able to get the same results with simpler syntax), so I’d like to better understand its purpose. An example is below.
Your calculation for Sales (75th percentile) is:
IF FIRST() == 0 THEN
IF 0.75 * SIZE() = INDEX() THEN
(SUM([Sales]) + LOOKUP(SUM([Sales]), 1)) / 2
ELSEIF 0.75 * SIZE() < INDEX() THEN
IIF(FIRST()==0, LAST(), 0)
I was able to achieve the same results with the following calculation:
window_min(IF 0.75 * SIZE() = INDEX() THEN
(SUM([Sales]) + LOOKUP(SUM([Sales]), 1)) / 2
So it isn’t clear to me what the portions in blue are doing.
Another question I have is about your calculation for the lower outlier threshold. Per Few’s description ("A distance of 1.5 times the midspread above the 75th percentile and below the 25th percentile"), this would mean that the lower threshold should also be 1.5x the same $36,557 IQR, which would put it at $4,171 vs. the $22,469 you calculated. Am I interpreting his description incorrectly?
I still intend to look through the rest of your tabs, but I wanted to reply to the core items and to let you know how much I appreciate your thorough examples.
Glad it helped. I'd be interested to hear how well bandlines work for you.
The IF FIRST()==0 and IIF(FIRST()==0, LAST(), 0) are part of Richard Leeke's original percentile solution to optimize the performance. He explained it well here (http://goo.gl/Ww0eA); as I understand it, the WINDOW_xxx functions are slow, because they (in this example) repeat the sort for every partition. Since the answer is the same for all partitions, it makes sense to just run it on the first partition (FIRST()=0).
But Tableau apparently evaluates the WINDOW_MIN() part first and, therefore, the FIRST() trick doesn't improve the speed---Tableau calculates, but ignores the subsequent results. So Richard's second trick was to specify a single-value range for all but the first partition using the IFF(FIRST()==0, LAST(), 0)---the resulting window calculation is meaningless, but it's being ignored.
Performance isn't an issue with this example, but FIRST()==0 also solves the overlapping text problem, if you wanted to include the value in a table.
The ELSEIF 0.75 * SIZE() < INDEX() THEN SUM([Sales]) part is required if the percentile is one of the values in the data set. For example, when calculating the median of a dataset with 11 values.
- Dataset A with 10 values: 1 2 3 4 5 6 7 8 9 10 --- median is 5.5
- Dataset B with 11 values: 1 2 3 4 5 6 7 8 9 10 11 --- median is 6
Another question I have is about your calculation for the lower outlier threshold. Per Few’s description ("A distance of 1.5 times the midspread above the 75th percentile and below the 25th percentile"), this would mean that the lower threshold should also be 1.5x the same $36,557 IQR, which would put it at $4,171 vs. the $22,469 you calculated.
Good catch, thanks, it should be $4,171.
I prettied up the workbook above and put the bandlines on Tableau public here:
Stephen Few also picked up the fact Tableau doesn't allow you to put the median reference line in the background, which causes breaks in the sparkline. His comment is here: http://www.perceptualedge.com/blog/?p=1485