To get and answer to your question, I suggest you to share a sample data set to replicate your results and find the reason of the difference you have found..
I have attached the sample data (twbx & excel) which shows the results I'm getting.
Thanks for you help!
Jim Wahl has written about Percentiles lately, and I wonder if he might have a clue as to why Tableau is computing this differently. I tried several approaches, but could never get the values to match your Excel file. I assume Excel is computing the Percentile using a different method, but I'm not exactly sure.
Hope to see a solution from someone soon!
1 of 1 people found this helpful
Look at the difference between Excel's PERCENTILE.EXC and PERCENTILE.INC functions
I believe PERCENTILE is the Inclusive type (INC).
In essence, where there are 'ties' and the percentile breakpoints do not perfectly split the data, percentile calculations have a degree of ambiguity that depends on how you count and use the 'last' and 'next' value around the percentile split.
FYI your table in the posting and the EXCEL file do not agree exactly. See Group C
Thanks - I'm hoping someone has been able to replicate the PERCENTILE.INC function in Tableau
Is there a trick to using Percentile.EXC in Excel? I'm unable to use it--did it show up in 2010?
Microsoft documentation says EXCEL 2010. I believe a large number of extra .xxx functions were added in 2010 to fix problems with various EXCEL statistics calculations and handle ambiguities like this one (Google: "Friends don't let friends do statistics in EXCEL").
But EXCEL's PERCENTILE calculation might be wrong (or at least unconventional)!
Not sure if Tableau is right, but I'd trust it more. There are numerous web postings about EXCEL's problems in computing statistics. I suggest you work out by hand one of your cases where EXCEL & Tableau differ and decide which one you want. The differences are typically from a different denominator ( /n or /(n-1) ) or an interpolation formula)
I was playing with your data in Excel, Tableau and Alteryx. I was able to get the same results in Alteryx using the Percentile function (it seems is the same implementation than Excel).
Definitively Tableau uses a method to compute Percentile that is efferent to the method used by Excel.
In Alteryx, we are able to select at least two methods to compute Percentiles: Percentile and PctNo0 providing different results.
Percentaile page in Wikipedia presents information about methods to compute Percentiles, including the implementation in Excel.
I've not fund the method used by Tableau so far.
This falls into the category of one of those "Things that I learned that I've never had time to fully detail and write up." There are 4 areas where Tableau computes percentiles (PERCENTILE, WINDOW_PERCENTILE, RANK_PERCENTILE, and reference lines/bands/box plots). I've got an email somewhere with details on the algorithms, Tableau actually uses a couple of different algorithms in its functions so results of different functions aren't necessarily convertible, and don't necessarily match what we'd see coming out of Excel. In large samples (like 1K values or more) the results of the different functions will converge, but in smaller sample sizes they can be wildly different as you found.
WINDOW_PERCENTILE will get you the Excel style percentile if you disable aggregate measures. This last part is critical and probably not something most people realize they need to do with WINDOW_ functions based on measures.
Here's the formula:
Days P95 TC =
IF FIRST() == 0 THEN WINDOW_PERCENTILE(ATTR([Days]), .95) END
ATTR() is required because table calcs, such as WINDOW_XXX, require aggregates. ATTR() is an aggregation where if MIN(x) == MAX(x) you get x, else you get "*". IF FIRST() == 0 is used to return the value for just the first row of of the Days partition (the calc is the same for all rows and in a table you just want one value).
If you just put this calc in the view as I did below, you'll get NULL values because this view is at the category level of detail and there are multiple Days values for each category and, therefore, ATTR([Days]) is "*".
My next thought was to drag Days to the Detail button. Since days is a measure in your workbook, Tableau will add this as SUM(Days). This, of course, is not helpful, so you think you need to convert this a dimension. This gets you deceptively close (after adding, click the Days P95 TC field and change Compute Using to Days and then from the top menu bar select Analysis > Stack Marks > Off):
But it's not quite right. P95 for category B should be 177.9 and above it's calculated as 179.8. The problem is that by default Tableau aggregates measure values. Category B consists of 10 values: 30, 53, 55, 93, 114, 114, 145, 152, 157, 195. Since two of these values are the same, when you convert Days to a dimension, Tableau returns 9 values not 10.
You can see this by dragging Days on top of Days P95 TC (again, from the top menu select Analysis > Stack Marks > On).
To get the complete set of Days values, you need to deselect Aggregate Measures from the top Analysis menu:
Now you can drag Days P95 TC back to the view to get the correct WINDOWS_PERCENTILE() values that also mach the Excel algorithm for percentile:
TLDR --- Tableau's Percentile vs Windows_Percentile ---
For comparison here are your original values using the built-in percentile aggregation:
As others have said, there is no standard for percentile -- R's percentile function has 9 methods (3 for discrete values, 6 for continuous), but I bet you could find another half dozen methods in the various R packages --- even summary() is slightly different.
The Excel method (R Type 7) for your example, where you want the 95th percentile with 10 values is
the h value in the data for p95 = (10-1)*.95 + 1 = 9.55
Since h is a decimal, you use linear interpolation: take the 9th value + .55*(10th - 9th) = 157 + .55*(195-157) = 177.9.
The Tableau formula (appears) to use proportional interpolation: 157 + .95*(195-157) = 193.1. (This is not one of the 9 R types, and doesn't really make sense to me, but it's hard to say it's wrong when there is no standard.)
Tableau Reference Lines Percentile
Tableau uses a third method for reference lines, which is equivalent to R type 2 for discontinuous values. Where the h value is Np + 1/2. In your example this is 10*.95 + 1/2 = 10, so the 95th percentile is also the max value of 195.
For group A, which has 26 values, h is 26*.95 + 1/2 = 25.2. Using this formula if the decimal is < .5 you round down. If it's > .5 you use the next value. If == 0.5, then you take the midpoint. For group A, you want the 25th value, which is 228.
One nice thing about this algorithm is that the percentile is usually a measure value (unless it's at the midpoint).
PS --- I used Tableau 8.2 for your workbook. If you're on 8.1, you'll get an error, but you should be able to work through the above manually. Let me know.
P95 Analysis_jimw.twbx 51.1 KB
Thanks for this very informative post about Percentile Calculation in Tableau.
Awesome! I got to the "deceptively close" result, but didn't realize aggregation was the problem. Thanks for the thorough explanation, as always, Jim!
Incredible work, Jim!
Any thoughts or direction regarding how I could display three calculated (rounded) percentiles on the same screen as all values?
This would be similar, visually, to your third example, with the following differences:
- There would be at 3 percentile circles/bars per row (or in my case, column)
- I'll be calculating these so I can round them up or down, rather than relying on Tableau's built-in reference line feature
I can do this using two graphs (one aggregated to show percentiles, the other non-aggregated to show all values); I just can't "overlay" them.
Below is a snapshot from a dashboard. The upper graph shows the aggregated view of results along with points marking the 5%, 50%, and 95% calculated-then-rounded percentiles.
The lower Gantt-chart showcases the actual values in a non-aggregated view (the bands you see are created using the reference lines).
Thoughts? Kindly let me know if you would like me to present this as an outright question, or continue my digging