6 Replies Latest reply on Oct 4, 2016 11:59 AM by Jonathan Drummey

# Getting Incorrect Percentiles Calculation

HI again,

I am working on this data set and trying to determine the  average discount  per product category which fall between 5th and 95th percentile range for a particular quarter. I am able to get the calculation in tableau but it seems to be wrong when I compare it to the excel calculations. Am I doing something wrong with the calculated field ? Is there any easier way of doing this?. Attached workbook.

Regards,

Sudheesh Rao

• ###### 1. Re: Getting Incorrect Percentiles Calculation

Hi Sudheesh,

Put context filter on your [Quarter of Order Created Date] dimension and check. Let us know If you've any query. If you still not get the correct result then share your expected output.

Mahfooj

• ###### 2. Re: Getting Incorrect Percentiles Calculation

Thanks for your response. I have tweaked Discount (Percentile) formula to just get 0 to 95th percentile for the ease of explanation.

Assuming that Tableau percentile uses Percentile Inclusive function, I have attached the excel sheet which has the expected results with calculations for product 'Alienware'. The percentile marker should be at 64.79% which is matching with my tableau worksheet.The average of Discount however is 1.17% higher than my calculation.

Regards,

Sudheesh

• ###### 3. Re: Getting Incorrect Percentiles Calculation

There are at least three issues, two of which are affecting your results:

1) The Discount (Percentile) formula in the attached workbook and the formula you have written in your most recent post both exclude values below the 5th percentile, however the Excel formula does *not* exclude them. That biases the Tableau average to be higher.

2) The Excel calculation is doing =AVERAGE(D2:D965,K9) so it is including an extra value (K9) that is not in the Tableau calculation which only counts the records in the data. So the Tableau calculation is effectively doing =AVERAGE(D2:D965) and that returns 48.0215, not the 48.0389.

3) The LOD expression is quite a bit more complicated than it needs to be. All we need is:

IF [Discount Percent] < {INCLUDE : PERCENTILE([Discount Percent], .95)}

// commented out to match Excel calc

// AND [Discount Percent] > {INCLUDE : PERCENTILE([Discount Percent], .05)}

THEN

[Discount Percent]

END

There are two not-so-commonly used features of this calculation.  a) Although we *must* aggregate the results of an INCLUDE or EXCLUDE function in the view, we can use them inside calculations at the record level or aggregation as we desire. b) Using INCLUDE : with no dimension declaration tells Tableau to aggregate the given expression at the vizLOD and return the result as a record-level value. A useful byproduct of using INCLUDE is that we don't need to use context filters to affect those LOD results, when the calc is written using FIXED then we absolutely need to either ensure that the FIXED is appropriately aggregating along the dimensions we expect it to (and not more) and/or or use context filters as Mahfooj had described. A challenge here is that this calculation only works when the vizLOD has the desired level, for example it won't return the same results if Order ID is a dimension in the view because then the view is at too fine a grain.

In any case this is then aggregated with the AVG() aggregation in the view at the level of detail of Product:

The result for Alienware is 48.0215. If I use =AVERAGE(D2:D965) in Excel then I also get a result of 48.0215:

I'm not sure why the Excel calculation is adding an extra value to the average? It seems to me that since the problem definition is to get the average of the data that fall between the 5th and 95th percentile that adding an extra value is inaccurately biasing the average. The Tableau calculation is only computing on the data.

Jonathan

3 of 3 people found this helpful
• ###### 4. Re: Getting Incorrect Percentiles Calculation

Hi Jonathan,

Thank you for your reply. Extra value K9 was indeed one of the issues and your explanation works like a dream in the present format:) Thank you so much.

However, The reason for the use of below formula was because I wish to plot the data falling between 0 to 95th percentile on Box and whisker plot. With your suggested formula I will not be able to control the data.

Now I realized that the dimension I was using was wrong may be in the above formula. Changing the formula to aggregate on the order Id should work but the average discount percent is coming to 48.29 and not 48.02.

{

FIXED [Order ID]: AVG(

IF {FIXED [Order ID] : AVG([Discount Percent])} <

{FIXED : PERCENTILE({FIXED [Order ID] : AVG([Discount Percent])}, 0.95)}

AND {FIXED [Order ID] : AVG([Discount Percent])} >

{FIXED : PERCENTILE({FIXED [Order ID] : AVG([Discount Percent])}, 0.00)} THEN

[Discount Percent]

END)

}

The intent is to control the number of data points i.e., data points falling between 0 to 95th percentile discount range and then do the average of discount. And I want to achieve this with Box and Whisker.

Regards,

Sudheesh Rao

• ###### 5. Re: Getting Incorrect Percentiles Calculation

I got this one working correctly. Instead of embedding the criteria in the formula I created a filter with following calculation.

{FIXED [Order_NUMBER] : AVG([DISCOUNT_PERCENT])} <

{FIXED : PERCENTILE({FIXED [Order_NUMBER] : AVG([DISCOUNT_PERCENT])}, 0.95)}

AND {FIXED [Order_NUMBER] : AVG([DISCOUNT_PERCENT])} >

{FIXED : PERCENTILE({FIXED [Order_NUMBER] : AVG([DISCOUNT_PERCENT])}, 0.05)}

This filters out the data that doesn't need to be part of the computation.