7 Replies Latest reply on Jan 8, 2017 1:31 PM by Christina Gremore

# Return records within the 0.25 percentile?

Hi Everyone,

I've been creating box and whisker plots and want to filter out the lower 0.25 percentile of the data.

How do I use the function PERCENTILE([Sales], 0.25) for example to return all employees whose sales were in the 0.25 percentile?

Thanks!

Ben

• ###### 1. Re: Return records within the 0.25 percentile?

Hi Benjamin,

Depending on how your data is structured, you'd probably want one of the following:

SUM(Sales) >= SUM(PERCENTILE(Sales,.25))

or

[Sales] >= PERCENTILE([Sales],.25)

I think the pre-aggregated version (the first one) is more likely to give you an accurate outcome, but I'm not 100% certain. Maybe you'll need to do some stuff with an LOD to "fix" it at the 'employee' dimension? Without a packaged workbook and mockup of your data, tough to say for sure.

Then you can put this field on the filter shelf and keep True (to exclude those who fall within the 1st quartile).

-Christina

• ###### 2. Re: Return records within the 0.25 percentile?

Hi Christina,

I'm with you, I'm actually interested in creating a boolean returns true if the record is in the 0.25 percentile.

1. Im confused what Percentile([Sales], 0.25) will actually return

2. not sure why SUM(Sales) >= or [Sales] >= is a meaningful boolean.

This is what my data looks like

Thanks again!

• ###### 3. Re: Return records within the 0.25 percentile?

Hm, ok. So to be clear - the structure of your data is that there is only one row per employee, that shows a summation of their sales?

• ###### 4. Re: Return records within the 0.25 percentile?

yes

• ###### 5. Re: Return records within the 0.25 percentile?

OK, in this case, you would want the following calculation:

[Sales] <=  { PERCENTILE(Sales,.25) }

Definitely pay attention to the curly brackets there! What they are doing is fixing that percentile equation across the entire data set. If you're still not sure what it's doing, I recommend creating a separate calc with just { PERCENTILE(Sales,.25) }  in it and seeing what it outputs.

In the sample data set you provided, the 25th percentile of sales is 150. Let's look at how this calc evaluates each row of data.

Employee A: Their sales are 200. Is 200 less than or equal to 150? No, it's not. So this is FALSE.

Employee B: Their sales are 300. Is 300 less than or equal to 150? No, it's not. So this is FALSE.

Employee C: Their sales are 100. Is 100 less than or equal to 150? Yes, it is. So this is TRUE. Employee C is in the bottom quartile.

That's what makes the calculation above a meaningful boolean - because it evaluates each row of data (to wit, one employee) against the 25th percentile for the whole domain of values.

Whether you want this statement to say 'true' if they're in the bottom quartile or 'false' is just a matter of which way you point the "less than" sign. When you flip it from > to < you'll get a complementary set of results, so it's not necessarily important here.

• ###### 6. Re: Return records within the 0.25 percentile?

Thanks so much for that entire explanation!

Let me see if I got this right,

Percentile([Sales], 0.25)  will return the bottom 0.25 percentile of whatever the view is aggregated to?

{ PERCENTILE(Sales, 0.25) } is an LOD calc? that isn't fixed to a specific dimension and thus fixed to the level of the entire data set?

Example

{ FIXED [Region] : SUM([Sales]) }  -- sum of sales per region irrespective of the view?

{SUM([Sales])} - sum of sale per entire data set irrespective of the view?

THANKS!

Ben

• ###### 7. Re: Return records within the 0.25 percentile?

Yup, you got it!