8 Replies Latest reply on Nov 6, 2017 12:42 PM by DK Elly

# Filtering out Top 5% records and Bottom 5% records

Hi all,

I am trying to do some statistical analysis here, and I want to remove the outliers. How to I remove records with the top 5% value and bottom 5% value?

Thanks,

Alan

• ###### 1. Re: Filtering out Top 5% records and Bottom 5% records

One way to do it is to use table calculations. See attached workbook for a solution - top and bottom 5% of products are filtered out based on their sales values. Note the partitioning of table calculations, it is important.

The approach is as follows:

1. Calculate sales order (rank in product list sorted by sales) using INDEX()
2. Calculate total number of products in the list using TOTAL(COUNTD([Product Name]))
3. Create a calculation to establish if sales order is lower than 5% or higher than 95% of product count
4. Use this calculation to filter the sheet

If you wanted to remove products with sales are within 5% of minimum or maximum sale value, similar approach can be used.

• ###### 2. Re: Filtering out Top 5% records and Bottom 5% records

Wow thats awesome Dimitri!

What if I want top and bottom 5% of the underlying data?

In your example, say for the product "Alliance Rubber Bands", if there were multiple sales (look below)

Product                                   Customer               Sales

Alliance Rubber Bands               a                           300

Alliance Rubber Bands               b                           1000

Alliance Rubber Bands               b                            400

Alliance Rubber Bands               e                              1

In this case, 1000 and 1 are the outliers, how do I filter them out in the calculations, so that I have my total without outliers added.

Looking for Output:

Product                                    Sales

Alliance Rubber Bands               700

Not sure if I explained it clearly, willing to provide more explaination. Thanks.

• ###### 3. Re: Filtering out Top 5% records and Bottom 5% records

Hi Alan,

I am on Easter break (away from Tableau) until Wednesday, I will look into it when I get back into office.

• ###### 4. Re: Filtering out Top 5% records and Bottom 5% records

This is definitely possible, but depends on your planned sheet layout. The approach would be to add a dimension that identifies each transactions, in this case Order ID. The partitioning will need to be adjusted so that sales amount for each order (as opposed to sum of sales from all orders for the product) is ranked and compared to the rest. Let me know if you want an example with the sample data.

• ###### 5. Re: Filtering out Top 5% records and Bottom 5% records

Can a sample be provided for the latest example Dimitri mentioned?

• ###### 6. Re: Filtering out Top 5% records and Bottom 5% records

Here it is, see the A&O sheet, which uses fake Apples & Oranges data set.

Similar concept, but sales for each transaction (row) are evaluated individually, as opposed to sum of sales for a bunch of rows in the other sheets.

• ###### 7. Re: Filtering out Top 5% records and Bottom 5% records

To add to this, as we did have a customer request this, here is how to remove the outlier data and get a new average of sales:

• ###### 8. Re: Filtering out Top 5% records and Bottom 5% records

Version: 10.3.0

Packaged workbook attached

I was trying to recreate the Outlier calcs using equipment builds in number of days VS sales/products used in the previous examples.  For some reason I cannot get the evaluation to work based on the OS dimension and could use some helping hands and eyes on what I must be missing.  Thanks