6 Replies Latest reply on Feb 7, 2019 8:36 AM by Yul Beauchamps

    Filter by Rank or Percentile

    Yul Beauchamps

      Hi folks,


      I'm trying to get rid of some outliers in my data.  I'm using live data for this so by default I can't use quick table calculations for percentile (or median).  Since my attached workbook contains an extract I added the 75th percentile for display purposes only.  That being said it's not quite what I'm after.  With percentile in tells me that 75% of records have a time delay of 5 minutes or less.  What I really want to do is get the average time delay after I have already trimmed out those really high outliers.  I thought I could create a percentile rank calculated field and then filter out all the values above a certain percentile.  However, Tableau seems to require you to aggregate the measure which doesn't seem to make sense to me.  Maybe I'm better off doing this through SQL as a VIEW/Proc...


      Any Ideas? 




        • 1. Re: Filter by Rank or Percentile
          Joe Oppelt

          I don't understand.


          I have this workbook open.  What are you trying to filter out?  Given what's on this sheet, what did you want to do and what do you want to see as a result?

          • 2. Re: Filter by Rank or Percentile
            Yul Beauchamps

            If you to browse all values from the data source you would see extremely high values for 'Time Delay'. These are the outliers I'm talking about.  I want to trim these out of my calculations.  Sure I could filter by an arbitrary number but I'd rather do it by a percentile.  i.e. keep only values within 90th percentile.  You'll have to excuse me as I'm still learning Tableau.  If I wanted to achieve this in SQL I'd do something like wrap "CUME_DIST() OVER (PARTITION BY MyGroup ORDER BY TimeDelay)" in a subquery and then use a where filter on anything below a certain percentage.


            I hope this makes more sense.

            • 3. Re: Filter by Rank or Percentile
              Joe Oppelt

              I think I understand.


              In the attached I made a FIXED LOD that gets set on every row, calculated at the level of each Group.  (I added it to the sheet after the [My Group] value.  Notice that it matches the value the sheet calculates as a measure.)


              To create that and to get the proper syntax in the calc, I just dragged the PCT75 pill from the measures shelf into my calc editor.  Tableau expanded the syntax for me.


              Now I have the PCT75 value on each row in the data.  Next I made [Filtered Time Delay].  Take a look at that.  it's a row-level calc, so no aggregations are needed.  For every row, propagate the [Time Delay] value into the new calc if it's <= the PCT75 value.  (Otherwise NULL by default.)


              I added SUM(Time Delay) and SUM(my new calc) to the sheet to show that rows get "filtered" out this way.  (They're still in there, of course, but they don't get included in the SUM because they're null.  But if you needed to, you could use the new measure to filter for non-null values.  And if you did that on this sheet, you would see both those last two measure would be equal because the null rows would be filtered out.)

              • 4. Re: Filter by Rank or Percentile
                Yul Beauchamps

                This looks like exactly what I need but there is one small problem.  When I tried to recreate this in my actual workbook I was given an error that stated 'PERCENTILE' is an unknown function.  After looking it up it appears it's because I'm dealing with Live Data and not an Extract.  This is so frustrating...I'll either have to make those calculations via SQL but that will make the report much less flexible or contend with extracting the massive table to Tableau Server.

                • 5. Re: Filter by Rank or Percentile
                  Joe Oppelt

                  I googled "Tableau percentile live connection".

                  There are threads about this.  Here's a good one:


                  Calculate percentile in an LOD without an extract


                  You're not dead in the water yet.  Don't give up!

                  • 6. Re: Filter by Rank or Percentile
                    Yul Beauchamps

                    I did do some extensive searching prior to even posting but admittedly I may not have been using the best search terms.  To get some traction I have decided to go ahead and use an extract but limit it to only a couple of years to minimize the load times.  That being said I will likely need to come back to a live data example for a different dashboard so I will definitely keep at it.  I appreciate the assistance.