3 Replies Latest reply on Oct 23, 2012 7:36 AM by .George Prevelige

    The age old Percentile question

    .George Prevelige

      There are more than a few threads that address the concept of percentiles, deciles, quartiles, etc. But they answer specific, complex questions posed by developers trying to solve their particular problem. I'm having a difficult time translating these semi-esoteric solutions to my situation.


      So, I'm posting an example problem that I hope is generic and straightforward enough so that - when/if someone posts a solution to it - we will have a basic template I and others can refer to when trying to produce these results.


      The data is basically a list of scores. I want Tableau to give me a count of records in each percentile - e.g., # records in Top 5%; # of records in Bottom 10%.I did add two categories to the mix that need to be totaled separately. I included a mockup results view as a reference.


      Much thanks in advance to anyone kind enough to post the solution. I'm sure it will be very valuable to a lot of users.


        • 1. Re: The age old Percentile question
          Jonathan Drummey

          Hi George,


          Since Tableau doesn't have a built-in PERCENTILE function, we have to use workarounds. Each workaround has different tradeoffs which may be unacceptable depending on the problem at hand.


          For example, a simple blocking into four quantiles for a heat map/highlight table can be done using and INDEX() function with a stepped color, like this:




          Or you can view a spread with circle, shape, or square marks and reference lines for quartiles or other 'tiles:




          However, what you are looking to do is generate a percentile result for each row, which currently requires a table calculation. Table calculations are wonderfully powerful, and have some limitations in terms of what kinds of aggregations are easy to set up and which ones take more effort. Partitioning results by a table calculation like a percentile calc to get the total records in each of the top/bottom 5 percentiles falls into the more effort category, but it's do-able. A prerequisite is being familiar with table calculations, particularly setting Advanced and nested Compute Using's.


          Richard Leeke has done a huge amount of work on quantiles, I took a formula for calculating percentiles from this post: http://community.tableau.com/message/189200#189200. Here's the formula:


          // The quantile rank gives the proportion of values the same as or less than the value in question
          // Note that the term subtracting half the number of records for the current value is effectively
          // returning the average rank in the case of multiple occurences of the same value.
          // See http://http://en.wikipedia.org/wiki/Percentile_rank.
          // Calc taken from http://community.tableau.com/message/189200#189200
          IF (ATTR([Score]) = LOOKUP(ATTR([Score]), -1)) THEN     PREVIOUS_VALUE(0) ELSE     INT(100 * (RUNNING_SUM(SUM([Number of Records])) - ((SUM([Number of Records]) - 1)/ 2)) / TOTAL(SUM([Number of Records]))) END


          The calculation needs an Advanced Compute Using set to the ID, sorted on SUM([Score]) Ascending. This gives you a result like this:




          Then from that, we need to create set of measures that will return the number of records falling into each quantile range. Since we can't partition by table calcs, we need to generate our own total. Here's the Top 5th %ile Feeder calc:


          IF [Percentile] >= 96 THEN


          This calculation will have it's Compute Using set to ID, while the nested Compute Using for Percentile remains the same as above. This calc essentially does a running sum to find the number of records (ID rows):




          Now, we only want the total of the top 5 percentiles, so the next step is to get the largest values of the feeder calc. And, since we want this ultimately to display in a text table, a surrounding calc is needed to prevent overlapping text. Here's the formula for the Top 5th %ile:


          IF FIRST()==0 THEN WINDOW_MAX( [Top 5th %ile Feeder] ) END


          This calc only performs the evaluation for the first row in the partition. The Compute Using for this calc is also set to ID, with the nested Compute Usings for Top 5th %ile Feeder and Percentile remaining as above:




          The Top 5th calcs can be duplicated and edited to create the additional calcs, so the workout worksheet looks like this:




          Then, duplicate the worksheet, move ID from the Rows Shelf down to the level of detail, swap Type & Measure Names on Columns & Rows, and you wind up with this:




          Hopefully that meets your needs! The workbook is attached.




          Note that as far as I can tell, these totals for Old are correct, I'm not sure what you were using in your Hoped For Result worksheet.

          • 2. Re: The age old Percentile question
            Jonathan Drummey

            Just to add one more thing, I dove into this problem without really thinking about the desired result, which is a count of the number of results (rows) in each percentile bracket. Creating some measures like for both the Top and Bottom 5th Percentile = SUM([Number of Records]/20, Top 10th Percentile = SUM([Number of Records])/10, will return results that are mostly accurate, with some rounding issues.


            Thanks to Richard Leeke for pointing this out to me!

            • 3. Re: The age old Percentile question
              .George Prevelige

              Holy smokes, thanks so much!

              I was hoping for a hint or perhaps a table calc snippet, but you provided a complete and elegant solution.

              I really own one.


              Thanks again,