6 Replies Latest reply on Feb 20, 2014 1:09 PM by Nilay Shah

    Comparing Top 20% to Bottom 20%

    Jason Halmagy

      I am trying to segment a dynamic population of employees by a sales measure into the top 20% of performers and bottom 20%. I would then like to use this segmentation to analyze a secondary measures improvement over time.


      I assume there is some usage of the index function in the solution. But to this point i have not been able to figure it out.


      Thanks for any help in advance



      Below is a sample of what I am looking for



        • 1. Re: Comparing Top 20% to Bottom 20%
          Jason Halmagy

          Anyone have anything????

          • 2. Re: Comparing Top 20% to Bottom 20%
            Jonathan Drummey

            Hi Jason,


            I took at look at this the other day and only got part way. It's possible to create a set of table calculations that provide both the top and bottom 20%, the issue I ran into is that how the results of those table calculations are used to generate the next table calculation that gets used in the graph. Tableau will let you partition table calcs on the results of other calculations, but not other table calculations, so I was working with a really complicated nested table calc and hadn't worked out the Compute using.  I've still got some other ideas about how to make this work (for example using the top/bottom functions in SQL with a union query to create a new datasource), so I can keep plugging away at it.


            I do have a couple of questions about time periods that could make this easier or harder:


            - For calculating the top and bottom 20%, are you looking at generating that over the entire time window, or one period at a time? For example, John Smith could be in the top 20% one quarter and in the bottom 20% the next, but maybe you're just interested in what bucket John Smith falls into for the entire year.


            - Does the period being looked at for the final measure match that of the period being used for calculating who is in the top and bottom 20%?



            • 3. Re: Comparing Top 20% to Bottom 20%
              Jason Halmagy

              Thanks for taking some time with this. To answer your questions:


              1. The segmentation would ideally be based on the entire window, not the individual time buckets


              2. Yes the periods match for the two measures.


              Let me know if you need any other clarifications

              • 4. Re: Comparing Top 20% to Bottom 20%
                Jonathan Drummey

                The ultimate goal is to segment a population into a top 20% and bottom 20% based on one measure, and then plot each segment over time against a second measure. The time scale used for all calculations is the same. For this example, we're segmenting Customers in the Superstore Sales sample into top and bottom 20% of sales across all Years, and then using those two segments to plot against % of total profit and show that in a line chart per Year.


                Many thanks to Joe Mako for his help on this, I was stuck with the last bit of partitioning until he explained how to organize the calcs in a much easier (and ultimately workable) way.


                Here's the process:


                - Identify the percentile the customer falls into by sorting the customers by descending sum of sales and then using the Index() function
                - Segment the customers into top or bottom 20% using an IF statement
                - Calculate the total profit across all customers using TOTAL(SUM([Profit]))
                - For the top 20% of customers, figure out their % of total profit, which gets complicated because we have to use an IF statement to effectively partition the data since Tableau does not support partitioning table calculations based on other table calculations
                - For the bottom 20% of customers, figure out their % of total profit, which turns out to be negative in some cases
                - Filter the results using a table calculation to only show the final aggregate results for one customer, thereby hiding other results
                - Plot the top and bottom measures
                - Hide that one customer's details


                The Crosstab worksheet in the attached workbook shows the calculations in process, the Final worksheet shows the results.


                Here are the calculations:


                0. Year
                YEAR([Order Date])

                Rather than just drag the Order Date onto the Columns shelf, we use a calculated field for the Year because we're going to be doing a bunch of table calculations and Tableau can start padding data in interesting ways when using dates. If you needed to do the calculation by quarter or month, then you could create a date string that would work.


                After you create the 0. Year calc, drag it to the Dimensions window. Then put 0. Year on the Columns shelf. Put Customer on the Rows shelf, and set the sort for Customer to Descending/Sales/Sum.


                1. Index
                Set the Compute Using to Table (Down). Since that the view is sorted based on Descending Sum of Sales, this will go from 1 to 795.


                2. Max Index
                WINDOW_MAX([1. Index])
                By default, the Compute Using will be set to Table (Down) if the 1. Index is already in the view and set to Table (Down). This will return 795 for all customers.


                NOTE: When a pill with nested table calcs is dragged into a view, Tableau tries to have the pill use the Compute Using settings of the sub-table calcs that are already in use, but sometimes it gets confused and forgets. You'll need to verify the Compute Using by hovering over the pill in the view and/or using the Edit Table Calculation dialog.


                3. Percentile
                [1. Index]/[2. Max Index]
                Now we're getting the bracket the customer will fall into. The Compute Using will be set to Table (Down).


                4. Top or Bottom
                IF [3. Percentile] <= .2 THEN 1
                ELSEIF [3. Percentile] >= .8 THEN -1 END

                Here we are segmenting the customers based on their percentile. Again, the Compute Using is set to Table (Down).


                NOTE: As of Tableau 7.0, this kind of evaluation HAS to result in a number, not a string. It's a strange behavior, the "Can't evaluate Top/Bottom Calc as String" worksheet shows how the evaluation fails when trying to have the IF statement result in a string


                5. Total Profit
                The Compute Using is set to Table (Down).


                6a & b: Top & Bottom
                WINDOW_SUM(IF [4. Top or Bottom] = 1 THEN SUM([Profit]) END, 0, IIF(FIRST()==0,LAST(),0)) / [5. Total Profit]

                WINDOW_SUM(IF [4. Top or Bottom] = -1 THEN SUM([Profit]) END, 0, IIF(FIRST()==0,LAST(),0)) / [5. Total Profit]


                Again, the Compute Using set to Table (Down).

                The denominator for this calculation is the Total Profit across all customers. The numerators for each measure are more complicated because we want to only sum up the Profit for customers in the top 20% or bottom 20%. To improve performance, we only do the whole calculation in the first row in each partition, using a technique described by Richard Leeke here: http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html. You can see this in the cross tab view, where the results for these two calculations change for each Customer.

                The above shows how each part of the calculation is worked out. However, since all the sub-calculations have their Compute Using set to Table (Down), we can combine them all into one calculation each for Top and Bottom.


                7 a & b: Top & Bottom 20% of Customers
                Top 20% of Customers:
                    IF (IF INDEX()/WINDOW_MAX(INDEX()) <= .2 THEN 1 ELSEIF INDEX()/WINDOW_MAX(INDEX()) >= .8 THEN -1 END) = 1 THEN
                        SUM([Profit]) END, 0, IIF(FIRST()==0,LAST(),0))
                / TOTAL(SUM([Profit]))

                Bottom 20% of Customers:
                    IF (IF INDEX()/WINDOW_MAX(INDEX()) <= .2 THEN 1 ELSEIF INDEX()/WINDOW_MAX(INDEX()) >= .8 THEN -1 END) = -1 THEN
                        SUM([Profit]) END, 0, IIF(FIRST()==0,LAST(),0))
                / TOTAL(SUM([Profit]))


                This will have the Compute Using set to Table (Down). Now we can work on the Display view.


                8. Is First

                This calculation tests whether the given row is the first row in the partition, and if so then returns True. The Compute Using will be set to Customer and we'll use this to filter the results.

                Creating the Display view:
                1. Duplicate the Crosstab view.
                2. Drag Is First onto the Filters shelf, and filter only for True values. Set the Compute Using for Is First to Customer. This ensures that we'll only be displaying the results for 1 Customer.
                3. Delete all the Measure Values except for 7a. Top 20% of Customers and 7b. Bottom 20% of Customers.
                4, Drag Measure Values onto the Rows shelf.
                5. Drag Measure Names onto the Color shelf.
                6. Click on the Customer pill on the Rows shelf and de-select Show Header.
                7. Format the Value axis to set the Number style to %, and Edit the axis to set the text.
                8. Edit the Tooltip to remove Customer, since the view is showing results for all customers.

                Alternative calculation to segment based on per-year performance:


                You might want to do a segmentation of the top/bottom performers based on each year's performance, instead of across all years. That requires a different calculation to identify those performers/customers, since a customer could be in the top one year, bottom the next, and middle of the pack in the third year. Or they might not have any results at all in a given year. For example, in the Superstore Sales data with Year on Columns and Customer on Rows with nothing else in the view, there are blank cells (no "Abc") that indicate there is no data. In that case, we probably want to remove those Customers from being segmented, otherwise they'd fill up the bottom performers list.


                Here's the alternative Index calculation to have a different segmentation per year, let's call it "Index per Year":

                IF NOT ISNULL(LOOKUP(SUM([Sales]),0)) THEN INDEX() END

                The function will look at the sales for each customer/year combination, and if that result isn't Null then we'll use that in the index.


                Then, the 4 instances of INDEX() in each Top/Bottom 20% of Customers calculation would be replaced by [Index per Year]. The Compute Using for Index would be set to Customer and Year, with Order Along set to Sales - Sum - Descending, and Restarting every Year. The Compute Using for the Top/Bottom 20% of Customers would be set to Customer. That way the index would be appropriately calculated within each Year and the calculations made within each year.


                The per Year Crosstab and per Year Display worksheets show these alternative calculations.


                Let me know if this works for you!



                1 of 1 people found this helpful
                • 5. Re: Comparing Top 20% to Bottom 20%
                  Jason Halmagy

                  Yes this response seems to be doing the trick. I have some other things that i am working through with the whole top 20% v bottom 20% that may create some more questions. Glad to know someone can figure it all out.

                  • 6. Re: Comparing Top 20% to Bottom 20%
                    Nilay Shah

                    this is a great workbook with lots of info and new stuff. thank you Jonathan.. question.. i could be using this for muplitple months and index function doesnt work .. it keeps 1, 2,3 ,4 ratings for time period that has been picked before..


                    for example

                    i have

                    month      Jan-2013          Feb 2013

                    Store#     1                         1    

                    Sales     10K                         5K

                    Index     1                              1 (problem is index = 1 even though this is not the best store, it changed next month)


                    how do we do that??

                    thanks in Advance


                    1 of 1 people found this helpful