9 Replies Latest reply on May 10, 2013 5:40 AM by Jonathan Drummey

    How do I set up "Top N" + "Other" processing that's based on a parameter?

    Mark Isaacs

      I want to set up a view that lists the top N diagnoses, along with a single row called "Other" for the remaining rows.  The value for N is specified via a parameter.  I've tried to set up a calculated field (called "Top N Diagnosis") based on the Diagnosis field, where the value is equal to Diagnosis if the rank is <=N and is equal to "Other" otherwise.  However, this doesn't work.   If I place both the "Diagnosis" and "Top N Diagnosis" fields on the Rows shelf I can see that the "Top N Diagnosis" is being calculated properly, but if I then remove the "Diagnosis" field from the Row shelf things fall apart.

       

      Can anyone clarify for me the best way to set this up?  Also, a few related questions:

       

      Can multiple calculated fields be displayed for each diagnosis - eg, total cases, total cases % of all cases, cumulative total cases, cumulative total cases % of all cases?

       

      Can this be changed from a "Top N" based on rank order to something based on a cumulative percentage cutoff (ie, a "Top N%")?  For example, list all of the diagnoses, sorted in descending cases volume sequence, stopping when the cumulative % reaches some value (eg, "80%", as specified with a parameter) and then including one additional row for "Other" for the remaining diagnoses?

       

      Also, rather than process a single Department at a time as I have it set up in the attached workbook, is it possible to add the Department to the Rows shelf and have the "Top N" or "Top N%" processing work as expected for each department?

       

      Any and all suggestions would be greatly appreciated!

        • 1. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
          Mark Holtz

          Mark,

           

          You were on the right track, but it starts to get hazy condensing the rows to "other" and getting them to show up how you want. Richard Leeke created a solution for a basic version of your question before (http://community.tableau.com/thread/109093). I basically started with what he made and took it a little further to get you your Running Sum and Running Pct.

           

          The only thing I didn't get was to actually filter by the Top %, but I don't think it's far off...

           

          I annotated the attached workbook. Hopefully that helps you in understanding how everything was done.

           

          Cheers

          • 2. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
            Mark Isaacs

            Mark,

             

            Many thanks for this.  I'm working through the various calculated fields you set up to see how they all fit together.

             

            I do have two questions so far:

             

            1. Where is the calculated field "Show Top N or 1st Row of Others" used?

            2. The calculated field "Show Filter" is used as a filter, with only the data rows where "Show Filter" is True being selected.   "Show Filter" is defined as "[In Top N] or [Others]", and "In Top 10" is defined as "[Rank] <= [Top N]" while "Others" is defined as "[Rank]=[Top N] + 1".  So, for example, if the parameter "Top N' is set to 20, it looks like "Show Filter" will be True if [Rank] is <= 21.  That seems to suggest that rows where the [Rank] is > 21 would then be filtered out, yet that's obviously not happening as they are all being included and consolidated into 'Other', within each Department.  I'm therefore confused by this calculated field "Show Filter", and its use as a filter.

             

            Thanks again!

            • 3. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
              Mark Isaacs

              Mark,

               

              It looks like there's a problem with the values appearing on the 'Other' rows.  If you  change the value for the Top N parameter the change in the value for the total cases on the 'Other' row doesn't equal the sum of the cases for the rows that are either newly-revealed or newly-hidden.  (It's easiest to see this if you edit the parameter and change the step size to just 1.)

               

              Also, I can't seem to format the displayed numeric columns to make them right-aligned.  Is there perhaps some trick needed to do that?

               

              Thanks again

              • 4. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
                Jim Wahl

                Mark and Mark,

                It looks like there's a problem with the values appearing on the 'Other' rows.  If you  change the value for the Top N parameter the change in the value for the total cases on the 'Other' row doesn't equal the sum of the cases for the rows that are either newly-revealed or newly-hidden.

                I'm not sure why, but I fixed this by replacing [Rank] with INDEX() in [Cases Sum] --- and after editing the formula, selected Compute Using > Advanced (Addressing: Department, Diagnosis), Sort (Field: Cases / Sum / Descending).

                 

                Since [Rank] = INDEX(), I don't understand what's happening when [Rank] is used (with the same compute using options).

                Also, I can't seem to format the displayed numeric columns to make them right-aligned.  Is there perhaps some trick needed to do that?

                Just right-click on the column header > Format. On the left-panel, click on the align icon > Sheet > Pane > Right (or Automatic, which will right-justify numbers).

                 

                A few other comments

                1. I'm not sure [Rank] is necessary the table calcs. I find INDEX() is somewhat more intuitive and avoids the problem above.
                2. I still use [Rank] to enforce the sort order. If you just rely on the [Diagnosis] sort order (sort by SUM(Cases)), then the order is determined by the global count and not by [Department].
                3. I'd Remove [Top N?]. Maybe this was just here for testing / demonstrating the method. I do like the "xxx others", but this logic could be moved into [Top N Diagnosis].
                4. I like how you just get the totals when N == 0. I might change the wording on [Top N Diagnosis] so that it says "all xxx diagnoses" when N == 0.
                5. On my big monitor, the view looks funny unless I change from "Fit Width" to "Normal" in the top menu bar.
                6. I'd also uncheck "Show Field Labels for Rows".
                7. The [Show Filter] is used to hide the null "others". You could also incorporate this logic into the [Sort by] field.
                8. Finally, if you want grand total to work, move [Diagnosis] from Detail to Rows and uncheck Show Header.

                 

                I'm sure there are other ways to do all of the above. And Mark Holtz's example might be easily fixed (I'd love to know why the sums aren't working), but I find these Top N examples a good review for me of table calcs, and I thought I'd post my steps.

                 

                Jim

                • 5. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
                  Mark Isaacs

                  Hi Jim,

                   

                  Many thanks for all of these comments.  I'll work through them and hopefully all will become clear.

                   

                  Re the formatting:  I had set up a new workbook for a clean start and tried following the steps described by Mark.  I ended up with a worksheet that was getting close to what I wanted (see the attached image), but when I tried what you suggested for the formatting nothing happened.  For some reason the numeric values are  stuck on left-justify.  Also, the first row has an ellipsis displayed to the right of each of the three columns' values.  I'm puzzled by this, but as I was able to format the columns properly in your workbook I'll start over using INDEX() instead of [Rank] and perhaps this will make the formatting problem 'go away' as well.

                   

                  Mark

                  • 6. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
                    Jim Wahl

                    For some reason the numeric values are  stuck on left-justify.  Also, the first row has an ellipsis displayed to the right of each of the three columns' values. 

                     

                    Both the ellipsis and justification issue might be caused by Tableau 8's new "stack marks" feature. From the top menu bar, select Analysis > Stack Marks > Off.

                     

                    If you used Tableau 7, you'll probably remember occasionally entering a table calc that resulted in overlapping text. In Tableau 8, these "marks" are now "stacked" instead of overlapped.

                     

                    In my version of your worksheet, the grand total calculation results in multiple marks---even if you only see one value, there are many null values that create white space and the ellipses.

                     

                    You can also prevent the multiple values by moving Diagnosis from the details shelf to the rows shelf, just before Top N Diagnosis. Then uncheck Show Headers on the Diagnosis pill. This will hide the values, but the grand total calculation will change as a result and should be 1) accurate and 2) not cause the multiple marks problem. I don't often use grand totals, and why this is happening is still a bit of a mystery, but I expect Jonathan Drummey's talk Thursday will shed some light on it. See TDT:  Grand Totals & Subtotals with Jo... | Tableau Support Community

                     

                    Jim

                    • 7. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
                      Jonathan Drummey

                      Hi all,

                       

                      After Jim's ping I got curious. I also work in a hospital so this data is right up my alley. I think I have explanations for everything here:

                       

                      - There are two uses for the Rank field. One is that if you put the field in the view and set the Compute Using, then every other calc dragged into the view that uses Rank will use those settings by default, which is helpful for a complicated view like this. The other is that theoretically it'll just be computed once by Tableau then re-used, rather than re-computed a bunch of times across the calcs.

                       

                      - In MarkH's original view, setting At the Level to Diagnosis has no effect on the view. When using an Advanced Compute Using, setting At the Level to "Deepest" or the lowest value used for Addressing aka the right-hand window (Diagnosis in this case) are the same thing. It's just a UI "feature" to deal with.

                       

                      - The problem with the Cases Sum calc not working for Other was that the sort for the nested Cases Sum calc was set to Automatic instead of Cases/Sum/Descending. Setting that works. (I duplicated the worksheet in my own view w/my own calcs until I found that one, it's a really good pointer to check and double-check *all* of your compute using settings).

                       

                      - The ellipses showing up for the first marks and the left-alignment is because either a) you're not filtering out Null values for the Other row, b) you have Grand Totals turned on, or both. The ellipses are to inform you that somewhere in the view Tableau is having to stack marks, the left alignment is Tableau's behavior when it is stacking marks. In Tableau v8, stacking marks is on by default for all mark types when there are no continuous (green) pills in the view, whereas in v7 it was only on for a few mark types (Bar and Area are the ones that come to mind, maybe Gantt). Because the Diagnosis dimension is on the Marks Card, there is a greater level of detail in the view. In the regular (I call them Detail) rows, the measures are each returning a value for every combination of Department and Diagnosis, even if that value is Null. You can see this in the "jtd version", there are 288 Marks drawn, if you drag the Cases for Display pill off the Filters Shelf there are now 3072 marks and the view shifts. Note that in the "jtd version" Grand Totals are off.

                       

                      If you have Grand Totals turned on, then those table calcs are returning lots of Null values into the totals and causing the ellipses and mark stacking behavior. A general solution to this is use Analysis->Stack Marks->Off. Unlike the table calc filter I used in the "jtd version", we can't use table calc filters to filter grand totals because the grand totals have a separate computation path. However, turning mark stacking off won't work in this case because the measure calculations are designed to return a minimum of two separate values (top n and other) per Department, so even if you did turn mark stacking off it would result in overprinting in the Grand Total row.

                       

                      - For this kind of view, using Tableau's Grand Totals for validation can be painful. I set up a GT Validation view that uses a context filter to filter on just General Medicine, then a Set Filter for the Top N diagnoses by SUM(Cases). The Context filter is applied first and ensures that the Top N computation only applies against General Medicine. Then we can use the Sum of Cases and the In/Out of the Set to have total to validate against.

                       

                      Jonathan

                       

                      [Edited to add note on why Grand Totals won't work for this view].

                      • 8. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
                        Jim Wahl

                        Jonathan,

                         

                        Thanks for the detailed explanations. Very helpful and appreciated.

                         

                        I have a follow-up question on grand totals, which I understand better after your presentation yesterday.

                         

                        Starting from your (Top-N jtd edit.twbx.zip) workbook: If I enable grand totals on the "jtd version" worksheet, I see the stacked marks, which result from grand total being calculated for each level of detail, in this case Diagnosis. This wasn't entirely clear until I clicked on the grand total row and saw that the number of marks = count of Diagnoses * number of columns.

                         

                        If I just move Diagnosis from the detail shelf to the Rows shelf, grand total is no longer being calculated at the Diagnosis level of detail and everything appears to work. Diagnosis is still in the view, and I can hide the column by unchecking "Show header" on the pill.

                         

                        Am I missing something? Is there a downside to having Diagnosis as a Row pill instead of a Detail pill?

                         

                        Jim

                        • 9. Re: How do I set up "Top N" + "Other" processing that's based on a parameter?
                          Jonathan Drummey

                          You're welcome!

                           

                          I'd seen your comment about being able to remove the extra values in the GT and had left that because running down issues across all the calcs had taken so much time, thanks for bringing it up. You're right, when Diagnosis is on Rows it's not part of the level of detail for the Cases calculations so they don't partition on Diagnosis and don't return multiple results. There aren't any impacts on performance or computation that I can think of. There might be something going on in a different view if the Diagnosis dimension was a date or bin and domain padding was involved, I haven't done any explorations of domain padding and grand totals.

                           

                          Jonathan