6 Replies Latest reply on Jun 15, 2012 5:58 AM by Jason Scarlett

    Histogram troubles (level of detail vs aggregation)

    Jason Scarlett

      I've been struggling to get a histogram that shows an aggregate count and a cumulative percentage to work. I've been through these posts (http://community.tableau.com/thread/111907, http://community.tableau.com/thread/111936, http://community.tableau.com/thread/111837)) but nothing is quite the same as what I am looking at here.

       

      Background: We collect hourly census data from several hospitals (INST). Each row represents the average number of patients on a given patient care unit (PCU) during that hour.

       

      Question: How can I plot both a frequency distribution of the # of patients and a cumulative percentage (line) of patients in exactly 20 bins (bin sizes adjust according to min/max census counts).

       

      Attached Workbook: Attached is a start of what I have. It seems I have a conflict between (1) the aggregation used to create the bins, (2) the level of detail using the exact date, and (3) a running sum attempting to ignore the level of detail.

       

      Thanks

      Jason

       

      v7.05 desktop and server

        • 1. Re: Histogram troubles (level of detail vs aggregation)
          Jonathan Drummey

          Hi Jason,

           

          I was looking at some of those same forum posts in the last few days as I was trying to prepare a similar kind of view of hospital readmissions data. After trying to get the table calculations to work properly, and thinking I figured out why they won't work, the easier thing for me ended up being exporting the data from the worksheet and using my equivalent of your Census bin (Y) and and sum(avg count) (X) fields to then generate the Pareto chart. I wanted to do a writeup of this, and solicit more input from other Tableau experts, and since you'd already posted I decided to use your data instead. Here's a pic of where I got:

           

          screenshot1.jpg

          The goal is to have a percent of total running sum over a set of data where the numbers being summed are a table calculation [Census bin] and the dimension is a calculated field [sum(avg count)].

           

          Complicating factors are two dimensions that are required to be in the overall level of detail for the view: [Inst #] (which has only one value), and [Date].

           

          Normally, to generate the percent of total running sum, we'd set the addressing aka Compute Using to [sum(avg count)] and be all set. However, [sum(avg count)] is a calculated field. We can use calculated fields like [sum(avg count)] for partitioning table calculations (by unchecking the "Ignore in Table Calculations" option on green pill, they can't be used for addressing. The only ways to get those fields to be used for addressing is to use Tableau's "visual addressing" options of Table (Across),Table (Down). These options are totally dependent on the set up of the pills in the view.

          Where this breaks is in cases where we don't want all the fields on the Rows or Columns shelves, for example in this case we ultimately want Date in the Level of Detail. When we move Date to the Level of Detail, the Table (Across), etc. fail to return the desired results and there is no way that I know of to get the addressing to properly work. Hopefully Richard Leeke or Joe Mako can jump in with some ideas.

           

          That said, I got most of the way there, here's what I did. There are almost certainly some optimizations that can occur in here, I haven't tried to do more in order in order to show how things are working so far.

           

          1. Created a Census bin for RS field with the formula:

          WINDOW_COUNT([sum(avg count)])

          This is to ensure we have every row available to other calculations, whereas the Census bin field implements the IF FIRST()==0 technique to reduce results. The Compute Using is set to Date.

           

          2. Created the RS Census bin field with the formula:

          PREVIOUS_VALUE(LOOKUP([Census bin for RS],0))

          +LOOKUP(IF LOOKUP([Census bin for RS],0) != LOOKUP([Census bin for RS],-1) THEN

          LOOKUP([Census bin for RS],0)

          ELSE

          0

          END,0)

          This uses the PREVIOUS_VALUE table calc to iterate over the Census bin for RS field, the IF LOOKUP() part is used to detect when there's a new value for Census bin for RS and then adds that to the result. This uses a nested Compute Using where RS Census bin is Table(Down) and Census bin for RS is set to Date.

           

          3. Created the Percent of Total Census Bin field with the formula:

          [RS Census bin]/WINDOW_SUM([Census bin])

          With the Compute Using set properly, this generates the % of total. The nested Compute Using is:

          Percent of Total Census Bin: Table(Down)

          Census bin: Date

          RS Census bin: Table(Down)

          Census bin for RS: Date

           

          We can see all that in action for the "workout crosstab" worksheet and delivering expected results. See the "trial view" worksheet for the chart.

           

          Normally, what I do prior to moving from the crosstab view to a chart is to explicitly set the addressing and partitioning of the table calculations so they don't change when I drag pills around in the view. However, as noted, when we do this here they break. For example, if you drag the blue Date pill from Columns to the Level of Detail Shelf, Percent of Total calc returns 100% for all Columns.

           

          Because Table(Down)/Table(Across) is what we need to get the Percent of Total calcs to work in this case, I did some hack-y things to the worksheet:

           

          - Used the Swap Columns/Rows to move all the discrete and continuous values at once.Tableau automatically shifts the Table(Down) used in the workout crosstab worksheet to Table(Across).

          - Created an INDEX() calculated field called Index, put that on the Filter Shelf, and set the Compute Using to Pane(Across), then selected only for 1 in the Filter. This takes all the extra rows from Date that are needed to generate totals down to 1 row while preserving other calcs, since filters on table calcs are applied after most all other calculations are complete.

          - Unchecked "Show Header" for both the [Date] and [Inst #] fields on the Columns Shelf.

          - Rotated the text label for [sum(avg count)]. I don't know how to push that heading back to the bottom of the view other than to move [Date] and [Inst #] off of Columns, however in the case of [Date] that breaks the Percent of Total calculation.

          - Set up a dual axis and changed the Mark Type for the Percent of Total Census Bin to Line. It doesn't actually show a line because the Path would need to be defined, that would take some extra jumping through hoops to get to, and even then I'm not totally sure it's possible given the discrete values on Columns.

           

          If I were going to use this particular non-ideal view in practice, I'd probably set up a dashboard with this view and a similar view that didn't actually have data shown, but just the [sum(avg count)] header at the bottom of this. At that point, then we could remove the gridlines and have a more legible view.

           

          Cheers,

           

          Jonathan

          1 of 1 people found this helpful
          • 2. Re: Histogram troubles (level of detail vs aggregation)
            Jason Scarlett

            Great stuff.

             

            A quick comment:

            1. I changed the [RS Census bin] calculation to be dependent upon a change in the [sum(avg count)] instead of the [Census bin for RS] which may occasionally be identical from one pane to the next.

             

            PREVIOUS_VALUE(LOOKUP([Census bin for RS],0))

                +LOOKUP(IF LOOKUP([sum(avg count)],0) != LOOKUP([sum(avg count)],-1) THEN

                    LOOKUP([Census bin for RS],0)

                ELSE

                    0

                END,0)

             

            More comments and feedback to come in the next few days (still trying to get there to be exactly 20 bins).

             

            Jason

            • 3. Re: Histogram troubles (level of detail vs aggregation)
              Richard Leeke

              I had a bit of a look at this last night, reached exactly the same point about table calcs only being available for partitioning and not addressing, vaguely considered the possibility of using PREVIOUS_VALUE() to roll-my-own running sum in the way that you have here but gave up at that point.

               

              So sorry, nothing to add, Jonathan - you've definitely pulled well ahead of me on your table calc understanding, I think. You'll be snapping at Joe's heels next. ;-)

              • 4. Re: Histogram troubles (level of detail vs aggregation)
                Joe Mako

                You can use use a few table calcs to accomplish what you are looking for, as in the attached.

                Bin Size (for calculating the size of each bin):
                (WINDOW_MAX(SUM([AVG_COUNT]))-WINDOW_MIN(SUM([AVG_COUNT])))/20

                Avg Count (bin) (the discreet pill that will be creating the Cells):
                (INT((SUM([AVG_COUNT])-WINDOW_MIN(SUM([AVG_COUNT])))/[Bin Size]) -IIF(SUM([AVG_COUNT])=WINDOW_MAX(SUM([AVG_COUNT])),1,0)) *[Bin Size]+WINDOW_MIN(SUM([AVG_COUNT]))

                Census bin (the value for the Bar marks):
                IF LOOKUP([Avg Count (bin)],-1)=[Avg Count (bin)] THEN  PREVIOUS_VALUE(0)+1 ELSE 1 END

                Running Percent of Total (the value for the Line mark):
                RUNNING_SUM(SUM([AVG_COUNT]))/TOTAL(SUM([AVG_COUNT]))

                Last Mark in bin (the filter to show only the last mark in each Cell):
                ZN(LOOKUP([Avg Count (bin)],1))<>[Avg Count (bin)]

                A discrete exact date pill for Date is on the Level of Detail shelf and sorted ascending on the sum of AVG_COUNT, this pill is used as the Compute using for all table calc pills.  Depending on your actual situation with multiple "Inst #" values, the compute using settings or formulas may need to be adjusted to fit your business logic.

                There are likely other routes to achieve this same result that may be a better fit depending on your constraints (I also suspect there is a more optimal way to write some of these formulas).

                • 5. Re: Histogram troubles (level of detail vs aggregation)
                  Jonathan Drummey

                  @Jason: Sorry about that, I had a bit of confusion at one point and missed that sum(avg count) when I was cleaning it up.

                   

                  @Richard: Thanks for the compliment! Thought I still have a bunch to learn, Joe's post expanded my mind (again). I'm attending his training in a couple of weeks and very much looking forward to it!

                   

                  @Joe: Awesome work, the idea of building the whole view (columns *and* rows) out of table calcs hadn't occurred to me, I'd always been thinking about partitioning and getting stuck there. I'm going to have to play with it some to take it in. Thanks for posting!

                   

                  Jonathan

                  • 6. Re: Histogram troubles (level of detail vs aggregation)
                    Jason Scarlett

                    This looks great.

                    I'll report back on anything I find when I apply this method to 100+ sites and 100 million records.

                    Jason