3 Replies Latest reply on Apr 3, 2012 10:56 AM by Jonathan Drummey

    Organizing many to one data for optimal tableau vizualization

    David  Klass

      I am trying to organize my data using MSSQL for delivery to tableau and have lots of flexability but would like to deliver to tableau something that is reasonably easy to handle.  The data consists so far of about 15,000 rows : multiple events happening to  the same individual each individual having unique measures such as age, total time spent, admissions and discharges etc.. The events are specific actions that have occurred on these individuals and  have their own groupings.  I would like to analyse the events and their groupings with respect to the aggregate measures that are derived from the individuals measures. (average age, average length of stay etc.)  If I bring in a query which puts the eventID, event groupID  with the individual's ID there will be many duplicate entries in the fields that have the individual measures and my averages etc will be distorted unless there is a way in Tableau to have it calculate say average age etc. taking into consideration the duplicates that exist. I could bring in 2 separate queries joined in Tableau on individual ID (5000 of them) and have some data blending but what field to share since each individual can not only have multiple events but have those events belong to multiple groups.  I would like to end up with the ability to see if  groups of individuals grouped by events and groups of events have different aggregate measures.

       

      I am hoping to get some clues as to the best way to begin to organize my data to optimize my tableau experience. Any thoughts would be appreciated.  Thanks.  David

        • 1. Re: Organizing many to one data for optimal tableau vizualization
          Jonathan Drummey

          Hi David,

           

          Since you mentioned admissions and discharges and your title is Med Director, I'm guessing you do some sort of hospital work? I ask because I also do data analysis for a system with a hospital and several outpatient clinics, and I'm always glad to meet more people doing this work in Tableau!

           

          There are a number of ways to accomplish this, between joining data in your data source as queries or views prior to Tableau, using Tableau's multiple tables or Custom SQL, or using Tableau data blending. As a general rule of thumb, the more of the work you can do closer to your datasource, the better for performance.

           

          When I'm working with visits I do some preprocessing of the data to get it ready for Tableau. For example, when I'm doing analysis to follow patients having certain characteristics over time (like everyone with a COPD diagnosis) I'll create additional table(s) in the datasource to hold that information, so for example I don't have look at all the diagnoses over all time every time I'm working with the data. The same goes for some of the aggregates, if you know the dimensions you're working with - i.e. it's a monthly aggregation, etc. then you can do some of those in advance in the datasource as well.

           

          I'll end up with an query that returns a single dataset for the kind of analysis I'm doing, which will a some amount of duplicated dimensional information. Once I have that query working, I'll often use an extract to maximize Tableau's performance.

           

          In some cases, this is all you'll need to use Tableau's aggregate functions. In other cases, where you have to do the aggregations and then further aggregations, which require table calculations, and beyond that table calculations can be nested for further fun.

           

          For example, let's say you haven't precomputed an average LOS. For a LOS for visits comparing readmissions to non-readmissions, you could have the "Is Readmit" dimension on the Columns shelf and the measure on the Text shelf, with the Individual ID on the Level of Detail shelf. have your Individual ID on the Level of Detail. This would cause an AVG(LOS) to return overlapping text, like so (Master MRN is the equivalent of Individual ID in the screenshot, I'm working off of live data but have filtered some out to hide details):

           

          screenshot1.jpg

           

          Now we create a table calculation that is WINDOW_AVG(AVG([LOS])) and set the Compute Using to be the Master MRN/Individual ID:

          screenshot2.jpg

           

          Note the "bold" text. This means Tableau is still returning lots and lots of overlapping results that all are the same result. So we need to change what Tableau is doing to return one result and only one result, by editing the second calculation to be the following:

           

          IF FIRST()==0 THEN

              WINDOW_AVG(AVG([LOS]),0,IIF(FIRST()==0,LAST(),0))

          END

           

          (Details of how and why that code works are in the Site Catchment Analysis link below).

           

          Once you've done this much, then you can start dragging and dropping other dimensions into the view and Tableau will appropriately re-calculate. You're going to need to be familiar with table calculations in Tableau. If you haven't already, I suggest reading the materials at:

           

          http://www.tableausoftware.com/table-calculations

          http://www.tableausoftware.com/support/knowledge-base/table-calculations

           

          The training videos are also helpful. Also, the following blog and forum posts by Tom Brown, Joe Mako, and Richard Leeke are really useful:

           

          http://theinformationlab.co.uk/blog/2011/9/23/blue-things-and-green-things.html

          http://community.tableau.com/message/138424#138424

          http://community.tableau.com/message/175545

          http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html

           

          You'll ultimately want to have a good understanding of partitioning and addressing, and how to optimize table calculations for performance (even on a 15,000 row dataset).

           

          Cheers,

           

          Jonathan

          • 2. Re: Organizing many to one data for optimal tableau vizualization
            David  Klass

            Jonathan:  Thanks for taking the time for such a comprehensive response.  It will take me a while to assimilate this.  The particular issue that i was struggling with I seem to have answered to some degree and that was how to deal with events that we follow that can occur many times uniquely and duplicated  to individuals during their stay. I have decided to aggregate these up front in the database by marking for each of these the first time they occur to an individual and then counting their incidence for that person over time.  With this and then aggregating some temporal  data relating the events to each other I have made a good start.

             

            I really appreciate the LOS details that you shared. With the movement data I have had less of a problem but you have certainly given me some more clues as to how to enrich what I already have. 

             

            Thanks again.  David