10 Replies Latest reply on Mar 27, 2014 3:20 AM by Ville Tyrväinen

    Third data source and many to many result

    Claire Douglas

      I am new to Tableau, and am currently getting into tricky questions and not being able to fight my way out.  Right now I'm working with three data sources, and I seem to be hitting some many to many join issues.

       

      The data sources, to be blended:

      Visits (one record per visit, includes Date, Location, Health Care Provider ID, and Visit ID.)

      Hours (one record per Health Care Provider, Date, and Location.  Includes Hours worked.)

      RVU (essentially the complexity.  Currently one record per visit.  Includes Date, Location, Health Care Provider ID, and Visit ID.  Includes the total RVUs.)

       

      The goal is to create visualizations that show Visits per Hour and RVUs per Hour on the same sheet.

       

      I would like to blend all three of the data sources together, with Visits as the primary data source.  I am able to see both Visit and Hours on one sheet with Visits as the primary data source, and hours joined on Date, Health Care Provider ID, and Location.  I am able to see both Visits and RVUs on one sheet with Visits as the primary data source, and RVUs joined on Date, Health Care Provider ID, and Location.  However, when I try to add the third data source to one of the working sheets, it looks like I'm getting a many to many join.  Adding Hours to the Visit/RVU table keeps visits and hours accurate but increases RVUs about ten times.  Adding RVU to the Visits/Hours sheet also increases the RVU values by about ten times.  If I add in a join between Visits and RVUs on Visit ID, the visits and RVUs are accurate, but then Hours increases about ten times.

       

      What am I missing here?  Are you only able to blend two data sources on one sheet? (I'm desperately hoping this isn't true.)

       

      Thanks,

      Claire

       


        • 1. Re: Third data source and many to many result
          Noah Salvaterra

          You can have multiple secondary sources, though blending these is likely to get tricky. Are these in a common database? If so, you're better off using either custom SQL or a Multi-table connection. That way you will be able to setup how you want these sources to connect without any concern about the details of data blending.

           

          If this MUST be done with blending, it would be useful to have some datasets to work with. While I expect this isn't something you can share without anonomizing, you could create something in excel that would have the same sort of structure so potential helpers would have something to use for demonstration.

          1 of 1 people found this helpful
          • 2. Re: Third data source and many to many result
            Claire Douglas

            Thanks Noah.  It is possible to join this data by changing stored procedures, which I can do for this workbook, but in these early stages of implementation I'm also trying to figure out what my limits are.    As a rule, should we avoid blending more than two data sources?  Is there some documentation that can help describe what actually happens when a third data source is blended?  In another workbook I was trying to put together four data sources- budget and budget total (for percentage purposes with filtering), and actual with actual total (again for percentage purposes with filtering).  In that project I was also fighting against what looks like some many to many issues (and took a break with it because I couldn't figure out what was going on).  Are we asking too much of Tableau?  I'm hoping that we don't need to do massive SQL coding to get what we want for complicated comparisons.  We have the skill set, but that takes so much more time.

            • 3. Re: Third data source and many to many result
              Noah Salvaterra

              I'm not talking about massive SQL coding or database changes, you can do joins from Tableau. If you right click on the connection and select "Edit Connection" it will take you to this screen:

              Edit_Connection.png

              You may find Multiple Tables does the trick (even if it doesn't this can be useful way to start building your query). I clicked on Custom SQL in the image to show how my connection to superstore sales shows up as a SQL Query. You can make changes to the query in this box if you want joins, aggregation, filtering etc.

               

              My policy on joins isn't so much about Tableau as it is what I want control over within Tableau and what would be more likely to cause me problems. I work in car insurance, where for an individual policy there may have multiple vehicles, drivers, violations, accidents, contacts, visits, impressions, devices, browsers,... the list goes on, but the point is that I deal with a lot of one to many and many to many relationships. Blending is a powerful tool, I find it particularly convenient way to bring in data that doesn't coexist with my primary sources, otherwise it could be a project to move it. Tableau connects with native drivers, so it will use a SQL driver to connect to a SQL database, but that driver doesn't know how to read an excel file much less do a join with one. Blending bridges that gap.

               

              When it comes to analysis on data that is all in one place I prefer joining when possible for a few reasons:

              1. It happens on the database, rather than on your desktop, so it may be faster.
              2. Blending is a specific case of joining, so joining (when it is possible) can be more flexible.
              3. Once you setup the connection it is fixed at the connection level (unless you change it), whereas blending is specific to a worksheet. With blending, differences in granularity will always be in your face. If you remove pills from the view you may get different numbers, or no numbers at all.

               

              Many things in Tableau are simple, which is great! It is very empowering to be able to drag, drop and answer a lot of questions quickly. But some questions either by their nature or as a result of data structure that are still hard, or at least they require some thoughtful setup to get the answer you seek. It seems like you could have such questions. Fear not though, there are plenty of SQL rockstars on the forum who will jump in to help. Most will probably wait for some sample data though.

               

              In terms of your specific question about blending a third data source, with regard to blending, from Tableau's perspective there are two kinds of data sources, primary and secondary, the secondary sources don't see each other directly. Each secondary sources is blended to a single common primary (as a post aggregation left join), so if one of your data sources needs to be connected via the other then blending might not be an option at all. I'm not the best resource on documentation, but I will invite Jonathan Drummey to join in (pun fully intended) he may be able to point you to in the right direction there. Jonathan also knows a lot more about blending then I do, so hopefully he can correct anything glaring that I've gotten wrong or left out.

              1 of 1 people found this helpful
              • 4. Re: Third data source and many to many result
                Jonathan Drummey

                The question is, will I blend in this conversation? (ok, enough bad puns).


                I usually try to get an understanding of the data and the goal, and to figure out how to work from there. Claire, your data description is good (and I work in Quality in a health system so I have a bit of familiarity with how your data could be set up), though as Noah noted it would be even more helpful to have some mock-up data to have an understanding of the exact grain of the data and the views you've been building.


                Now, you'd written "The goal is to create visualizations that show Visits per Hour and RVUs per Hour on the same sheet." One way to do this is to have two worksheets on a dashboard, each coming from different data sources. This can have complications if you want to do filtering across both worksheets, but it's one possibility and could be the easiest to set up.


                Beyond that, I can't give you an exact answer, because I'm not clear how your existing workbook is set up. You wrote about blends and joins, but I'm not sure from reading what you've written whether you're using Tableau Data Blends or SQL joins in your data. You also haven't specified what dimension(s) you are blending or joining on, and the aggregations you are using.


                Here's the way I keep it straight for myself:


                - join - standard SQL terminology, you can point and click to set up left, inner, and right joins in the Tableau connection interface, beyond that you can use custom SQL or do work in your data source.

                - blend - Tableau-specific, this is a variation on a left join where there is a single primary data source and one or more secondary data source(s) that each blend to the primary data source. The data from each data source is aggregated prior to the blend, and what is used to blend are the linking dimension(s) that are specific to the particular worksheet. One effect of this is that a blend cannot increase the level of detail of results the way a SQL left-join can.

                 

                Noah's points about joins vs. blends are all true for me as well. Blends have a couple of advantages over joins in that they can cross data sources, let us issue queries at a different level of detail, and they can sometimes have performance improvements because Tableau will query the blends in parallel (two separate queries might be a lot faster than one big query). I'll often use blends when starting to build out a view or doing exploratory analysis, and then push the code back down into SQL for production. However, in some cases it's still easier to stick with the blend rather than do the ETL or query work to get everything to run.

                 

                So, when you write about adding the third table in and getting 10x the hours, I'm really not sure what you are doing that did that. If you're using a blend, then most likely what you did was something where the granularity of the view was finer than that of the Hours data source, and you were using something like SUM(Hours) from that data source. You might be able to get what you want using an aggregation like ATTR() or MIN() instead, but without seeing your view I can't tell, because knowing with the linking dimension(s) are is critical to understanding what the blend will be doing.

                 

                Does that help?

                 

                Jonathan

                • 5. Re: Third data source and many to many result
                  Claire Douglas

                  Thanks Jonathan and Noah!  Though some of this information has been disappointing, it's extremely helpful and clear.  I started graphing the calculations in separate worksheets, which I can pull together on a dashboard, but sadly that won't work for a chart with all of the data.  I think I'm going to go back to the drawing board (or copy paste land) and build a new stored procedure to show the visit information, hours, and RVUs including the MRN (for checking underlying data), Site, and Provider (for filtering and reporting).  Then I'll build another procedure with the cost and RVUs per site.

                   

                  It also sounds like our Tableau standards should be to not blend more than two tables- which is going to shift more of the lifting to SQL rather than Tableau.  The back end tables of our databases are way too complicated to join in Tableau, but I was hoping to have some procedures be more interchangeable and transparent to the end user.  This is sounding less and less possible.

                   

                  Following is an idea of the data, in case I'm missing some other way to go about it!  (And if there's a better way of posting mock up data please let me know!)

                   

                  Thanks for your help!

                  Claire

                   

                  Visits, Hours, and RVUs are joined together on Employee, Location, and Date.

                  Cost is joined to Hours on Location and Date.

                   

                  On top of everything, I will need to check on padding due to empty data.  There are times providers have hours and no visits, providers have visits but no hours, or there are RVUs for "visits" that aren't included in the visits (these are smaller services).

                   

                  Visits:

                  location_properAll_Prov_LastAll_Prov_FirstAll_Prov_Emp_NbrAll_Prov_rendering_prov_idMRNDate_of_serviceVisit_ID
                  Site1AProvider5555123451233/3/2014123Site141701
                  Site2BProvider4444456784563/4/2014456Site241702
                  Site1AProvider5555123451233/3/2014123Site141701
                  Site1AProvider555512345323/3/201432Site141701
                  Site1AProvider555512345453/3/201445Site141701

                   

                  Hours:

                  EE_ID_NmbrFirst_NameLast_NameProductive_HrsLocation_Workeddate_worked
                  5555ProviderA3Site13/3/2014
                  5555ProviderA4.5Site13/4/2014
                  5555ProviderA4Site13/5/2014
                  5555ProviderA6Site23/6/2014
                  4444ProviderB6.5Site23/8/2014

                   

                  RVUs:

                  location_properprovider_lastprovider_firstprov_emp_nbrrendering_provider_idRVUDate_of_serviceMRNVisit_ID
                  Site1AProvider555512345163/3/2014123123Site141701
                  Site2BProvider444445678323/4/2014456456Site241702
                  Site1AProvider555512345103/3/2014123123Site141701
                  Site1AProvider555512345203/3/20143232Site141701
                  Site1AProvider555512345123/3/20144545Site141701

                   

                  Staff Cost:

                  Location_WorkedStaff_Costdate_of_service
                  Site11003/3/2013
                  Site11103/4/2013
                  Site2653/5/2013

                   

                  Desired Table1:

                   

                  By Month (or year or quarter, etc)
                  LocationProviderHoursVisitsRVUsVisits/HourRVUs/Hour
                  Site1A, Provider10331003.310
                  B, Provider41655413.75
                  Site2B, Provider818952.2511.875

                   

                  Desired Table 2 (or best of all this would be in a site total line in the table above, but I'm thinking that would be asking too much):

                  By Month (or year or quarter, etc)
                  LocationCost/RVU
                  Site110.1
                  Site210.2
                  • 6. Re: Third data source and many to many result
                    Jonathan Drummey

                    Hi Claire,

                     

                    Can you put those tables into an Excel workbook and I'll take a look? I'm not seeing anything that would make it impossible to set up with data blending. What I'm thinking is making the Visits primary, and blending everything else to that.

                     

                    Jonathan

                    • 7. Re: Re: Third data source and many to many result
                      Claire Douglas

                      Hi Jonathan,

                      I had actually started this mock data with an excel workbook- but couldn't figure out how to attach it.  Due finding the advanced editor a mock data workbook is now attached!  I originally tried to make visits the primary data source in Tableau, but even then, when I pulled in information from both RVUs and Hours I started to see the 10x (though not exactly) greater results.  Any advice is welcome!


                      ~Claire

                      • 8. Re: Re: Third data source and many to many result
                        Noah Salvaterra

                        In your desired output there is a B under site 1, but I don't see that in the source data... should this output follow from the given input tables, or is it more of a sketch?

                        • 9. Re: Re: Re: Third data source and many to many result
                          Jonathan Drummey

                          Hi Claire,

                           

                          I found the same problem as Noah and a few more in the mocked up data, so I've attached a revised data set. Here's a view with everything in one:

                           

                          Screen Shot 2014-03-27 at 5.36.36 AM.PNG.png

                           

                          There are 4 data sources in one view, with blends at two different levels of detail, here are the linking dimensions used.

                           

                          - Employee number, location, date of service for the Hours & RVU

                          - Date of service & location for the Staff Cost

                           

                          This is a good demonstration of Noah's comment "differences in granularity will always be in your face" (I'm going to quote you on that regularly, Noah), if you remove any of those linking dimensions from the view or change to link on the MRN or Visit ID then some calculations will change results.


                          Since the Staff Cost information is at a coarser level of granularity, I did three things to make it work in this view:

                          1) Used MIN(Staff Cost) as the aggregation. When put as a discrete header at the Site level, it returns the same value for each Provider so only generates one header.

                          2) Changed Provider to ATTR(Provider) so that way Provider is an aggregate and won't be part of table calculation addressing or partitioning. When I'm going to use table calculations I only keep the dimensions needed for addressing and partitioning in the view, all other dimensions that aren't part of that but still need to be in the view for display purposes get aggregated using ATTR().

                          3) Used a Table calculation [Staff Cost]/WINDOW_SUM([# of RVUs]) with a Compute Using of the provider ID to sum up all the RVUs across all providers for each day/location combination. This is also a discrete header at the site level.


                          You had mentioned wanting to do a total at the site level, I presume that means using Tableau's totals/subtotals functionality. You can do this and it will work:


                          Screen Shot 2014-03-27 at 5.50.21 AM.PNG.png

                          However if you want to have the Staff Cost & Staff Cost/RVU with the rest of the numbers, you'd probably want to do something different to not show a Staff Cost for every provider, most likely using table calculations.


                          Now there's the question of higher levels of date aggregation. In order for the data blending to work, you *must* be blending on the linking dimensions and this will work for the visits, hours, and RVUs data. However, because the Staff Cost data is at a different granularity, when the day granularity goes away in the view results start changing. I set up an example in this mockup data where a calculated field puts the first two days in one bucket and the last day in another:

                           

                          Screen Shot 2014-03-27 at 6.01.19 AM.PNG.png

                          There are two different values of staff cost in the data because there are different dates. In order to have the staff cost properly aggregate for each day across the whole set of days, you'd need to have the individual days as a dimension in the view, on the Level of Detail Shelf and sum them up with a table calculation. In other words, no matter the date level if you want to have the Staff Cost & Staff Cost/RVU in the view, you must have the day of date in the view. That would lead to the regular aggregations for # of Visits, # of RVUS, etc. to be at too fine a level of detail for the display, so they would need to be re-aggregated with table calculations that compute along the day of date. So it's tricky (but not impossible) to set up. If you're wanting to use Tableau's totals and grand totals on top of that, then that will add another level of complexity. I didn't go further with this because it's hard to set up a good working example without more data.

                           

                          Note that this last issue with the staff cost isn't so much an issue of data blending, it's an issue with working with multiple levels of granularity that will be there whether you create a query to flatten out the data or not.

                           

                          Jonathan

                          • 10. Re: Third data source and many to many result
                            Ville Tyrväinen

                            Really nice work Jonathan

                             

                            Ville