2 Replies Latest reply on Aug 23, 2018 11:27 AM by Ryan Blosser

    Data blending, are there many practical uses for it versus joining?

    Ryan Blosser

      Hello,

       

      I know there are many questions on data blending but I have not found a resource that would help this completely click with me.  The second reason I am asking is because I work somewhere where there are individual tables as datasources that live on Tableau Server, and I can't join them, so naturally I feel like the best course of action is to create a new datasource that DOES join them.

       

      Attached is a workbook with three sheets and a dashboard.  There are two simple data sources that join on Id.  The idea was that the first thing someone would filter on is something that only exists in Datasource 1 (there is a hierarchy of sorts from Territory->ID in this case), so naturally I felt that I should make it my primary data source.  The other reason for doing so is that if I don't, I will have NULL in the Territory filter, and I do not know the best way to solve for that.  Create a duplicated Territory, filter that one for Non-NULLs, then use the second one?  The second datasource has a numeric field that I want to bucket.  There is a natural join here, but I cannot do that if both datasources are Tableau Server data sources.  My questions are as follows:

       

      a) If I set Sheet 1 to be filtered on Territory and ID (and make ID only filter relevant values, in an attempt to make a cascading filter), and I then use apply the "use on other worksheets" to make the ID column as a filter for the Sheet 2 (that still retains the Datasource 1 as the Primary), why does the Sheet 2 on the Dashboard not filter down to the narrowed down IDs, regardless of whether I use the ID column from the Primary source or the secondary source?  Does it negate the Territory filter that does not exist on Sheet 2?

       

      b) If I make a calculated field for Datasource 2, and I set this to be a filter on Sheet 1, it seems to work...but why can I not use this filter on Sheet 3 when the Datasource 2 is now the primary data source?  Is it because no relationship exists on this field?  I don't see why that would matter, and I would like to know more.

       

      c) For Sheet 2, even if everything visible is from the secondary datasource, I will still get asterisks.  I guess I felt like if everything was on one datasource, it could keep the granularity.  Does this automatically go out the window if a secondary datasource is being used?

       

      d) I always see the same application for data blending.  The tutorials are always "coffee store 1" and "coffee store 2" so it just feels like this is a "fake union" of sorts.  Maybe there is a quota application that could be useful (one file has attainment, the other quota per person).  But even still, I feel like I would union these and have an identifier to separate them.  I almost feel like if blending is avoidable, do it.  Is that bad advice?

       

      e)  If joining is the way to go, what sort of best practices are used in an organization with respect to publishing datasources?  I get the feeling that people would just make all kinds of datasources if they could.  For example, I have something that relies on two tables and maybe 8 columns.  I feel like that could be its own datasource, it is very lean and probably easily refreshable.  But then if someone has a slightly different request, do people just make one with 9 columns?  It doesn't seem like Tableau is good at "ingest this database x and retain the PK/FK relationships", but I don't know enough to have an informed opinion.

       

      I apologize if this wasn't explained well or if it gets asked a lot, but I haven't really come across something that explains this concept in a way that makes perfect sense to me and I thought maybe making my own example would help me understand.

        • 1. Re: Data blending, are there many practical uses for it versus joining?
          Andrew Bickert

          Hi Ryan,

           

          I will try an answer to the best of my abilities.

           

          Q1:

          a) If I set Sheet 1 to be filtered on Territory and ID (and make ID only filter relevant values, in an attempt to make a cascading filter), and I then use apply the "use on other worksheets" to make the ID column as a filter for the Sheet 2 (that still retains the Datasource 1 as the Primary), why does the Sheet 2 on the Dashboard not filter down to the narrowed down IDs, regardless of whether I use the ID column from the Primary source or the secondary source?  Does it negate the Territory filter that does not exist on Sheet 2?

          Although the territory filter on Sheet 1 is controlling (in part) the ID filter for only relevant values, it will not completely control Sheet 2 ID filter even though they are linked. The reason for this is the Territory filter does not "de-select" all of the ID's in the background, it is merely not showing the options. A good example of this would be to Choose all the Territories, All the ID's, then remove territory A. You will see that Sheet 1 only has the options (r,s,t,u,v,w) but Sheet 2 still shows all the ID's. This is because the parent filter Territory is not active on Sheet 2.

           

          Q2:

          b) If I make a calculated field for Datasource 2, and I set this to be a filter on Sheet 1, it seems to work...but why can I not use this filter on Sheet 3 when the Datasource 2 is now the primary data source?  Is it because no relationship exists on this field?  I don't see why that would matter, and I would like to know more.

          If I am understanding this correct, you are using dollar_cat as your filter? If so, than this should work if you add dollar_cat to your Sheet 3. If you are asking why you cannot apply dollar_cat filter that is already on Sheet1 to Sheet 3 to control it, it is because "you can only filter data across multiple primary data sources. You cannot filter data across secondary data sources. Filtering data across a worksheet's secondary data source is not currently supported in Tableau Desktop."

          Filter Data Across Multiple Data Sources

           

          These options are available for filters on secondary (see link above)

          • Join tables that are in the same data source, instead of blending them. For more information, see Join Your Data.
          • Create a cross-database join if your tables are in different data sources. For more information, see Join Your Data.
          • Make the primary data source the secondary data source, and the secondary data source the primary. Note: This might not always be feasible given the level of detail you want in the final view.

           

          Q3:

          c) For Sheet 2, even if everything visible is from the secondary datasource, I will still get asterisks.  I guess I felt like if everything was on one datasource, it could keep the granularity.  Does this automatically go out the window if a secondary datasource is being used?

          Pretty much, if you want to show the ID1 and the ID2 broken out, it should be on your primary data source, else you are trying two join more than 1 value off a single value which is why you get the asterisks. You can normally just rebuild your view with the higher granularity as your primary to get the view you want.

           

          Q4:

          d) I always see the same application for data blending.  The tutorials are always "coffee store 1" and "coffee store 2" so it just feels like this is a "fake union" of sorts.  Maybe there is a quota application that could be useful (one file has attainment, the other quota per person).  But even still, I feel like I would union these and have an identifier to separate them.  I almost feel like if blending is avoidable, do it.  Is that bad advice?

          I would personally use joins over blends as much as possible. The main use case I see for using blends is if you have data coming from multiple data source types (ie: SQL and excel).

           

          Q5:

          e)  If joining is the way to go, what sort of best practices are used in an organization with respect to publishing datasources?  I get the feeling that people would just make all kinds of datasources if they could.  For example, I have something that relies on two tables and maybe 8 columns.  I feel like that could be its own datasource, it is very lean and probably easily refreshable.  But then if someone has a slightly different request, do people just make one with 9 columns?  It doesn't seem like Tableau is good at "ingest this database x and retain the PK/FK relationships", but I don't know enough to have an informed opinion.

          For us, we give the ability of our analysts to create stand alone dashboards using non published data sources for one off requests. Everything else should have a published data model that they reference. That being said, if a user is wanting to join two published data sources, it would have to be joined before publishing up to Tableau Server. The ability to join multiple published data sources is not currently available in Tableau Desktop. There is an idea about it so please vote it up: https://community.tableau.com/ideas/5566   There is a roundabout way with Tableau Prep I believe but it is not the same as doing it in desktop.

           

          I added a couple tweaks to your dashboard for reference.

           

          Kind Regards,

          Andrew

           

          1 of 1 people found this helpful
          • 2. Re: Data blending, are there many practical uses for it versus joining?
            Ryan Blosser

            Perfect, this helps a lot, thank you.