3 Replies Latest reply on Mar 10, 2016 1:07 PM by Lena Grygoryev

    Data Blending when both data sources are published to server and dimensionally modelled

    Lena Grygoryev

      Hi everyone, so here is a starting condition:

      there are 2 (or more) data sources each of which is a dimensionally modelled datasource and published to Tableau server.

      That is both are not a flat table but a set of tables with defined joins.

      Each datasource is a pure star-schema, that is has one Fact and several Dimensions. And some dimensions can be common between the two datasources.

      Questions:

      1) is it okay to do data blending on such two datasources or not. That is to create a workbook which draws data from both? And why.

      2) if yes:

      - what do we need to do in Relationships box - remove all except the ones we use, or doesn't matter?

      - what are the pitfalls to be aware of, that is how to do it properly, and what to avoid while doing it?

      I would also like a good explanation of what happens behind the scenes when doing this.

       

      Note: I read all the basic information about Data Blending and searched the internet. I did not yet found a good explanation.

       

      thank you for all answers upfront!

        • 1. Re: Data Blending when both data sources are published to server and dimensionally modelled
          Bill Lyons

          Yes, you can blend two data sources. My general rule is: when in doubt, try it!

           

          While the source may contain many joins in a star schema, the resulting data set as seen in Tableau is a flat table. So, you would be simply blending two flat tables, regardless of the underlying structure that produced the tables.

           

          I don't understand the "And why" at the end of question 1. Why is it okay to do this? Because you can. Because it accomplishes your goal.

           

          As for the relationships box, you should remove any that are inappropriate (i.e. the names may be the same but they mean different things), and add any that are appropriate (i.e. equivalent fields that have different names).

           

          As for understanding what is happening "behind the scenes," I think the best authority on data blending that I know is Tableau Zen Master Jonathan Drummey. I highly recommend reading articles in his blog site Drawing with Numbers | Thoughts on data visualization and Tableau, and watching the video from his TC14 session Post-Conference Materials | Tableau Conference 2014. In short, blending is somewhat similar to a SQL left-join, but it is different in that the right side of the "join" (secondary table in the blend) must be aggregated. There are a few other idiosyncrasies that Jonathan details.

           

          I hope that helps.

          • 2. Re: Data Blending when both data sources are published to server and dimensionally modelled
            Lena Grygoryev

            I am in process of clarifying this with Tableau support. When I get the full understanding I will post the answer here.

            • 3. Re: Data Blending when both data sources are published to server and dimensionally modelled
              Lena Grygoryev

              Ok, after some discussions with Tableau support here is the answers:

               

              Is it possible/okay (supported by Tableau)? - Yes it is possible/supported to do data blending on two tableau data sources which are star schemas of their own.

               

              Why: mainly, the way I understood it, that every data source modelled in tableau, after all the joins are done is essentially a large flat table (conceptually), so blending two star schemas is like blending two huge flat data tables, where columns are all the columns in all the dimensions.

               

              What to be aware/wary of when doing it:

              1) left join: remember that blending is always a left join on the primary data source. So the data in secondary, that doesn't match, will be excluded

              2) non-additive measures (such as count distinct, medians, etc) need special care

              3) cardinality of the attributes on which the acting relationships are set. As for any blending, the smaller cardinality, the better performance. If you set blending relationships on high cardinality columns, good luck waiting for the results. And generally bad for performance.