3 Replies Latest reply on May 16, 2018 7:57 AM by Eric Viglotti

    Data blending vs joins in one data source when all data coming from one datawarehouse

    Eric Viglotti



      We have a star schema data warehouse that has say 10 different facts for reporting on different things. There are about 7 dimensions that conform to all 10 facts, however, each fact also has a handful of dimensions that pertain only to it. A classic example would be orders and returned items where returned items would have a "return reason" dimension that orders does not.


      If these are in different sources, then obviously we blend with two different data sources in Tableau. However, if they are all in SQL and it's a simple join between all tables in one single database, what is the most common practice? I ask because if I create one giant single data source for all 10 facts, a.) it seems like Tableau struggles with drawing joins between one dimension and multiple facts (whereas it is super easy to take one fact and join to each dimension), b.) it probably is going to be pretty complicated to the user to know when to drag in a dimension that conforms vs one that doesn't (see returned items example above).


      For those reasons, I am leaning towards making 10 different data sources, which each data source having its one fact plus all 7 conformed dimensions plus any other specific dimensions to that fact. It sure seems like performance isn't an issue because if you are showing something like total orders next to returned item count by customer, data blending won't have an issue as the results from both queries will be super small. But admittedly there is a lot of overhead to that as we have to initially create all 10 of these data sources and I don't know if this puts us in a more non-traditional approach.


      Any real-world insight would be very appreciated!



        • 1. Re: Data blending vs joins in one data source when all data coming from one datawarehouse
          kumar bharat

          Hi Eric,

          You can do the following:

          • run a test sql query in directly in database directly by joining all the tables needed and see the volume of data and the time it takes to run in database before using in tableau.
          • there can be a case where indexes are not correctly defined in database on right columns and tables are huge with number of columns and volumes of data so i suggest run the query in database and see the explain plan and see the indexes and other stuff defined at db level
          • create views in database and use the view in tableau,materialized views and other views instead of querying tables directly.
          • use multiple common fields between tables to join on them to get accurate results in tableau

                   check this link    Join Your Data

          • you can use sql query  on top of tableau extracts
          • use DB filters and extract filters,check the order in which the filters are  loaded in tableau,extract filters are  loaded first then db filters.

          it all depends on how are designing and fetching the data.

          hope it helps.



          • 2. Re: Data blending vs joins in one data source when all data coming from one datawarehouse
            Peter Fakan

            Hi Eric,


            By default I tend to advise that if the data is coming directly from a database connection then do joins, for everything else start with blends. Naturally this is just general advice.


            Another thing to consider in between your two options is to just create some reporting views inside your database and then connect directly to that instead of the fact tables - this way you avoid having to do any blends or joins as the data you need will already be materialized inside the database. I'd preference this option because you can limit or extend the data that you release through this pipeline to suit the experience level of your users





            • 3. Re: Data blending vs joins in one data source when all data coming from one datawarehouse
              Eric Viglotti

              Thanks for the advice. I think my overarching question is less about performance and more about the user experience. I guess in your experience, whether you are doing joins in the database via a view or in Tableau, how do you present multiple facts to a user when they share many, but not all of the dimensions? My concern is if, say, I join both orders and returned items to customer, location, product, etc, I'm fine as I can just show all of the dimensions and all of the measures on the one data source and we're good. But in this example, the returned items has it's own dimension for return reason, where do you put that if it's all joined in one data source so that the user doesn't try to report the order measures across the return reason and get tons of double counting?


              Obviously in this example, it is intuitive that you would never trend your initial orders by return reason, but in doing BI for a long time with other tools, I have found that on more complex star schemas it is very easy to lose track of which things conform to which.


              It would seem like having each fact in a "silo" in it's own data source would create a bit more of a logical separation to the user, but it sounds like that might not be the standard, so I'm just curious what other people do out there in this regard...