12 Replies Latest reply on Apr 18, 2013 1:10 PM by Peter Thomsen

    star schema performance

      I have a star schema in SQL Server with about 50 million rows.  The fact table is pretty skinny, with about 8 dimensions and about 5 measures.  One dimension table has about 300 rows and is related to the fact table by a column of data type smallint.  Let's just say the dimension table we're talking about is regions and the fact table is sales measures.

       

      My question is this.  When I create a data connection, add multiple tables, join my fact table to my dimension table, and then drop the region field onto a row or column shelf, it takes 30 seconds to query the database and populate the view.  (I traced the SQL, and it's doing an inner join.)  While it gets the correct result, I'm wondering if there's a way for Tableau to just query the regions table for the data instead of joining the two.
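
      For illustration (the table and column names here are assumptions, since the post doesn't show the actual schema), the traced query probably resembles the first statement below, which scans all 50M fact rows just to enumerate regions; the second returns the same labels from the 300-row dimension alone:

```sql
-- What a single-connection join tends to generate (hypothetical names):
-- the whole fact table is scanned just to list the distinct regions.
SELECT d.region_name
FROM fact_sales AS f
INNER JOIN dim_region AS d
    ON f.region_id = d.region_id
GROUP BY d.region_name;

-- What would be ideal for populating a shelf or quick filter:
-- only the 300-row dimension table is touched.
SELECT region_name
FROM dim_region;
```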

       

      As soon as I start to add different dimensions to the viz and put the regions as a quick filter, the time to render the viz is unacceptable.

       

      Is there a best practice for something like this?

        • 1. Re: star schema performance
          James Baker

          There are a few things you could explore:

          1. See that your SQL Server database is properly indexed on the relevant columns (or combination of columns) of the relevant tables.

          2. Create an extract and see how the performance of Tableau's Data Engine compares.

          3. Create two separate connections to your two tables and use data blending (multiple data connections in one viz) to accomplish the same thing.
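
          As a sketch of point 1 (table and column names are assumptions; adapt them to your schema), the fact table's foreign-key columns are the usual index candidates, and an INCLUDE column can make an index "covering" for a common aggregation:

```sql
-- One nonclustered index per dimension key on the fact table (hypothetical names).
CREATE NONCLUSTERED INDEX IX_fact_sales_region_id
    ON fact_sales (region_id);

-- Adding the measure as an included column lets SQL Server answer
-- SUM(sales_amount) ... GROUP BY region_id from the index alone,
-- without touching the base table.
CREATE NONCLUSTERED INDEX IX_fact_sales_region_id_covering
    ON fact_sales (region_id)
    INCLUDE (sales_amount);
```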

          • 2. Re: star schema performance
            Robert Morton

            I second the suggestion James makes about data blending. This can be a very effective way to bring in attribute-level data for your fact table.

            -Robert

            • 3. Re: star schema performance

              Nice!  I knew about data blending, but didn't realize how it was implemented.  I kept joining the tables thinking it was the same thing.

               

              OK, that speeds things up immensely.  Now I have an implementation issue.  My dimension table of regions is related to my fact table by a region_id column.  Obviously I don't want to display the region_id itself but the actual region_name; when I try that, though, Tableau complains that it "Cannot blend aggregated data from the secondary data source with the data on this sheet because there are no linking fields used in the view".  I can obviously bring the region_id in, but that is ugly and I don't want to see it.

              • 4. Re: star schema performance
                Robert Morton

                Hi,

                 

                You will need to have 'region_id' in the level of detail, for example in the viz itself or on the LOD shelf. If the dimension table values never change (i.e. no new regions, and no changes to region names) then you can take advantage of a handy feature.

                 

                First, build a simple viz with your Primary linking field, 'region_id', and your Secondary attribute dimension, 'region_name'. Right-click on the 'region_name' field and choose Set Primary Aliases. This will open the Edit Aliases dialog for your 'region_id' field. Click OK to save the changes, and you will now see descriptive values in place of your original numeric 'region_id' values.

                Since aliases are part of the join criteria for data blending, this will leave your simple viz with a bunch of null values, which is what we expect. Now you can discard your secondary data source, since you no longer need 'region_name' -- the aliases are persisted as part of the workbook and are associated with the 'region_id' field from this point forward!

                 

                -Robert

                • 5. Re: star schema performance

                  OK, I understand, but doesn't that negate the performance gain I just achieved with data blending?  Wouldn't it be better for me to rearchitect my fact table to include the actual values, and then just join on them?

                   

                  Fortunately for this project I have the ability to architect the data however I need.  I'm working on trying to get the dashboards to perform acceptably against a live data source of 50M rows.  Unfortunately, I need the live data source as I need to do row level security.
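
                  One way to sketch that rearchitecture (names are hypothetical; SELECT ... INTO is the SQL Server idiom for materializing a new table) is to fold the label columns into the fact table once, then point Tableau at the widened table so there is no join at query time:

```sql
-- Build a denormalized copy of the fact table with the labels baked in
-- (hypothetical names; repeat the JOIN for each dimension you want to fold in).
SELECT f.sales_amount,
       d.region_name
INTO   fact_sales_denorm
FROM   fact_sales AS f
INNER JOIN dim_region AS d
    ON f.region_id = d.region_id;
```

                  The trade-off is a wider fact table that must be rebuilt or updated whenever a label changes.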

                  • 6. Re: star schema performance

                    Also, Region isn't my only dimension.  I have about 8 dimensions (e.g. Business Type, Product Type, etc).

                     

                    I got a data blend working nice and fast with two data sources (primary and secondary, blending on the values instead of the IDs; not the optimal architecture, but it works).  Now, though, I'm not sure how to use more than one dimension table with data blending; I keep getting the error I mentioned above.

                    • 8. Re: star schema performance
                      Robert Morton

                      Sorry, but I'm too swamped right now to look into this.

                      -Robert

                      • 9. Re: star schema performance
                        guest contributor

                        Hi Robert, I'm curious to know if you've had time to have a look at ihavenosuperpower's question.

                        • 10. Re: star schema performance
                          Robert Morton

                          Hi TMOD,

                          No, unfortunately I have not. If you have a specific question please start a new forum thread.

                          -Robert

                          • 11. Re: star schema performance
                            Peter Thomsen

                            I can only agree with the suggestions from James.

                             

                            A couple of tricks if you are using SQL Server:

                            1. Create indexes on the keys in the fact table - start with one index per key
                            2. Set a primary key on all dimensions (and use it for the joins) - make sure the primary key has a clustered index
                            3. Create FK constraints - they help the query optimizer. Be aware that you can't drop or truncate the fact table while FK constraints are in place.
                            4. Use the multiple-table dialog to join the tables. I think that allows Tableau to perform join culling (Robert or James, correct me if I'm wrong on this one). Join culling means that a join is discarded when it has no effect on the result.
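
                            Tips 2 and 3 might look like this in T-SQL (table and column names are assumptions, not from the original post):

```sql
-- Tip 2: clustered primary key on the dimension (hypothetical names).
ALTER TABLE dim_region
    ADD CONSTRAINT PK_dim_region PRIMARY KEY CLUSTERED (region_id);

-- Tip 3: a trusted foreign key tells the optimizer that every fact row
-- has exactly one matching dimension row, so a join that contributes no
-- output columns can be eliminated -- the basis of join culling.
ALTER TABLE fact_sales
    ADD CONSTRAINT FK_fact_sales_region
    FOREIGN KEY (region_id) REFERENCES dim_region (region_id);
```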

                            If that doesn't give you the required speed, then you need to do some more advanced performance tuning.

                             

                            But I would start with James's suggestion 2: create an extract as the starting point, if you can live with the latency of having it processed.

                             

                            //Peter

                            • 12. Re: star schema performance
                              Peter Thomsen

                              Just found this great post by Russell on join culling: http://tableaulove.tumblr.com/post/11692301750/what-i-learned-about-tableau-join-culling-over-summer

                               

                              It is a must-read when working with Tableau on live data warehouses.

                               

                              //Peter