11 Replies Latest reply on Aug 10, 2017 7:02 PM by Ben Neville

    Using table relationships vs data dump

    Tiberius Busu



      So, I am getting the Tableau data from Access in the form of tables and I am linking them in Tableau. I have 7 tables: 3 hold the main data (~550K rows) and the rest are mapping tables (<30K rows). Once linked together, I get to about 50M cells (based on a .csv export).

      The issue I have is that for some visualizations based on ad-hoc filters, the response time is over 10 seconds. Add the fact that I need to run these ad-hoc searches very often, and you can imagine how the efficiency of using Tableau drops.


      I am thinking of pre-linking the data and exporting it as one table from MS Access (as Access works way faster at this) - would that improve the Tableau response time?

      I am open to any suggestions (except for including the .twbx, as my data is sensitive and confidential).



        • 1. Re: Using table relationships vs data dump
          Galen Busch



          I assume you've already extracted the data.


          Firstly, data source filters are the most obvious. If there's any way to limit the scope - start there.


          Secondly, context filters. If there are any high-level categorical filters that could initially (and dynamically) limit the scope of your data, add those dimension filters to context (right-click, add to context).


          There are a few performance tuning options you could consider. "Aggregate data for visible dimensions" will restrict the size of the extract once you have completed your project.


          If you are using calculations, you can tune those as well.



          • 2. Re: Using table relationships vs data dump
            Tiberius Busu



            As much as I'd like to filter everything beforehand, I need to run most of the searches on the entire dataset first, then look at the rest of the visualizations.

            I only have 4 calculated fields and there's nothing complex in there either that would slow it down.

            I will explore the tuning option and see where it gets me.


            But, between linking the tables in Tableau vs. importing one giant, already-joined table - any thoughts on this?




            • 3. Re: Using table relationships vs data dump
              Galen Busch

              If you're using extracts, which is best practice for large datasets such as yours, it won't make a difference from a performance perspective.


              At TC16 Tableau announced the Hyper data engine, which will increase query performance on extracts by as much as 10x.

              • 4. Re: Using table relationships vs data dump
                Ben Neville

                Are you running against an extract or live connection?

                • 5. Re: Using table relationships vs data dump
                  Tiberius Busu


                  Live connection on 50M cells with the DB on a LAN is suicide

                  • 6. Re: Using table relationships vs data dump
                    Chris McClellan

                    Do you mean extract as in TDE, or extract as in CSV? You talk about exporting to CSV, but not really whether you're using a TDE.


                    I don't get the 50M rows - are you joining the data properly, or not joining and getting a cartesian product (every record in TableA multiplied by every record in TableB)?


                    So I'd definitely say that joining your data in Access before extracting would be a good idea. A better idea would be to use something better than Access (SQL Server Express maybe, because it's free) and create a view to join the tables within the database before extracting. When you extract, don't bother with CSV - create a TDE (Tableau Data Extract) and you'll see speed improvements immediately. Although I think your biggest win is to reduce the row count, because it sounds too high compared to how you describe the base tables.
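                    To illustrate the view idea, here's a minimal sketch using Python's sqlite3 as a stand-in for Access/SQL Server (the table and column names are invented for the example). It also shows the cartesian-product trap: with the join condition in place the row count matches the main table, without it the counts multiply.

```python
import sqlite3

# Hypothetical stand-in for the Access database: one main table
# and one mapping table (names are made up for illustration).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE main_data (id INTEGER, region_id INTEGER, amount REAL);
    CREATE TABLE region_map (region_id INTEGER PRIMARY KEY, region_name TEXT);
""")
con.executemany("INSERT INTO main_data VALUES (?, ?, ?)",
                [(1, 10, 5.0), (2, 10, 7.5), (3, 20, 2.0)])
con.executemany("INSERT INTO region_map VALUES (?, ?)",
                [(10, "North"), (20, "South")])

# A view that flattens the join inside the database, so the client
# (Tableau, in this discussion) sees a single pre-joined table.
con.execute("""
    CREATE VIEW flat AS
    SELECT m.id, m.amount, r.region_name
    FROM main_data m
    JOIN region_map r ON m.region_id = r.region_id
""")

joined = con.execute("SELECT COUNT(*) FROM flat").fetchone()[0]
cartesian = con.execute(
    "SELECT COUNT(*) FROM main_data, region_map").fetchone()[0]
print(joined)     # 3 -> same as the main table: the join is correct
print(cartesian)  # 6 -> 3 x 2 rows: a missing join condition multiplies
```

An Access saved query plays the same role as the view here: Tableau connects to it as if it were one wide table.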

                    • 7. Re: Using table relationships vs data dump
                      Tiberius Busu

                      Yes, I meant the TDE extract (I used the .csv to calculate the number of cells). The 50M would be from the 550K rows x 90 columns (from all the linked tables - I guess that's how Tableau combines it).
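                      The arithmetic checks out, for what it's worth - a quick sanity check of the figures quoted in this thread:

```python
# Figures from the thread: row count is fixed, columns come from the joins.
rows, cols = 550_000, 90
cells = rows * cols
print(cells)  # 49500000 -> roughly the "50M cells" quoted above
```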


                      I'll try using an Access query to join first. Unfortunately I cannot install SQL Server Express as IT doesn't allow it, so Access is my only option.


                      The row count is fixed (550K), only the columns vary based on the joins.



                      • 8. Re: Using table relationships vs data dump
                        Justin Larson

                        In either case, when the data gets loaded into the extract, it's consuming a flattened resultset and loading it into a columnar database (the extract). If you flatten it in Access, record those results to a table, and then load the extract from the flattened table, the loading itself may be faster, but the interaction with the workbook/extract will be the same.

                        • 9. Re: Using table relationships vs data dump
                          Chris McClellan

                          Adding columns makes the data source wider (i.e. more dimensions and measures) but shouldn't make it longer (i.e. increase the number of records) - though it all depends on the join.


                          Tableau will definitely count records as rows, not as columns.

                          • 10. Re: Using table relationships vs data dump
                            Justin Larson

                            You will see records manifested as rows on the front end, but when you create an extract, Tableau actually splits each column up and writes it to disk separately (the extract is a columnar database). It's [generally speaking] much easier to get performance mileage by reducing the number of columns than by reducing the number of rows. Similarly, high-cardinality columns are expensive: columns with a limited number of distinct values are preferable to, for example, an ID column.
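                            A toy sketch of why columnar engines favor low-cardinality columns (this is a generic dictionary-encoding illustration, not Tableau's actual storage format): repeated values encode as small integers against a dictionary of distinct values, while a unique ID column gains nothing.

```python
def dict_encode(column):
    """Replace each value with an index into a dictionary of distinct values."""
    dictionary = sorted(set(column))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in column]

region = ["North", "South", "North", "North", "South"]  # low cardinality
row_id = [101, 102, 103, 104, 105]                      # high cardinality

r_dict, r_codes = dict_encode(region)
i_dict, i_codes = dict_encode(row_id)
print(len(r_dict))  # 2 distinct values for 5 rows -> compresses well
print(len(i_dict))  # 5 distinct values for 5 rows -> no savings
```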


                            The high number of records is a function of the joins, it sounds like.

                            • 11. Re: Using table relationships vs data dump
                              Ben Neville

                              In an extract this isn't really the case, as only the needed columns are unpacked into RAM. If you're using 1 or 2 columns from a 10,000-column extract, it should have the same performance at run time as a 10-column extract (though it will take very long to generate). In any case, 550K rows is nothing in terms of data volume, so then I start to get into the discussion of calculation creation - making sure as much as possible can be materialized - as well as adhering to dashboard and view best practices.


                              Beyond trying to optimize the extract to materialize as many fields as possible, can you hide as many unused fields as possible (most people wouldn't use anywhere close to 90, though you may be an outlier) and aggregate for visible dimensions? This might reduce the row count.
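                              The column-pruning behavior described above can be sketched generically (this is an illustration of the columnar-access idea, not Tableau's internals; all names are invented): columns live in separate stores, so a query only loads the ones it references, however wide the extract is.

```python
# Toy 90-column "extract": each column stored independently.
extract = {f"col_{i}": list(range(5)) for i in range(90)}

def run_query(store, needed_columns):
    """Load only the columns the query references; report how many were read."""
    loaded = {name: store[name] for name in needed_columns}
    return loaded, len(loaded)

_, columns_read = run_query(extract, ["col_0", "col_1"])
print(columns_read)  # 2 -> the other 88 columns are never touched
```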