12 Replies Latest reply on Jan 27, 2015 3:50 PM by Aakash Hongal

    Filtering large data set efficiently without killing the database?

    Jason Wilson

  I am new to Tableau, and am having performance issues with my report (which I am sure is user error on my part).  I have a report that uses a custom SQL data source (from a MySQL db).  The query in the custom SQL joins several large tables and thus yields a very large result set, say 500K rows or more.  One of the main columns we want to filter on is customer_account_id, which is returned by the custom SQL and contained in most of the tables.

       

      My first implementation was to use a quick filter on that column, which returns every row of that data (500K+) from the db to Tableau, which then displays only the few rows we want.  This was VERY slow.

       

      My next attempt was to change the quick filter to a context filter.  It seems Tableau still has to return all that data to build the list of customer_account_ids for the drop-down, so the report still takes forever to load.

       

      My question is: how should I implement this to reduce the load on my database server and make the report run as efficiently as possible?  I don't want Tableau to return all the data just to get a list of customer_account_ids to show in the filter drop-down.  I was hoping that list could somehow be populated with a simple "SELECT DISTINCT customer_account_id FROM accounts" type query, and then, as the id(s) are selected, Tableau would only get those records from the database (based on my custom SQL query) instead of all accounts.  Is this possible?  Let me know if what I am trying to ask is not clear.  Thanks!
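
      The pattern I have in mind is roughly this (table and column names beyond accounts/customer_account_id are my guesses at our schema; "orders" stands in for the large joined tables):

      ```sql
      -- Populate the filter drop-down from the small accounts table only
      SELECT DISTINCT customer_account_id
      FROM accounts;

      -- Then, once an ID is chosen, fetch just that customer's rows
      -- from the big join (the body of my custom SQL)
      SELECT a.customer_account_id, o.*
      FROM accounts a
      JOIN orders o ON o.customer_account_id = a.customer_account_id
      WHERE a.customer_account_id = 12345;   -- the selected ID
      ```

      That way the expensive join only ever runs with a WHERE clause on a single account, instead of materializing all 500K+ rows first.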

        • 1. Re: Filtering large data set efficiently without killing the database?
          Matt Lutton

          First of all, if you can use the Multiple Tables option instead of Custom SQL, this tends to perform much better in Tableau.  I don't know the specifics, but I've read this many times.  Here is a great example: Tableau Performance: Multiple Tables or Custom SQL? | Tableau Love

          Secondly, are you using an Extract?  Again, not doing so can be a killer to performance.

           

          I don't know what other fields you might have, but you can set Data Source Filters to remove any unneeded data being brought into Tableau.  That may also help--for instance, I typically do this with years, as I'm only concerned with last year and this year in many of my dashboards.  If you only need certain customers, say from specific regions, you could set a data source filter on region to only return the ones you want.

           

          Those are my suggestions--others will have more, I am sure.

          • 2. Re: Filtering large data set efficiently without killing the database?
            Pedro Machado

            Hi Jason,

             

            A couple of comments/ideas:

             

            - If your data source is the result of a slow join, I have found the best way to deal with that is either a) create a table that stores the joined data (or a materialized view if your DB supports those) or b) use Tableau extracts. The idea is to do the join only once, not every time the Tableau view changes.
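
            For option a), a rough sketch in MySQL terms (table and column names are illustrative; since MySQL has no native materialized views, a plain table refreshed on a schedule plays that role):

            ```sql
            -- Do the expensive join once and store the flattened result
            CREATE TABLE customer_detail_flat AS
            SELECT a.customer_account_id, a.name, o.order_date, o.amount
            FROM accounts a
            JOIN orders o ON o.customer_account_id = a.customer_account_id;

            -- Refresh it periodically (e.g. from cron) instead of
            -- re-running the join every time a Tableau view changes
            TRUNCATE TABLE customer_detail_flat;
            INSERT INTO customer_detail_flat
            SELECT a.customer_account_id, a.name, o.order_date, o.amount
            FROM accounts a
            JOIN orders o ON o.customer_account_id = a.customer_account_id;
            ```

            Tableau then connects to customer_detail_flat as a single table, which is cheap to filter.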

             

            - In your specific example, it sounds like you are trying to implement a customer lookup. It seems to me you may be able to use a view that has a list of customers and then another one that has the details for one or a few customers.

             

            If this is the case, you can connect to your customer table from one datasource and create another datasource that has the join of multiple tables like Matthew is suggesting (not via custom SQL).

             

            Then you would blend the data between the two views in Tableau using the customer ID.

             

            The idea is that the "SELECT DISTINCT customer_id FROM accounts" query will be done against the account table only and this would populate a worksheet with just the IDs.

             

            You would then click on the IDs you want to lookup and Tableau would issue a query to the other datasource that contains the joins.

             

            This should be faster as long as your tables are indexed in such a way that retrieving a single customer is fast. If you don't have the right indexes, the database will still have to do full table scans to locate the data for a single customer.
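
            Assuming the schema from the question (names illustrative), an index like this is what makes the single-customer lookup cheap:

            ```sql
            -- Without an index on the filter column, every lookup
            -- scans the whole table to find one customer's rows
            CREATE INDEX idx_orders_customer ON orders (customer_account_id);

            -- EXPLAIN should now show an index lookup ("ref" on
            -- idx_orders_customer) rather than a full table scan
            EXPLAIN SELECT * FROM orders WHERE customer_account_id = 12345;
            ```

            The same applies to each large table in the join: every column used in a join or WHERE clause on customer_account_id should be indexed.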

             

            Does this make sense? Let me know if this works in your environment.

             

            Thanks,


            Pedro

            1 of 1 people found this helpful
            • 3. Re: Filtering large data set efficiently without killing the database?
              Jason Wilson

              Matthew and Pedro, thanks for the responses!  Much appreciated!  Sorry I did not mention this before, but this is a live data connection.  Pedro, what you described with the two data sources is just what I was thinking.  Can you send or point me to instructions on how to implement this in a report?

               

              As for the custom SQL queries, they are pretty solid and efficient, as the underlying tables are properly indexed.  I am pretty well versed in SQL and database design (not that I don't make mistakes!).  The backend db does not support materialized views, but they were not an option anyway, as the data is constantly updated and the customer requirement is to see real-time data.  Which brings me to my next question (as I said, I am a newbie to Tableau): when a new customer is added to the db, will it automatically start showing up in the report's parameter drop-down, or do I need to do anything to re-create it?  The intent is to run these reports primarily via the Tableau Server website.  Thanks again!

              • 4. Re: Filtering large data set efficiently without killing the database?
                Pedro Machado

                Hi Jason,

                 

                For the connection, just connect to the data twice: once to the accounts table and once to the join. As Matthew suggested, there may be a difference in performance between specifying the join in Tableau and using custom SQL (slower):

                 

                Connecting to Multiple Tables

                 

                Tableau doesn't yet support dynamic parameters so they won't be updated if your data changes. The workaround is to use a view that is connected to the data (and thus refreshed dynamically) and use it as a filter.

                 

                This post by Jonathan Drummey discusses this:

                 

                Creating a Dynamic “Parameter” with a Tableau Data Blend | Drawing with Numbers

                 

                The next step would be to create a view that shows all the customer IDs and a view that shows the detail. Put both on the same dashboard and then configure a filter action on the dashboard so that when you click on a customer, Tableau will filter the other view, effectively issuing a query to the joined tables where customer id = the selected ID.

                 

                Here are a couple of documents on blending and filter actions:

                 

                Working with Multiple Connections

                Filter Actions

                 

                Let me know if you have questions.


                Pedro

                • 5. Re: Filtering large data set efficiently without killing the database?
                  Matt Lutton

                  To be clear, the problem with Custom SQL is not an SQL problem or anything to do with how queries are written, but rather how Tableau handles the Custom SQL option versus Multiple Tables.  You are much better off, in terms of performance, if you use the Multiple Tables option in Tableau.

                  • 6. Re: Filtering large data set efficiently without killing the database?
                    Pedro Machado

                    Jason,

                     

                    One more thought ...

                     

                    If you will be using Tableau server, you may want to use extracts.

                     

                    Extracting Data

                     

                    You can set these to refresh at set intervals. If you don't require the data to be available in real time, this could work well for you.

                     

                    If you go this route, you can use the custom SQL you already have as the query would only run when the extract is being refreshed.

                     

                    Essentially, Tableau would run the query, download the result of the join, and create/optimize the extract. This may save you from having to connect to two data sources and do what I suggested as the performance should be better than that of the join.

                     

                    Let me know what works best.

                     

                    Pedro

                    • 7. Re: Filtering large data set efficiently without killing the database?
                      Jason Wilson

                      When you say view, are you talking about a database view or something in Tableau?  Also, when/how are parameter drop-downs populated in Tableau?  I was unaware this was not dynamic.

                      • 8. Re: Filtering large data set efficiently without killing the database?
                        Pedro Machado

                        EDITED

                         

                        When I said view, I meant a Tableau workbook sheet. Parameters can be populated from the contents of a database field at the time you create/edit them, but cannot be set to update automatically.

                         

                        See Parameters

                         

                        Quick filters are dynamic, but they will have the performance issue you were describing on your initial post. If your source is custom SQL, Tableau will run the full SQL to retrieve the list every time it needs to refresh the filter. This may perform better if you use extracts.

                         

                        Using Quick Filters

                         

                        Since you are proficient in SQL, you may want to take a look at the Tableau logs (stored under ...\My Tableau Repository\logs) to get a better understanding of how Tableau works. You'll be able to see the query that is sent to the DB.

                         

                        Pedro

                        • 9. Re: Filtering large data set efficiently without killing the database?
                          Jason Wilson

                          Is there a video of the Creating a Dynamic “Parameter” with a Tableau Data Blend | Drawing with Numbers example somewhere, or someone I could work with directly via email/phone/webex?  I am missing something here and having a hard time understanding these steps.  It would seem my issue is a pretty common reporting-101-level problem: how to query a database effectively.

                          • 10. Re: Filtering large data set efficiently without killing the database?
                            Aakash Hongal

                             

                            Essentially, Tableau would run the query, download the result of the join, and create/optimize the extract. This may save you from having to connect to two data sources and do what I suggested as the performance should be better than that of the join.

                             

                             

                            Pedro,

                             

                            I am a newbie to Tableau, and I have an interesting scenario here. I have two data sources (a cube and a relational database). Now I want to create a Tableau extract out of the two data sources that has to be refreshed at certain time intervals. Can this be done?

                            • 11. Re: Filtering large data set efficiently without killing the database?
                              Jeff James

                              Aakash,

                               

                              Tableau can create an extract for each data source, but not from both combined.

                               

                              A number of ETL programs can output extract files, or, if you have programming skills, a single extract could be created using the Tableau Data Extract API.

                              • 12. Re: Filtering large data set efficiently without killing the database?
                                Aakash Hongal

                                Thanks for the response, Jeff. I will probably explore the programming option.