6 Replies Latest reply on Nov 15, 2017 11:57 PM by Sanjiv Jivan

    Incremental refresh impact on database

    Matt Schutz

      I'm looking into incremental refreshes on Tableau Server of a Hadoop data source.  Ultimately I will be doing the same with Splunk and Oracle DB data sources.  I have reviewed this documentation: Refresh Extracts

       

      My Hadoop Hive query looks something like this.

       

      SELECT DateField, SomeTextField, SomeIntField FROM MyTable

       

      I will make DateField the key field to identify new rows.  I understand that only new records will be added to the TDE.  However, is the query against the database the same (ie - the same results are pulled as in the full refresh and then filtered by Tableau) or is the query modified by Tableau to be something like SELECT DateField, SomeTextField, SomeIntField FROM MyTable WHERE DateField > (the greatest existing date in the extract) ?

       

      From a Hadoop/Splunk/Oracle performance perspective, it doesn't do us any good to query all the results, but if it's parameterized like the "WHERE DateField..." then that's definitely worthwhile.

       

      Does this apply to all data source types, and is it possible or necessary to add parameters to the data source query itself?

       

      Thanks!

        • 1. Re: Incremental refresh impact on database
          Galen Busch

          Hi Matt,

           

          Incremental is indeed incremental, and great if you're connecting to transaction data.


          Are you connecting using Tableau's data designer interface, or a custom SQL query?

          • 2. Re: Incremental refresh impact on database
            Galen Busch

            Edit to my above post - all of the research I've done has pointed me towards incremental is NOT indeed incremental from the database perspective. As I read;

             

            - Tableau sends the entire query to the database
            - Tableau then uses your date field to select rows from the results it queried from the data source

            - Tableau then unions your existing data with your incremental results

            - Tableau loads new unioned .tde into server

             

            From the database perspective, you likely won't see an increase in performance. From Tableau's perspective, you don't have to regenerate and load an entire extract file for every refresh.

            • 3. Re: Incremental refresh impact on database
              Matt Schutz

              Hmm... that's frustrating.  I can fully understand the technical limitations in not doing it the other way around.  Is there a way to perhaps use a parameter to help "guide" Tableau?  Parameters are allowed in queries within a workbook, but what about parameters for published Server data sources?    

              • 4. Re: Incremental refresh impact on database
                Galen Busch

                I guess my best advice would be to stage the data.

                 

                Create a table, set up a task scheduler to drop and re-run the table daily. Each day, Tableau would hit this table, see the date field with fresh dates, and import the entire table. That would make it incremental from both the database perspective and Tableau's perspective. To load the entire table first, change your date parameter from X past date to today, and then set up the task scheduler to load yesterday's data in every day.

                • 5. Re: Incremental refresh impact on database
                  Jason Miller

                  Galen,

                   

                  I was confused on how incremental refreshes actually work. I have a 50M row Tableau published extract sitting on the Tableau server and want to incremental refresh from a SQL server based on an added datetime field. It seems the process is refreshing, buts its taking ~16 minutes regardless of how many "new" rows I have. If I have 100 or 50,000, about same runtime.

                   

                  I think as you note, its running the entire query then determining what to "add". This seems counter intuitive based on the configuration. One would think Tableau would send the max value to the database since it asked us to configure the incremental column, it can do this with parameters. It's also not scalable, as my source SQL table grows, its going to take longer and longer to perform an incremental refresh.

                   

                  Your idea might be the only way to solve the issue. Basically only output "new" rows of data to my SQL table that I want Tableau to refresh from.

                   

                  I was considering using an append API option with TABCMD to my existing Tableau extract. Any thoughts on this option?

                   

                  Question:

                  Does any one where to access the "extract history" window mentioned in the below link? Is it via Desktop or server, I couldn't see row counts anywhere.

                   

                  Refresh Extracts

                   

                  Any Tableau developers want to add insight into extracts?

                   

                  -Jason

                  • 6. Re: Incremental refresh impact on database
                    Sanjiv Jivan

                    We are going to give this approach a go : Refreshing Large Extracts Faster

                     

                    Sanjiv