2 Replies Latest reply on Mar 23, 2017 8:33 AM by Sunitha Sanka

    Incremental Extract processing and maintenance

    Sunitha Sanka

      Hi All,

      I am Sunitha. I am new to Tableau and have a few basic questions about incremental extract refreshes. I am mostly involved in server installation and maintenance, so I am looking for answers from that perspective.

       

      Incremental Extract Creation :

      We have schedules set up that incrementally refresh an extract. A folder gets created for each refresh, even when 0 records are inserted.
      1. How is extract storage on the server disk counted towards the space allocated to a site?
      2. Is it considered in the site usage at all, or is usage calculated only from the workbooks, extracts, and data sources published to the site?
      3. Are all the new folders created during incremental refreshes included in the space calculation for the site?

       

      Incremental Extract Processing :

      1. Does Tableau Server automatically point the workbook to the latest incremental extract, or does it point to all the extracts created, since each may contain just the newly added rows? If we also do a weekly full refresh of the same data source, how does it decide which extract contains the most up-to-date data?
      2. How does it calculate the most recent date/time or ID (the field used for capturing the incremental data)? Is this a separate query sent to the data source each time, or is it used in the filter clause as a sub-query? How many queries are sent to the data source in this case?
      3. Are there logs, or tables in the Postgres repository, where I can find details on what query was sent and how long it took to complete?
      4. The Background Tasks for Extracts view does provide a few details, but I need to dig deeper and find out which step in the refresh takes the majority of the time.

      Extract Maintenance & Deletion :

      What are the best practices for cleaning up the older extract files (from incremental or full refreshes) that get created for each schedule run? We run daily and weekly backups, and the script uses the cleanup command. I understand that the cleanup command only clears out the temp files, but how do we clear out the ones created under the Tableau\data\tabsvc\dataengine location when they are no longer needed?
      We are going to suggest that any incrementally refreshed extract should also be fully refreshed at regular intervals (e.g. every weekend) in order to maximize performance. In this case, how can we as admins make sure the old extract files in the data engine folder on the worker nodes are cleared out?
        
      Any help is much appreciated. I apologize if any of my questions are not clear.

       

      Thanks,
      Sunitha

        • 1. Re: Incremental Extract processing and maintenance
          Dmitry Chirkov

          Incremental Extract Creation

          Temporary files and folders created during extract refresh should not be counted towards site allocation. Only the resulting extract files.

          Where are you seeing these new folders by the way?

           

          Incremental Extract Processing

          1. Yes, because the original extract is replaced with a new, updated one. We keep only one file, and it contains all the rows (even for incremental extracts).

          2. The latest timestamp/ID is captured during the extract refresh and preserved as an extract attribute. On the next refresh, this timestamp/ID is added to a filter clause (WHERE date > max_extract_date). The query count is not a trivial question, as it depends on the data source, the relation, and the data source and extract filters.

          3. C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Logs\backgrounder*.txt - it's all there.
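
          To make point 2 concrete, here is a minimal sketch of the high-water-mark idea, not Tableau's actual implementation. It assumes an in-memory SQLite table named orders with an order_date column; the table, the column names, and the incremental_refresh helper are all made up for illustration.

```python
import sqlite3

def incremental_refresh(source, extract_rows, high_water_mark):
    """Append only rows newer than the stored mark and return the new mark.

    Hypothetical helper: mimics "WHERE date > max_extract_date".
    """
    cursor = source.execute(
        "SELECT id, order_date FROM orders WHERE order_date > ? ORDER BY order_date",
        (high_water_mark,),
    )
    new_rows = cursor.fetchall()
    extract_rows.extend(new_rows)  # one extract that keeps accumulating all rows
    if new_rows:
        high_water_mark = max(row[1] for row in new_rows)
    return high_water_mark

# Stand-in data source (assumed schema, ISO dates stored as text).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2017-03-20"), (2, "2017-03-21")],
)

extract_rows = []
mark = incremental_refresh(source, extract_rows, "")    # first run loads everything
source.execute("INSERT INTO orders VALUES (3, '2017-03-22')")
mark = incremental_refresh(source, extract_rows, mark)  # second run fetches only id 3
print(len(extract_rows), mark)
```

          A refresh that finds no new rows leaves both the extract and the stored mark unchanged, which matches the "0 records inserted" case from the original question.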

           

          Extract Maintenance & Deletion

          Tableau should not leave intermediate extract files behind. Also, extract files are not yet versioned, so you'll have a single extract file per data source; to delete it, just delete the unused data source (or the workbook with the embedded extract).

          If you see otherwise, file a support case to get it addressed.
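
          If you want to pull the relevant lines out of the backgrounder logs (for your own investigation or to attach to a case), something like the sketch below may help. It assumes nothing about the log format: it only does a case-insensitive substring search over files matching backgrounder*.txt. The find_log_lines name and the search term are placeholders.

```python
from pathlib import Path

def find_log_lines(log_dir, needle, pattern="backgrounder*.txt"):
    """Return (file name, line number, line) for each log line containing needle.

    Hypothetical helper: plain substring match, no parsing of the log format.
    """
    matches = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes in it.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle.lower() in line.lower():
                    matches.append((path.name, line_number, line.rstrip("\n")))
    return matches
```

          You would call it with the Logs folder from point 3 and a term of your choice, for example the name of the data source being refreshed.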

          • 2. Re: Incremental Extract processing and maintenance
            Sunitha Sanka

            Thank you for your response. All of my questions are answered. The folder where I see all the incremental/full extracts being created is the TableauServer\data\tabsvc\dataengine\extract folder. As you mentioned, Tableau is automatically clearing the old folders and freeing up the space. It is mostly keeping only that day's data, and I am guessing this is because we run daily backups and use the cleanup command in the scripts, which takes care of cleaning up the intermediate extracts.
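
            For anyone who wants to check the same thing on their own server, this is a rough sketch of how the space used by each subfolder of the extract folder could be measured. The folder_sizes helper is made up for illustration; the folder to inspect is passed in as an argument rather than assumed.

```python
import os

def folder_sizes(root):
    """Return {subfolder name: total size in bytes} for each direct subfolder of root."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        # A file may disappear mid-scan while a refresh is running.
                        pass
            sizes[entry.name] = total
    return sizes
```

            Running it before and after a scheduled refresh (or a backup with cleanup) shows whether old folders are actually being removed.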