I am Sunitha. I am new to Tableau and have a few basic questions about incremental extract refreshes. I am mostly involved in server installation and maintenance, and I am looking for answers from that perspective.
Incremental Extract Creation:
We have schedules set up that incrementally refresh an extract. A folder gets created for each refresh, even when 0 records are inserted.
1. How does extract storage on the server disk count towards the space allocated to a site?
2. Is it even considered in the site usage, or is usage calculated only from the workbooks, extracts, and data sources published to the site?
3. Are all the new folders created during incremental refreshes included in the space calculation for the site?
Incremental Extract Processing:
1. Does Tableau Server automatically point the workbook to the latest incremental extract, or does it point to all the extracts created, since each may contain just the newly added rows? If we also run a weekly refresh of the same data source, how does it decide which extract contains the most up-to-date information?
2. How does it determine the most recent date/time or ID (the field used for capturing the incremental data)? Is this sent to the data source as a separate query each time, or is it used as a sub-query in the filter clause? How many queries are sent to the data source in this case?
3. Are there logs, or tables in the Postgres repository database, where I can find details on what query was sent and how long it took to complete?
4. The Background Tasks for Extracts view does provide a few details, but I need to dig deeper and find out which step of the refresh takes the majority of the time.
Extract Maintenance & Deletion:
What are the best practices for cleaning up the older extract files (from incremental or full refreshes) that get created on each schedule run? We run daily and weekly backups, and the script includes the cleanup command. I understand that the cleanup command only clears out the temp files, so how do we clear out the files created under the Tableau\data\tabsvc\dataengine location when they are no longer needed?
We are going to suggest that any incrementally refreshed extract also be fully refreshed at regular intervals (e.g. every weekend) in order to maximize performance. In that case, how can we as admins make sure the old extract files in the data engine folder on the worker nodes get cleared out?
Any help is much appreciated. I apologize if any of my questions are not clear.