Are you able to delete redundant, unused Data Extracts? I think the answer to this query is largely administrative.
I'd also check whether the same large extract is embedded in several published workbooks. If so, could you publish it once as a data source and have all the workbooks connect to that?
I hope this helps reduce the size of the workbook!
Hi Ravi Mistry,
As you mentioned, there are some extracts under different extract directories on the server.
The file names and types are the same, but the sizes differ. They may be versions of the same extract.
But how can I find unused Data Extracts on Tableau Server?
Thanks for your help!
Here is one of the extracts located on Tableau Server. Why do we have these versions on disk? The same file has been created at different locations, and each copy has a different size.
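In case it helps anyone check the same thing, here is a rough Python sketch of how the repeated files could be listed. It only walks a folder and groups extract files by file name. The function name `find_repeated_files` and the `.tde`/`.hyper` suffix list are my own assumptions, and it says nothing about whether a copy is actually unused.

```python
import os
from collections import defaultdict


def find_repeated_files(root, suffixes=(".tde", ".hyper")):
    """Group extract files under `root` by file name.

    Returns {file_name: [(full_path, size_in_bytes), ...]} for every
    name that appears in more than one location.
    """
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: extracts are recognisable by their file suffix.
            if name.lower().endswith(suffixes):
                path = os.path.join(dirpath, name)
                by_name[name].append((path, os.path.getsize(path)))
    # Keep only names that occur at more than one location.
    return {name: sorted(copies)
            for name, copies in by_name.items()
            if len(copies) > 1}
```

You would call it with the extract directory of your own installation, for example `find_repeated_files(r"D:\path\to\extract\folder")` (that path is just a placeholder), and print the paths and sizes it returns.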
Many Tableau users have the "publish workbook" permission on the server.
I searched and found the "Revision History" setting in Tableau ("Settings" > "Revision History").
I set it to 5 instead of 25, and then also cleared all revision history on Tableau.
But nothing happened.
Hi Ritesh and Ravi,
Thanks for your reply.
I am new to Tableau.
I am trying to understand the structure of Tableau and also I want to decrease the backup size.
I found the reason why extracts have too many versions.
The cause is that the data source is not published separately to the server but is embedded in the workbook.
In our company, one of the Tableau users embeds data sources in his workbooks instead of publishing one data source separately and reusing it.
I think this creates new extracts on Tableau Server and keeps redundant data on disk.
I explained the issue to him and asked him to redesign his workbooks.
One recommendation: if you haven't already, it might help to run the "tabadmin cleanup" command while Tableau Server is in a stopped state. It might be the case that you already have this command running as part of a nightly backup for Production (which we strongly recommend), but that will mostly clean up the postgres database.
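As a minimal sketch of the stopped-state cleanup described above, something like the following Python wrapper could run the sequence. It assumes `tabadmin` is on the PATH and that the script runs with administrator rights on the server machine. The function name `run_cleanup` and its `runner` parameter are made up for illustration, so treat it as a starting point and not as an official script.

```python
import subprocess


def run_cleanup(tabadmin="tabadmin", runner=subprocess.run):
    """Stop Tableau Server, run `tabadmin cleanup`, then start it again.

    `runner` is injectable so the sequence can be dry-run or tested
    without touching a real server.
    """
    runner([tabadmin, "stop"], check=True)
    try:
        # With the server stopped, cleanup can remove old logs and temp files.
        runner([tabadmin, "cleanup"], check=True)
    finally:
        # Always try to bring the server back up, even if cleanup failed.
        runner([tabadmin, "start"], check=True)
```

To preview the commands without executing anything, pass a runner that only prints, for example `run_cleanup(runner=lambda cmd, check: print(" ".join(cmd)))`.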
With Tableau Server you give users a lot of room to run, but as you mention this can come back to bite you: if, say, 10 users each create an extract for their data source but it is really the same data source, you now have 10 copies instead of just one. Version 10.4 comes with "certified" data sources, so that may be a way to steer users towards an existing published data source, since one published data source could then cover all 10 users.
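To put a rough number on the 10-copies example, here is a small back-of-the-envelope sketch in Python. The helper name `redundant_size` and the 500 MB figure are invented for illustration, and it assumes all copies hold the same data so that only the largest one would need to be kept.

```python
def redundant_size(copy_sizes):
    """Space that could be freed if one shared copy replaced all of them.

    Assumption: every copy holds the same data, so keeping only the
    largest copy loses nothing.
    """
    if not copy_sizes:
        return 0
    return sum(copy_sizes) - max(copy_sizes)


# Ten users, each with a (hypothetical) 500 MB extract of the same source.
sizes_mb = [500] * 10
print(redundant_size(sizes_mb))  # 4500
```

So in that made-up case a single published data source would save about 4.5 GB, and the same amount again in every backup that includes the extracts.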
There might be a misconception that I wanted to help clear up. Most extracts, whether they are a "published data source" or "embedded in a workbook", will be part of a "refresh schedule" where the backgrounder periodically refreshes the extract. In this case Tableau does not save multiple instances of the extract. We only save the metadata changes of the workbook so we can revert to a prior version. If you're seeing something different, it would be a good idea to open a support case. Below is a link for more information, though I'd recommend looking over the "Potential Revision History Issues" section in particular.
Work with Content Revisions