It's an argument that can go on forever
I'd tackle it in a few ways:
1) Disk is cheap (relatively), so who cares?
2) Dashboards run faster when the extract is smaller. I've seen a dashboard running slowly on 450 million rows (covering about 5 years of data); the team created ANOTHER extract for the current year only (250 million rows) and the dashboard ran a lot quicker ... but you now have 2 extracts, not 1.
3) Test 10.5 and the Hyper engine. In theory your extracts should be smaller and faster using Hyper.
While I agree with you that disk is cheap, I do have some concerns:
1) Many of these large extracts are updated daily as full refreshes, so the bigger they are, the longer they take to refresh.
2) I have watched these extracts grow substantially in size as the Tableau Server version changes.
For example, they were about 5G in size when we were on version 9. Now that we are on version 10.3 they are 11G. While I am sure our business has grown a bit over the years, I am sure it did not double, and I have not changed the parameters of the extracts. Since they are all full refreshes with fixed date ranges, not incremental refreshes, I can only assume that Tableau Server updates have added to the size.

As a concrete test: I have two servers, production and dev. Last week I took our dev server from 10.1.3 to 10.3.0. The exact same extract is 10G in size on version 10.1.3 and 11G on 10.3. The only difference is the Tableau Server version. I would like to understand why that is.

I would also like to rebuild some of these extracts to make them smaller, if I can get some statistics on which data records are actually used on a regular basis. Hence, the question.
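One way to approach the "which workbooks are actually used regularly" question is to look at view-request activity recorded in the Tableau Server repository (the repository's PostgreSQL database exposes request history to the `readonly` account). A minimal sketch, assuming you can export that activity to rows like the ones below -- the column names here are my assumptions, not the real repository schema, so check your own repository's data dictionary before relying on them:

```python
from datetime import date

# Hypothetical rows exported from the Tableau Server repository's
# request-history data. The "workbook"/"view"/"day" column names are
# assumptions for illustration, not the actual repository schema.
requests = [
    {"workbook": "Sales5yr", "view": "Overview", "day": date(2017, 7, 3)},
    {"workbook": "Sales5yr", "view": "Overview", "day": date(2017, 7, 4)},
    {"workbook": "SalesCurrentYr", "view": "Trend", "day": date(2017, 7, 4)},
]

def usage_by_workbook(rows):
    """Count distinct access days per workbook -- a rough proxy for
    how regularly each extract-backed workbook is actually used."""
    days = {}
    for r in rows:
        days.setdefault(r["workbook"], set()).add(r["day"])
    return {wb: len(d) for wb, d in days.items()}

print(usage_by_workbook(requests))
# {'Sales5yr': 2, 'SalesCurrentYr': 1}
```

Workbooks (and by extension their extracts) that show few or no distinct access days over a month or two are candidates for trimming the extract's date range or retiring it entirely.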
Don't worry, I understand where you're going, but Hyper is coming in 10.5 and the "rules" will change completely.
Chris, may I ask an additional (related) question? I created a data extract and saved it as a packaged workbook. The total size of this packaged workbook is 3.69G. However, when I published the data extract to the server for user access, the disk space used on the server (and the size of the related extract in the data\tabsvc\dataengine folder) is 8.45G. Why the doubling in size here? I am seeing this difference across the board for my extracts, and the difference seems to be getting bigger with every update. Is this due to changes you are making in the product to prepare for the 10.5 Hyper change, or is it something else?
Hi, I'd raise a ticket with Support. I don't look at Server at that level of detail, but if you're not using 10.5 then there wouldn't be anything about Hyper involved there.