We are finally moving our team toward using Tableau Server data sources with automated/scheduled extract refreshes rather than embedded data sources. A great first step!
What we have realized is that we have so much data that full refreshes are getting slower every day. Data is added to the source system daily, so each full refresh re-reads the entire history plus one more day of data and takes a little longer than the last.
I'm familiar with the concept in Tableau Desktop of appending data, but I haven't done such a thing for data sources on Server.
The Hadoop data is something like this (obviously simplified for this example):
record_date field1 field2 field3
I would like the extract to be populated each day with any new data in Hadoop. The pseudocode would be something like:
INSERT INTO TableauServerExtract SELECT * FROM HadoopTable WHERE record_date > (SELECT MAX(record_date) FROM TableauServerExtract)
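To make the intended behavior concrete, here is a minimal sketch of that append logic, using an in-memory SQLite database as a stand-in for both Hadoop and the extract. The table names `hadoop_table` and `tableau_extract` and the sample rows are assumptions for illustration only; this is not how Tableau Server does it internally, just the same "only rows newer than the current max" idea.

```python
import sqlite3

COLUMNS = "record_date, field1, field2, field3"

def incremental_append(conn):
    """Copy rows from hadoop_table that are newer than the newest row in tableau_extract."""
    # COALESCE covers the first run, when the extract is empty and MAX() returns NULL.
    conn.execute(
        f"""
        INSERT INTO tableau_extract ({COLUMNS})
        SELECT {COLUMNS}
        FROM hadoop_table
        WHERE record_date > (SELECT COALESCE(MAX(record_date), '') FROM tableau_extract)
        """
    )
    conn.commit()

def extract_row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM tableau_extract").fetchone()[0]

# Hypothetical stand-in tables; dates are ISO strings so they sort correctly as text.
conn = sqlite3.connect(":memory:")
for table in ("hadoop_table", "tableau_extract"):
    conn.execute(
        f"CREATE TABLE {table} (record_date TEXT, field1 TEXT, field2 TEXT, field3 TEXT)"
    )
conn.executemany(
    "INSERT INTO hadoop_table VALUES (?, ?, ?, ?)",
    [("2024-01-01", "a", "b", "c"), ("2024-01-02", "d", "e", "f")],
)

incremental_append(conn)          # first run loads everything
count_after_first = extract_row_count(conn)

conn.execute("INSERT INTO hadoop_table VALUES ('2024-01-03', 'g', 'h', 'i')")
incremental_append(conn)          # second run picks up only the new day
count_after_second = extract_row_count(conn)

incremental_append(conn)          # nothing new, so nothing is appended
count_after_third = extract_row_count(conn)
```

One thing this makes visible: with a strict `>` comparison, rows that arrive late for a date already in the extract are never picked up, which is another reason for the occasional full refresh.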
It would probably also be a good idea to do a full refresh monthly or something.
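The cadence I have in mind could be sketched like this. The function name `refresh_type` and the choice of the 1st of the month are purely hypothetical; on Server this would presumably be two separate schedules rather than code.

```python
from datetime import date

def refresh_type(run_date: date, full_refresh_day: int = 1) -> str:
    """Return 'full' on the chosen day of the month, 'incremental' otherwise."""
    # full_refresh_day is an assumed setting; keep it at 28 or below so it exists in every month.
    return "full" if run_date.day == full_refresh_day else "incremental"

first_of_month = refresh_type(date(2024, 3, 1))
mid_month = refresh_type(date(2024, 3, 15))
```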
I found this, but it doesn't give much detail. I'm loosely familiar with tabcmd, so I can look into it if that's of use, but I would prefer to avoid it. Can someone point me in the right direction, please?
As per the Refresh Extracts documentation:
"If you’re publishing the data source to Tableau Server, you can specify the type of refresh in the Scheduling & Passwords dialog box. Most data sources support an incremental refresh."