Hey Glen -
Tens of millions of rows shouldn't really be an issue for us - we have plenty of folks creating extracts that are hundreds of millions of rows and some into the 1-2 billion range (although that's extreme).
There are two phases to creating an extract (an oversimplification, but it'll do):
- An "Executing Query" phase - your RDBMS is executing the query it needs to and is streaming rows/columns back to us, where we store them as temporary files
- A "Tableau does real work" phase, where we take those temp files, sort and compress them, and then stitch them into an extract
Do you have a feel for when you're failing? If you duplicate this process in Tableau Server, which "phase" is taking the longest? If it's simply taking your database a long time to finish doing work (not uncommon), you can simply bump up the timeout setting in Tableau Server itself.
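If the "Executing Query" phase turns out to be the bottleneck, the timeout bump would look something like this from the Server box (a sketch assuming Tableau Server 8.x's tabadmin; verify the setting name and default against your version's docs before running):

```shell
# backgrounder.querylimit is the extract-refresh query timeout in
# seconds (default 7200, i.e. 2 hours) -- setting name assumed from
# Tableau Server 8.x documentation.
tabadmin stop
tabadmin set backgrounder.querylimit 14400   # allow up to 4 hours
tabadmin config
tabadmin start
```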
It sounds like you're looking for Change Data Capture - The extract API won't help you here. While the 8.1 API will allow you to create a "base" extract and then APPEND new rows to it, that won't really help you if "sales" for "record 8" changed from $10 to $5, right?
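To make that concrete, here's a tiny Python simulation of append-only semantics (not the Extract API itself, just the row behavior, with made-up record IDs): appending the corrected row doesn't replace the stale one, so aggregates double-count.

```python
# Simulate an extract as an append-only list of (record_id, sales) rows.
extract = [("record7", 20.0), ("record8", 10.0)]

# "record8" is corrected at the source from $10 to $5. An append-only
# incremental refresh can only ADD the new row -- the stale $10 row
# stays behind in the extract.
extract.append(("record8", 5.0))

total_record8 = sum(sales for rec, sales in extract if rec == "record8")
print(total_record8)  # 15.0 -- double-counted, not the corrected 5.0
```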
That's correct. I don't have a feel for where in the extract it's failing except that it started timing out a few weeks ago. The table is growing a couple hundred thousand rows daily and extract times have been steadily increasing as the table's been growing. Extending the timeout setting would be a last resort given that this isn't the only extract activity going on, and the server where the source table lives has a lot going on with it as well (source data pulls, fact table loading, etc.). If anything, that activity would only increase as our data needs grow.
What's really needed is indeed a change data capture option that's not an insert-only incremental feature, which would greatly reduce the number of rows that need to be pulled into the extract. It seems wasteful to pull the entire table every time when only a few hundred thousand rows have been inserted or changed.
The only thing I could think of at this point is to perhaps figure out a mechanism to copy a base .tde file which contains older static data into the primary directory and then do an incremental extract on top of that, then repeat as necessary. This would shorten the incremental load time, I suppose. But that requires hands-on management periodically to reset a larger and larger base file...
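That "reset from base" workflow might look roughly like this (simulated in a scratch directory; real paths would point at Tableau Server's extract storage, and the append here stands in for the incremental refresh):

```shell
# Simulate: restore a static base extract, then let an incremental
# refresh append newer rows on top of it. Paths are hypothetical.
workdir=$(mktemp -d)
echo "base rows" > "$workdir/sales_base.tde"       # stand-in for the static base extract
cp "$workdir/sales_base.tde" "$workdir/sales.tde"  # restore base before the refresh
echo "new rows" >> "$workdir/sales.tde"            # stand-in for the incremental append
cat "$workdir/sales.tde"
```

Periodically you'd rebuild sales_base.tde from a full extract so the incremental window stays small -- that's the hands-on management part.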
Yup - a CDC mechanism in the extract engine would be nice -- but building something like that "right" takes a lot of work.
For better or for worse, your scenario is a bit advanced and something that many (most?) of our users will probably never bump into...I'd suggest you search the Ideas forum for a request that looks similar to what you need (I'm sure there's one in there, I hear this question now and then) and vote it up.
The problem with your idea is this:
Original extract has two rows:

- Record1: $10
- Record2: (some other value)

...Over time, someone comes in and changes Record1 to $3 and adds Record3 at $8.

Your "change" extract is:

- Record1: $3
- Record3: $8
If you lump these two datasets together, your Record1 value is now wrong - $13 when aggregated, not $3. You'd need to load a value in your change dataset like -$7 for Record1 -- a delta -- and then hope users don't inadvertently do something silly which prevents the $10 - $7 = $3 math from occurring correctly. Lots of ways this could go wrong.
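In miniature, the delta workaround looks like this (plain Python with hypothetical values; Record2's value is made up for illustration):

```python
# Base extract: Record1 originally loaded at $10.
base = [("record1", 10.0), ("record2", 4.0)]

# The change extract carries DELTAS, not new values: Record1 moved
# from $10 to $3, so we load -$7; Record3 is genuinely new at $8.
changes = [("record1", -7.0), ("record3", 8.0)]

combined = base + changes

def total(rows, rec):
    return sum(v for r, v in rows if r == rec)

print(total(combined, "record1"))  # 3.0
print(total(combined, "record3"))  # 8.0
```

Note this only holds up because the aggregate is SUM -- a COUNT, AVG, or MIN over the same rows would silently produce garbage, which is exactly the "users do something silly" failure mode.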
Ultimately, you'd be modeling slowly changing facts - which is what data warehouses are for, not in-memory caches. You'd need to have a mechanism for flagging which fact is the latest/greatest and then set up filters in Tableau to ignore the other noise. I guess this could be done, but it would be somewhat unnatural...
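The "flag the latest fact" approach can be sketched like so (hypothetical versioned rows; in Tableau the equivalent filter would be a calculated field or data-source filter on the version flag):

```python
# Each fact row carries a version number; only the highest version
# per record should survive the filter.
rows = [
    ("record1", 1, 10.0),  # original value
    ("record1", 2, 3.0),   # later correction
    ("record3", 1, 8.0),
]

# Keep the latest version of each record, discarding the "noise".
latest = {}
for rec, version, sales in rows:
    if rec not in latest or version > latest[rec][0]:
        latest[rec] = (version, sales)

current = {rec: sales for rec, (version, sales) in latest.items()}
print(current)  # {'record1': 3.0, 'record3': 8.0}
```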