OK... did some more experimentation with Process Monitor pointed at the Tableau processes while I loaded a saved sheet that used an extract as a data source.
Private Bytes didn't jump significantly for either process when the sheet opened, unlike PowerPivot or QlikView, which grab lots of RAM immediately. Each time I executed a query by hitting "Run Update" (I turned off Update Worksheet), I saw tdeserver64.exe go after my *.tde file.
So, an extract is more like a local file/database (with tdeserver64 as the database engine) vs. an in-memory structure which can be saved to disk (like PowerPivot and QlikView do). Can someone confirm? I think I had the wrong set of expectations around what an extract actually does - looks like it's local storage, period?
Here is my understanding.
The Data Engine is very much a database: a columnar, read-optimized database. It does not pre-load data into memory, nor does it require that your data fit into memory. Data is cached in memory as it is used. (I think the mechanism is memory-mapped I/O at the OS level, and I'm not sure where those bytes show up in Process Explorer.) I'm not exactly sure what "local storage, period" means - it's a database, and it's both in memory and on disk... perhaps the lines start to blur there.
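To make the memory-mapped I/O point concrete, here is a minimal Python sketch (not Tableau code - just an illustration of the OS mechanism being described). With a memory-mapped file, mapping the file costs almost nothing; only the pages you actually touch get faulted into RAM, and they stay in the OS page cache until memory pressure evicts them. That is how a file-backed database can behave like an in-memory one for hot data without ever loading the whole file.

```python
import mmap
import os
import tempfile

# Create a sample data file standing in for a *.tde extract.
path = os.path.join(tempfile.mkdtemp(), "sample.dat")
with open(path, "wb") as f:
    f.write(b"\x00" * (16 * 1024 * 1024))  # 16 MB of zero bytes

with open(path, "rb") as f:
    # Mapping the file reserves address space but reads no data yet.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Touching a byte faults only the containing page into memory...
    first = mm[0]

    # ...and a byte 8 MB in faults a different page. Only touched
    # pages become resident; untouched ones never leave disk.
    middle = mm[8 * 1024 * 1024]

    mm.close()

print(first, middle)  # 0 0
```

The same behavior is why Private Bytes wouldn't jump when the sheet opens: nothing is resident until a query actually touches the data.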
Extract creation time is something that we've been working on lately, I believe.
"local storage, period" = we're essentially dealing with a local, file-based database vs. a column-based, compressed, in-memory database like some vendors are rolling out.
Once data has been cached, do you happen to know under what circumstances Desktop would drop it from memory on the client? It might be interesting to play around with some sort of poor man's pre-caching mechanism, like running a "monster" sheet up front which pulls in the majority of attributes & measures that one uses. I just don't know whether doing so would be worthwhile, based on when/how the product cleans the cache.
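Outside of Tableau, the "monster sheet" idea could be approximated by simply reading the extract file once, so its pages land in the OS page cache before any queries run. A hypothetical sketch (the file name and sizes are made up for illustration):

```python
import os
import tempfile

def prewarm(path, chunk=1 << 20):
    """Read a file sequentially so the OS page cache holds its pages.

    A stand-in for the 'monster sheet' warm-up idea: later reads of the
    same file (e.g. by tdeserver64.exe) can be served from RAM, at least
    until the OS evicts the pages under memory pressure.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total  # bytes touched

# Demo on a small temp file standing in for a *.tde extract.
path = os.path.join(tempfile.mkdtemp(), "extract.dat")
with open(path, "wb") as f:
    f.write(b"x" * 3_000_000)

print(prewarm(path))  # 3000000
```

Whether this helps in practice depends entirely on how aggressively the OS and the product evict cached pages, which is exactly the open question above.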
Pre-loading all the data into RAM is not a critical scenario for me, I'm just exploring at this point :)
James is right. While attending the European Customer Conference today and yesterday, I saw this is solved completely in release 6.1. What a performance boost!
James, can you just clarify what's being talked about here? Is the current (6.0) extract an in-memory, columnar DB or is 6.1 going to bring more of an in-memory approach with the extracts?
Have you seen this comment:
He describes Tableau's approach with their Data Engine as "memory-biased" instead of "in-memory".
There is also a presentation at http://conference.tableausoftware.com/2011/eu/materials/tableau/DataEngine.zip that says:
"Many “in-memory” systems require all of the data to be memory resident. Data Engine is designed to take advantage when data is in-memory – but it doesn’t require that all data be loaded into memory or fit in memory. This means faster document load times, and very large databases can be analyzed (e.g., many GBs of data)"
Cool, thanks for those, Joe. I remember reading some of that at various points as 6 was being released.
This helps clear things up.