We have been partnering with our DBAs to transform our transactional ODS into a proper DDW, so that we could gain efficiencies on the front end with our reporting application (Tableau).
We previously had hundreds of custom extracts running, each supporting a one- or two-report workbook that answered very specific questions. What we saw was an opportunity to better organize our data on the back end, with proper fact tables and dimensions, so that our end users could ask hundreds of questions of 4 or 5 workbooks, leveraging the power of parameters, hierarchies, and the like.
Our challenge -- now that we have our star schemas built, Tableau doesn't appear to be able to handle them. This puzzles me, because I have to imagine that these are small beasts compared to what other organizations might be tackling. We have tried multiple approaches:
1) Create a published data source on Tableau Server for the full star schema -- We wanted to take this approach so we could also build out standardized metadata and publish the data source for our 'power users' to leverage; this would allow for broader use and adoption, with some semblance of control over definitions and measures. Our first effort, on our primary concept but a relatively small data set, yielded an extract of 200GB. The extract would complete on our local laptops (after about 2 hours), but when we scheduled it on the server, it would never finish.
2) Modify the published data source -- We took a spin off #1 and, in the table editor in Tableau, cut out the dimension fields we felt weren't mission critical (larger unique text fields, redundant ID fields), and then reduced the timeframe of the fact to a rolling 3 years instead of the last 6 (a simplified sketch of this trimmed query is below the list). This brought the first extract down to around 80GB, and it seemed to work, temporarily. We copied this approach and built additional data sources ranging from 80GB to around 200GB.
3) When we upgraded the server to v8, none of these extracts worked. The DBAs told us that Tableau couldn't process that much data in an extract (really?), so we tried setting the data sources up as LIVE connections to the SQL Server. This yielded horrible performance (it took one report 20 minutes to open; see the live-query sketch below the list).
4) The DBAs continue to suggest that we refactor our extracts to be smaller and more tailored to each subject area. My hesitation with this is that it puts us right back where we started; it is a maintenance nightmare (the last sketch below shows what this would look like).
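To make the approaches above concrete, here are rough sketches of the SQL involved; the table and column names are simplified stand-ins, not our real schema. The trimmed data source in #2 boils down to the fact joined to its dimensions, with the big free-text columns dropped and the fact filtered to a rolling 3 years:

-- Simplified sketch of the trimmed star-schema query behind the data source in #2.
-- Table and column names are made up for illustration; our real star is much wider.
SELECT
    f.order_key,
    f.quantity,
    f.net_amount,
    d.calendar_date,
    d.fiscal_year,
    c.customer_name,        -- kept; the long free-text note columns were dropped
    c.customer_segment,
    p.product_name,
    p.product_category
FROM dw.fact_orders   AS f
JOIN dw.dim_date      AS d ON f.order_date_key = d.date_key
JOIN dw.dim_customer  AS c ON f.customer_key   = c.customer_key
JOIN dw.dim_product   AS p ON f.product_key    = p.product_key
-- rolling 3 years instead of the last 6, which roughly halved the extract size
WHERE d.calendar_date >= DATEADD(YEAR, -3, CAST(GETDATE() AS date));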
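My understanding is that on the live connection in #3, each view in a workbook issues its own aggregate query against the full 150M-row fact, roughly along these lines, and every filter change re-runs it:

-- The kind of aggregate query I believe each dashboard view issues on the live
-- connection in #3 (names made up); every filter change re-runs something like this.
SELECT
    d.fiscal_year,
    c.customer_segment,
    SUM(f.net_amount) AS net_amount,
    SUM(f.quantity)   AS quantity
FROM dw.fact_orders   AS f
JOIN dw.dim_date      AS d ON f.order_date_key = d.date_key
JOIN dw.dim_customer  AS c ON f.customer_key   = c.customer_key
GROUP BY d.fiscal_year, c.customer_segment;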
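And the DBAs' suggestion in #4 would, as far as I can tell, mean one near-identical query per subject area, so any change to the star has to be repeated across all of them:

-- Hypothetical "smaller, more custom" extract for a single subject area (#4).
-- Multiply this by every team and question and we are back to maintaining
-- hundreds of near-duplicate queries whenever the star changes.
SELECT
    d.fiscal_year,
    p.product_category,
    SUM(f.net_amount) AS net_amount,
    SUM(f.quantity)   AS quantity
FROM dw.fact_orders  AS f
JOIN dw.dim_date     AS d ON f.order_date_key = d.date_key
JOIN dw.dim_product  AS p ON f.product_key    = p.product_key
WHERE d.calendar_date >= DATEADD(YEAR, -3, CAST(GETDATE() AS date))
  AND p.product_category = 'Widgets'   -- per-area filter; differs for each extract
GROUP BY d.fiscal_year, p.product_category;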
My previous experience is with other, larger vendors, where 'universes' were the norm and helped control maintenance. I want to stick with Tableau and make sure we have exhausted all of our options, but I have to be open to the idea that this model might not work for us.
Before we get further down this slope, am I missing something in our options? Is anyone else leveraging large data sets (150M rows with wide dimensions) to give a large end community (500+ web users) the ability to do their own analysis?
Thanks a million !!