1 Reply Latest reply on Oct 14, 2013 2:44 PM by Nicholas Lupis

    What is the best practice for accessing a star schema from our newly created Dimensional DW?

    Ron Chipman

      We have been partnering with our DBAs to get our transactional ODS transformed into a proper DDW so that we could create better efficiencies on the front end with our reporting application (Tableau).


      We previously had 100s of custom extracts running, each supporting a 1- or 2-report workbook answering very specific questions.  What we saw was the opportunity to better organize our data on the back end, with proper fact tables and dimensions, so that our end users could ask 100s of questions of 4 or 5 workbooks - leveraging the power of parameters, hierarchies, and the like.


      Our challenge -- now that we have our star schemas built, Tableau doesn't appear to be able to handle them.  This puzzles me, because I have to imagine that these are small beasts compared to what other organizations might be tackling.  We have tried multiple approaches:


      1)  Create a published data source on Tableau Server for the full star schema -- We wanted to go this route so we could also build out standardized metadata and publish the data source for our 'power users' to leverage; this would allow for broader use and adoption, with a semblance of control over definitions and measures.  Our first effort, on our primary concept but a relatively small data set, yielded an extract of 200GB.  The extract would complete on our local laptops (after about 2 hrs), but when we scheduled them on the server, they would never finish.


      2)  Modify the published data source -- We took a spin off of #1 and, in the table editor in Tableau, cut out the dimension fields we felt weren't mission critical (large unique text fields, redundant ID fields), and then reduced the timeframe of the fact table to a rolling 3 years instead of the last 6.  This reduced the first extract to around 80GB, and it seemed to work, temporarily.  We copied this approach and built additional data sources ranging in size from 80GB to around 200GB.
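      For illustration, the kind of trimming we did can be sketched in Python (the table, column names, and dates here are made up, not our actual schema):

```python
from datetime import date

# Hypothetical rows from a wide fact table covering six years of history.
fact_rows = [
    {"order_date": date(2007, 6, 1),  "customer_id": 1, "customer_id_legacy": 1, "notes": "free text", "sales": 100.0},
    {"order_date": date(2010, 3, 15), "customer_id": 2, "customer_id_legacy": 2, "notes": "free text", "sales": 250.0},
    {"order_date": date(2012, 9, 30), "customer_id": 3, "customer_id_legacy": 3, "notes": "free text", "sales": 75.0},
    {"order_date": date(2013, 1, 10), "customer_id": 4, "customer_id_legacy": 4, "sales": 300.0, "notes": "free text"},
]

# Columns judged non-critical: redundant ID fields and large unique text fields.
DROP = {"customer_id_legacy", "notes"}

# Rolling three-year window ending at an assumed "as of" date.
cutoff = date(2010, 10, 14)

# Keep only recent rows, and only the surviving columns.
trimmed = [
    {k: v for k, v in row.items() if k not in DROP}
    for row in fact_rows
    if row["order_date"] >= cutoff
]
```

      Both cuts compound: dropping wide text columns shrinks every row, and the rolling window drops rows outright, which is how the 200GB extract came down to around 80GB.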


      3)  When we upgraded the server to v8, none of these extracts worked.  The DBAs told us that Tableau couldn't process that much data in an extract (really?), so we tried setting up the data sources as LIVE connections to the SQL Server.  This yielded horrible performance (one report took 20 min to open).


      4)  DBAs continue to suggest that we refactor our extracts to be smaller and more custom to each area.  My hesitation with this is that it puts us right back where we started.  It is a maintenance nightmare.


      My previous experience is with other, larger vendors, where 'universes' were the norm and helped to control maintenance.  I want to stick with Tableau and make sure we have exhausted all of our options, but I have to be open to the idea that this model might not work for us.


      Before we get farther down this slope, am I missing something in our options?  Is anyone else leveraging large data sets (150M rows with wide dimensions) to enable a large end community (500+ web users) to do analysis?


      Thanks a million !!


        • 1. Re: What is the best practice for accessing a star schema from our newly created Dimensional DW?
          Nicholas Lupis

          Hi Ron


          I know exactly where you are coming from. Having spent a number of years implementing BI solutions in organisations with very large data, I know it makes for an interesting challenge, given that report performance is a key criterion.


          I am a little behind the times in that I haven't used Tableau extracts for large data sets recently, so I am not sure whether they have improved since they first came out. I have, however, had to work with Tableau to get the best out of a star schema environment, and to be honest it was tricky. Tableau doesn't really handle these structures well, due to the nature of the large fact table and the need to join to multiple dimension tables. Tableau prefers a single large, flat data set, which lends itself nicely to having everything in one place. The extract being created is large because, in effect, it is materialising that superset of data in one place.
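          To illustrate why the extract balloons (a minimal sketch; these tables and attributes are invented for the example): flattening a star schema copies every dimension attribute onto every fact row.

```python
# Narrow fact rows joined to their dimensions by surrogate key.
fact = [
    {"date_id": 1, "cust_id": 10, "sales": 100.0},
    {"date_id": 1, "cust_id": 11, "sales": 250.0},
]
dim_date = {1: {"year": 2013, "quarter": "Q4", "month": "October"}}
dim_cust = {10: {"name": "Acme", "segment": "Retail", "region": "TX"},
            11: {"name": "Globex", "segment": "Wholesale", "region": "NM"}}

# The "flat single data set" an extract materialises: one wide row per fact row,
# with every dimension attribute repeated on every row.
flat = [{**row, **dim_date[row["date_id"]], **dim_cust[row["cust_id"]]}
        for row in fact]

# Each flat row now carries 3 + 3 + 3 = 9 fields instead of the fact's 3.
```

          With 150M fact rows and wide dimensions, that repetition is exactly what drives a 200GB extract.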

          What sort of database are you using? Have you exhausted all your database options? Compression, indexing, partitioning, indexed views, etc.?


          I think managing a data source this large and wide may mean building smaller, summarised, consolidated tables specific to each area of focus. This does seem a little ridiculous in this day and age, and it does add maintenance overhead, but if the primary requirement is report performance, you need this to get the best out of the chosen tool. The maintenance overhead can be minimised if you have a good ETL tool that helps manage these tables; in effect, you are creating a summary set of data over and above the detailed information you already have. I can give you some recommendations on this if required.
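          A minimal sketch of what such a summary table does (the grain and column names here are assumptions, not a prescription): pre-aggregate the detail to the coarsest grain the reports actually need, before Tableau ever sees it.

```python
from collections import defaultdict

# Hypothetical detail rows: one per transaction.
detail = [
    {"month": "2013-09", "region": "TX", "sales": 100.0},
    {"month": "2013-09", "region": "TX", "sales": 50.0},
    {"month": "2013-09", "region": "NM", "sales": 75.0},
    {"month": "2013-10", "region": "TX", "sales": 300.0},
]

# Summary grain: one row per (month, region), totalling the measure.
totals = defaultdict(float)
for row in detail:
    totals[(row["month"], row["region"])] += row["sales"]

summary = [{"month": m, "region": r, "sales": s}
           for (m, r), s in sorted(totals.items())]
```

          Here four detail rows collapse to three summary rows; on a real fact table the reduction is orders of magnitude, which is what makes the per-area extracts small enough to refresh reliably.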


          Other things to consider on the database side are disk throughput, temp space, disk location and type, and whether you have fibre channel (or similar) between your infrastructure. SQL Server struggles if the temp space is not optimal; you can also look at the RAM and CPU of the SQL Server box. Are you running a virtual environment?


          In a similar scenario I had to evaluate and test a number of different options to see which was the most suitable. It is quite time consuming and can be a pain for the users, but you need the performance improvements to make it a successful solution.


          I checked you out on LinkedIn; your being in Texas makes it a little hard for me to help you out in person (as I am in New Zealand), but if you would like to do a Skype or Google+ session, I am happy to help.





          1 of 1 people found this helpful