14 Replies Latest reply on Aug 26, 2014 6:43 AM by Mohit Agarwal Branched to a new discussion.

    Tableau star schema approach

    sarang.bhude

      Hi,

We have an existing relational model with several materialized fact views and dimensions built on a huge base table. The base table grows by around 200k rows per day, and there are around 20 dimensions averaging 5k rows each.

We have Tableau Server and Desktop, and we are planning to report on this data. What would be the most suitable approach with respect to Tableau?

Should we de-normalize everything into a single flat table, or report on the existing views?
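For context, the flat-table option can be sketched with a plain (non-materialized) view that de-normalizes the star schema into one row set Tableau could connect to or extract from. This is a minimal SQLite sketch with illustrative table and column names, not the original schema:

```python
import sqlite3

# Tiny star schema: one fact table, two dimensions (names are illustrative).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region_name  TEXT);
    CREATE TABLE fact_sales  (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_region  VALUES (1, 'EMEA'),   (2, 'APAC');
    INSERT INTO fact_sales  VALUES (1, 1, 1, 100.0), (2, 2, 2, 250.0);

    -- A simple view that flattens the star into one de-normalized row set.
    CREATE VIEW v_sales_flat AS
    SELECT f.sale_id, p.product_name, r.region_name, f.amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_region  r ON r.region_id  = f.region_id;
""")
rows = cur.execute("SELECT * FROM v_sales_flat ORDER BY sale_id").fetchall()
print(rows)  # [(1, 'Widget', 'EMEA', 100.0), (2, 'Gadget', 'APAC', 250.0)]
```

Reporting on such a view keeps the join logic in the database, while extracting it gives Tableau one flat table to work with.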

        • 1. Re: Tableau star schema approach
          kettan

What database do you use? We use SQL Server 2012 Enterprise Edition, and it is too slow even though we query less data than you do. Therefore I would (probably) go for extracts, but it all depends on performance, which you can of course test. If you do, please share your results and your choice.

           

The problem with extracts is that you lose the flexibility of star joins between fact and dimensions, and as a consequence you will create many extracts based on the same facts, depending on which dimensions you need. But the cost of poor performance is non-negotiable, and my choice is therefore extracts.

           

My hope is that Tableau adds the ability to join extracts with dimensions, because that would make data life simple, fast, and easy in many areas, such as development, administration, disk space, and extract duration. This is why I created an idea to add Dimensionally Modeled Extracts, which clearly tops my wish list for Tableau 9.

           

I hope more forum users will share their experience on why and when they query databases directly and when they prefer Tableau Server extracts. I have a feeling that more or less everyone uses extracts unless they have an extremely fast (analytic) database or cubes.

          • 2. Re: Tableau star schema approach
            Emre Arıkan

Johan, to your question of why and when people query databases directly and when they prefer Tableau Server extracts:

             

- Sometimes the Tableau data engine (which you feed with extracts) is faster than your database.

- Usually you don't need hot data; you can live with T-1 data. For T-1 you need only one extract per day.

- You don't put extra load on your production systems with heavy queries, unless you have a separate DWH database/appliance. You usually prefer pulling your data at night, so there is no extra load on your production system during the usual peak time, i.e. 8 am-6 pm.

- And even if you have a separate DWH database/appliance, the data engine could perform better, since it is an "extract".

             

Sarang, regarding your question: I would start reporting on the existing model. I would try it once on the database itself and, in parallel, pull the data into the Tableau data engine. Trying both options would not be time-consuming in the Tableau environment. For the data-engine option you should additionally consider how often and when the data extracts should be triggered, since the extraction process shouldn't impact operations in the source database or reporting activities in the target system.

             

            Best,

            Emre.

            • 3. Re: Tableau star schema approach
              sarang.bhude

              @Johan,

We use an Oracle 11g EE database. The idea of having dimensionally modeled extracts is definitely brilliant.

I did notice the performance difference between Tableau extracts and the live database. I'll test it more thoroughly, but currently I'm leaning towards extracts.

               

              @Emre

The T-1 approach isn't suitable for us, as the data gets updated every 15-30 minutes by our ETL jobs. So if we were to push extracts to Tableau Server, a maximum delay of 1 hour would be expected.

               

How do you go about having the data both in the database and in Tableau at the same time? When importing tables, Tableau gives the option to connect live, or to import some or all of the data. Will checking/unchecking the "use extract" option do that?

               

I was also thinking of another approach: creating simple (non-materialized) views in the database and pulling them into Tableau. Publishing those in Tableau should reduce the burden of importing 20+ dimensions and joining them, and might also help with row filtering.

              • 4. Re: Tableau star schema approach
                kettan

Thanks, Emre, it is very nice to get feedback on this topic.

                 

- Sometimes the Tableau data engine (which you feed with extracts) is faster than your database.

In my case, I would change 'sometimes' to 'always'.

                 

- Usually you don't need hot data; you can live with T-1 data. For T-1 you need only one extract per day.

                I agree.

                 

- You don't put extra load on your production systems with heavy queries, unless you have a separate DWH database/appliance. You usually prefer pulling your data at night, so there is no extra load on your production system during the usual peak time, i.e. 8 am-6 pm.

I agree. My goal is that zero queries will run directly on production systems. Data pulling will be daily, and probably near real-time for some data, such as every hour or so.

                 

- And even if you have a separate DWH database/appliance, the data engine could perform better, since it is an "extract".

                As previously mentioned, I would say always.

                 

                Curious

Do you query your data warehouse directly, do you move extracts to Tableau Server, or is it a mix?

                • 5. Re: Tableau star schema approach
                  Emre Arıkan

                  Johan I appreciate your feedback.

                   

We mainly use SAP BO for reporting. We are currently evaluating Tableau as a data-discovery tool, and for that purpose we haven't yet decided how Tableau will be fed. Since the tables in our use case are huge, preparing data marts and feeding them to Tableau Server based on the data-discovery request is an option.

To accelerate the work done in Tableau, using SSD disks is a possibility we will study.

                   

Sometimes we will query the DWH (we have Teradata there) online for the big data sets. Teradata Corp. suggests its in-memory technology ("Intelligent Memory") for that.

                   

When it comes to data discovery, there should be a combined hardware-software solution. Tableau, sadly, is a software-only solution. Hence all these considerations.

                   

                  Best,

                  Emre.

                  • 6. Re: Tableau star schema approach
                    Emre Arıkan

                    @Sarang

That 1-hour maximum delay is an ambitious data-load plan, and the fact table takes a great many transactions. I guess you plan to extract the data incrementally.

                     

The problem in your case, I think, is that you have an ordinary database as your DWH, while on the other side nearly online reporting is needed; that's a bit contradictory.

                     

When importing tables, the "import some data" option is, I think, for development purposes and not suitable for incremental data loads. You develop and test your application faster with a subset of the data; for the final version you simply substitute that partial data source with a similar one containing the entire data.


                    Best,

                    Emre.

                    • 7. Re: Re: Tableau star schema approach
                      kettan

                      Thanks Emre, this is nice to know.

                       

I am familiar with neither SAP BO nor Teradata, but I guess Teradata is a better choice than Tableau extracts. My comment was limited to relational databases such as Microsoft SQL Server.

                      • 8. Re: Re: Tableau star schema approach
                        Emre Arıkan

Thank you too, Johan. Which technologies should be used is always use-case dependent; everybody's situation is just different.

                         

                        Best,

                        Emre.

                        • 9. Re: Tableau star schema approach
                          kettan

                          Database Technology vs Price

I agree, and would like to add that the price of the available technology is also a deciding factor.

If Teradata and its competitors have good offers for smaller companies, then I would seriously consider moving to a fast database.

                           

                          In-memory technology in MS SQL Server 2014

On the other hand, maybe MS SQL Server will perform better than Tableau extracts when we upgrade to SQL Server 2014 with its new Hekaton in-memory technology. If so, I will likely stop using Tableau extracts and be released (and relieved) from all the extra work they create.

                           

                          Dimensionally Modeled Extracts

And when Tableau hopefully implements Dimensionally Modeled Extracts, I will probably return to Tableau extracts. If only I knew whether this is on the radar for Tableau 9 or 10, I would rather wait and use my time for better purposes.

                          • 10. Re: Re: Tableau star schema approach
                            Emre Arıkan

"MS SQL 2014 with its new Hekaton in-memory technology" sounds good. Together with Tableau, it would be a good option for companies with a clearly arranged, small data model. I assume that, as in Teradata, SQL Server will decide by itself where to go, namely to disk or to memory.

                            • 11. Re: Re: Tableau star schema approach
                              kettan

                              Emre, thanks for feedback.

                               

Questions related to data warehouses, star joins, and the like are too often ignored in this community. I assume it is because database workers are a small minority on this forum. I have been "following" (subscribing by email to) this question since it was asked on August 28, and chose to bump it with just any answer yesterday, knowing it had been ignored for more than a week.

                               

That said, I don't think Sarang has yet got a satisfying answer to the question (though it is not an easy question to answer, because 'it all depends'):

Should we de-normalize everything into a single flat table, or report on the existing views?

If he has, I hope he marks it as the correct answer, and also marks as helpful those replies that helped get closer to it.

                               

                              Regarding your comment ...

I assume that, as in Teradata, SQL Server will decide by itself where to go, namely to disk or to memory.

... I sadly have to say that SQL Server 2014 doesn't decide by itself; the table has to be defined manually:

The way Hekaton works is that you declare a table as memory-optimised, following which SQL Server will load the entire table into memory. (Microsoft SQL Server 14 man: 'Nothing stops a Hekaton transaction', The Register)
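For illustration, that declaration looks roughly like this in SQL Server 2014 T-SQL. The table and columns are hypothetical, and Hekaton additionally requires the database to already contain a memory-optimised filegroup:

```sql
-- Hypothetical fact table declared as memory-optimised (Hekaton).
-- The WITH clause is the opt-in; the engine does not choose on its own.
CREATE TABLE dbo.FactSales (
    SaleId INT   NOT NULL PRIMARY KEY NONCLUSTERED,
    Amount MONEY NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

So each table you want in memory must be explicitly designed that way up front.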

                              • 12. Re: Re: Tableau star schema approach
                                sarang.bhude

Thanks for all your responses. I spent the past week working on the different approaches suggested. While a dedicated middleware such as Talend isn't really preferable, we were hoping to reuse the existing tables and views (pulled into Tableau extracts).

But the existing base was fairly vast and difficult to integrate fully. We have decided to pick it up somewhere in the middle: pulling in the essential dimensions at the start and keeping things user-friendly (from an ad-hoc reporting perspective).

Further enhancements can be made as we go.

                                 

I will mark this as answered now, and mark some responses as helpful.

                                 

                                Thanks,

                                Sarang

                                • 13. Re: Tableau star schema approach
                                  kettan

I wonder if data blending would work well enough as a star join?

                                   

                                  If yes, we could store dimension tables as separate extracts on Tableau Server and use them as secondary sources in our fact extracts.

                                  • 14. Re: Tableau star schema approach
                                    Mohit Agarwal

                                    Blending won't help you. Blending is used to add extra columns to your data set, but you can't group at a higher level than the column being blended on.
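Conceptually, a blend behaves as if the secondary source were aggregated to the linking field before being joined, which is why the secondary's finer-grained detail is lost. A rough sketch of that behaviour in SQLite, with illustrative names (this mimics the idea, not Tableau's internals):

```python
import sqlite3

# Primary source (fact-like) and secondary source (dimension-like with
# city-level detail). Names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE primary_sales     (region TEXT, amount REAL);
    CREATE TABLE secondary_targets (region TEXT, city TEXT, target REAL);
    INSERT INTO primary_sales VALUES ('EMEA', 100), ('EMEA', 50), ('APAC', 70);
    INSERT INTO secondary_targets VALUES
        ('EMEA', 'Berlin', 80), ('EMEA', 'Paris', 60), ('APAC', 'Tokyo', 90);
""")
# "Blend" on `region`: the secondary is collapsed to one row per region
# first, so its city-level detail is unreachable in the blended result.
rows = cur.execute("""
    SELECT p.region, SUM(p.amount) AS sales, s.target_sum
    FROM primary_sales p
    JOIN (SELECT region, SUM(target) AS target_sum
          FROM secondary_targets GROUP BY region) s
      ON s.region = p.region
    GROUP BY p.region, s.target_sum
    ORDER BY p.region
""").fetchall()
print(rows)  # [('APAC', 70.0, 90.0), ('EMEA', 150.0, 140.0)]
```

In a real star join, by contrast, the fact and dimension rows are joined at row level, so grouping by `city` would still be possible.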