4 Replies Latest reply on Jan 30, 2013 4:36 AM by Russell Christopher

    Optimize Tableau for Millions of Rows in SQL Server

    brian.comeau

      Hi All,

       

      I currently am running my database on MS SQL 2008 R2 in an EC2 instance on AWS (large instance). I have roughly 30 million rows of data across 4 dimensions, and I am relating these (mostly on integer or date fields) to 3 or 4 other tables that hold more detailed data.

       

      When I load this as a query into Tableau, every refresh takes 3-4 minutes (e.g. filter by country, the list takes that long to generate). That is unsustainable, so I've started to look at extracts. That said, whenever I refresh the extract, it takes a couple of hours.

       

      Are there any standards for how I should be building these tables? I've heard of indexes and things, but I was under the impression that an integer acts as an index.

       

      Any ideas?

       

      Thanks,

      Brian

        • 1. Re: Optimize Tableau for Millions of Rows in SQL Server
          Andrew Watson

          Hi Brian

           

          To begin I recommend you use an extract. Tableau performance is far better with extracts as Tableau 'self-optimises' these and I believe it's a columnar database, which is far quicker for data retrieval than relational.

           

          You mention that refreshing the extract is very time consuming. I suggest this is down to the underlying SQL query. You can read this post on speeding up Tableau extract refreshes. This post about speeding up Tableau performance may also be of use as it contains information on extracts and SQL as a data source.

           

          To summarise, I recommend you use an extract and get the underlying SQL query optimised as much as possible.

           

          Andrew

          • 2. Re: Optimize Tableau for Millions of Rows in SQL Server
            Russell Christopher

            +1 on Andrew's response.

             

            It sounds like your issue has less to do with Tableau (which shows a symptom of the problem) and more to do with SQL Server, which is the problem. SQL Server definitely does not add indexes to itself automatically - creating appropriate indexes is the job of the database administrator.

             

            If you are joining 4-5+ tables together across any non-trivial database using Tableau, AND that database does not have indexes in place to help the database query optimizer do it's business, then your query will be slow. All of your join columns should be indexed, at least.

             

            If I had to guess, I'd posit that the following issues are in play:

             

            • Lack of indexes on your SQL Server tables causes "Table Scans" when you ask a question
            • Table Scans are hitting your SQL disk drives really hard (high IO).The IO throughput on AWS is generally poor, so by pounding the disk, you unintentionally make performance even worse.

             

            If you have a DBA you can call, I'd ask him/her to index the database appropriately for the type of joins you're creating. If you don't, then go with an Extract, knowing that extract creation will take a long time for the exact same reason you want to use an extract in the first place - your SQL Server performance is no good

             

            Hope this helps

            1 of 1 people found this helpful
            • 3. Re: Optimize Tableau for Millions of Rows in SQL Server
              brian.comeau

              Hi Andrew,

               

              Thanks for the reply. It would seem that I need someone who actually understands the 'data' side of things to help out with this one. Will find the time to read up on adding indexes, as that might speed up my extracts. For the moment, I'll get used to the overnight run.

               

              Thanks,

              Brian

              • 4. Re: Optimize Tableau for Millions of Rows in SQL Server
                Russell Christopher

                Yeah, that's the beauty of Tableau - even if you're dealing with a beast of a database, you can be really productive once you get the data out