2 Replies Latest reply on Nov 28, 2016 8:17 PM by Mark Wu

    Database For Tableau

    Larsen Rennie

      Hello

       

      Currently we are using sql server as a database and soon we will be moving to aws . So we are in the process for selecting database between Redshift and Snowflake . So just wanted to know which database  has better performance in terms of analyzing Petabytes of Data by connecting to Tableau.

       

      Currently we have three database in all the Database Server(Sql server, Amazon Redshift. Snowflake). Each database has same data with same tables. The only diffrence in the three database is the amount of database present. I have been assigned to build Tableau Report First by connecting to Sql Server and then use the same report for all the three database to  get the performance metrics Data by doing Performance Recording in Tableau. Once we will get the metrics for sql server then we will test the same report against all the three database but this time with Amazon Redshift and then after this same thing gets repeated for Snowflake . So the basic funda of this is to get the performance metrics of Tableau Reports  from all the three database server  and then decide which one to go for

       

      So my questions is that database team has geenrated data with the help of some tools and loaded that data into the tables in the database. They are doing Perfromance Testing based on the sql queries generated by tool and checing their execution time by running those queries at the database level like for example by running the quesries in SSMS (for sql server) and other server

       

      1)Should i connect live with the database or extract or should i publish the datasource into the Tableau Server ?

      2) How can check at the Tableau level how the underlying database performance is good ?

      3) if i build a Tableau Report by connecting with the database server , then how will i determine that this particular  server is giving faster results for executing queries . Do i consider building some special types of Visualizations  from which i can determine how database performance is

       

       

       

       

       

       

       

       

       

      Thanks

      Larsen

        • 1. Re: Database For Tableau

          Hey Larsen,

           

          In my opinion, this is how I would test performance:

           

          1. Publish the data sources to Tableau Server. Ideally you're going to want people to connect to centralized data sources instead of all making native connections from Tableau Desktop. Also, while extracts are likely a good way to go further down the road, they aren't going to help you test database performance.

           

          2 & 3. Use performance recording and our new tool "Scout" to determine how the database is performing and if certain actions/queries are slower or faster on each server:

           

          Create a Performance Recording

           

          Getting Started: Scout

          • 2. Re: Database For Tableau
            Mark Wu

            I want to offer some answers too since I am learning a lot recently about Tableau server performance.

            1. Live connection vs extract vs published data source? Preferred option is to use published data source so you have a few data experts who have access to Database Servers and  have in-depth understanding about the table structure, data definition, etc. Those are called data stewards who make connections, create calculations, etc and then make the data connections available on Tableau server as Published Data Sources for other Workbook Developers to consume.  Pls see the chart below. This way you can achieve reusable, SSoT, etc and you do not have to train a lot of people for in-depth data knowledge. Also it reduces queries to your Data Servers. Screen Shot 2016-11-28 at 7.52.24 PM.png

            It depends on data type, volume and complexity of joins, Published Data Sources can be extracts or just live connections. Extracts will give you better Tableau view performance but live connections can give you most recent data, can create initial SQL, can leverage database built-in security if you have.

             

            2. How to check underlying db performance? Other than performance recording mentioned by Diego. You can also track how long the extracts took overtime (if you use extracts). You can also track avg query waiting time, avg query executive time, query volumes for all queries coming from Tableau server (you can track this from DB server).

             

            3. What is fast enough Tableau view performance? For individual workbook view, performance recording is a good way to go. However as a Tableau serve admin, you can create a workbook to tell what is the overall view performance on your server. My rule of thumb is 2-10-20 second - any views < 2 seconds are good; views > 10 seconds are not good; views > 20 seconds should not be published. How to find out your view performance? Tableau view elapsed_time is ([Completed At]-[Created At]) from _http_requests. You can have a following join to get a nice workbook to check out your view performance :

            Screen Shot 2016-11-28 at 8.15.14 PM.png