10 Replies Latest reply on Nov 29, 2017 9:19 AM by niall.ridge

    Refresh of server extracts slow from distant SQL Server connection

    niall.ridge

      Apologies if this already answered elsewhere. I looked but couldn’t find advice on what I suspect is a common scenario.

       

      We have a SQL Server based data warehouse located in London, and Tableau Server on an AWS cluster in the US near corporate HQ (5 servers, 32 core). The problem we have is that overnight refreshes seem to be taking a disproportionate amount of time, and sometimes time out. To describe the process and what we’ve tested so far:

      1. The source in SQL Server is a static table (no joins, filters etc). It contains about 11 million rows and 33 columns, taking about 4GB of space in SQL Server.
      2. Creating an extract through Tableau Desktop on our London network takes about 4 minutes. Publishing this extract to Tableau Server takes about 8 minutes. Refreshing this extract from Tableau Server takes over _60_ minutes
      3. Reading the data directly from the database and writing the results directly to Tableau Server through Alteryx takes about 12 minutes
      4. Extracting the data through a SQL client takes about 4 minutes
      5. Creating an extract from Tableau Desktop in the US takes over _60_ minutes
      6. Exporting the data using BCP to a flat file on a London server takes 1.5 minutes, compressing and then transmitting this compressed file from UK to US takes 2 minutes
      7. If we refresh the Tableau Server extract from an AWS SQL Server to Tableau Server, it takes only 3 minutes

       

      Based on the above, it doesn’t seem that the raw query execution on SQL Server (can occur in 4 minutes) network bandwidth (transmission can occur in <10 minutes), or creation of Tableau extract (can occur in 3-4 minutes on desktop or server) are the bottleneck. The best guess we have is that this is related to the verbosity/’chattiness’ of the SQL transmission protocol (TDS?) over TCP.

       

      Can anyone confirm the theory about protocol as a bottleneck, or advise on further tests that would help narrow this down? If this is a known limitation, is there a best practice for optimising replication of data from remote SQL sources?  It might also be equally informative if anyone can confirm that they've got a similar set up that doesn't have any issues, since this may point to an issue with our server or network configuration.

       

      Thanks in advance!

        • 1. Re: Refresh of server extracts slow from distant SQL Server connection
          Tom W

          If you ran a SELECT * FROM static table in the US via SSMS I'm guessing it will take an hour to return the full result set as well. The way sqlserver streams this data to the client application (either SSMS or Tableau etc) is definitely slower than a BCP or an alternative.

          I'm not really sure if there's anything easy you can do to speed this up outside of moving the server or creating a process to manually extract the data in London and push it across to the US somehow.

           

          Is there any reason your DW doesn't sit in the same location as your Tableau Server?

          • 2. Re: Refresh of server extracts slow from distant SQL Server connection
            niall.ridge

            Thanks, Tom. The UK and US offices have historically been quite separate, and the DW also feeds into a local (legacy) BI tool that's still used for a lot of pixel perfect management reporting, so we won't be able to move it in a hurry. We're looking into some options for scripting up a transfer via FTP or another suitable protocol, but hoping someone with a similar challenge has advice on what does/doesn't work well with the tableau server ecosystem.

            • 3. Re: Refresh of server extracts slow from distant SQL Server connection
              Tom W

              Thanks for the context.

               

              If it were me, I'd consider myself as having the following solves;

              • Mirror the DW using replication in the US so you have the same DW in both locations. This is probably the most comprehensive solve but also the most extreme.
              • Establish a trimmed down version of the DW in the US or a "data mart" which only contains the pieces you need for US based reporting > transfer your data using some type of a script, FTP etc where it's compressed and optimized for transfer.
              • Move the Tableau sever to the UK
              • Script a process which generates a TDE file using one of the SDK options and run this script in London, then publish the TDE across the pond to the US.
              1 of 1 people found this helpful
              • 4. Re: Refresh of server extracts slow from distant SQL Server connection
                niall.ridge

                Thanks Tom, great to have confirmation of the source of slowness. I think the first or second options are the most suitable for our situation.

                With the last option, this seems to have a few disadvantages (at least when using Alteryx, with my limited knowledge) - particularly that you can't embed dynamic calculations required for non-aggregating ratios etc, and you can't specify when numeric fields are actually dimensions and not metrics. This means that some work is required before the data source is as user friendly as I'd like for report creators.

                • 5. Re: Refresh of server extracts slow from distant SQL Server connection
                  Tom W

                  If you wanted to go down the extract path, note that the metadata you're

                  talking about is defined in the connection / TDS file.

                  http://onlinehelp.tableau.com/current/pro/desktop/en-us/export_connection.html

                   

                  You could in theory create the extract once, define the calcs and organize

                  the connection then export that connection file. So long as the extract

                  resides in the same location each time, the end users would be opening the

                  saved connection and not the extract directly.

                   

                  On Mon, Nov 6, 2017 at 5:15 AM, niall.ridge <tableaucommunity@tableau.com>

                  • 6. Re: Refresh of server extracts slow from distant SQL Server connection
                    niall.ridge

                    Thanks again Tom.

                    You could in theory create the extract once, define the calcs and organize

                    the connection then export that connection file. So long as the extract

                    resides in the same location each time, the end users would be opening the

                    saved connection and not the extract directly.

                    I don't quite understand this suggestion but it sounds promising. Maybe an example will help me to understand better. Let's assume:

                    1. I create an Alteryx output file called 'out.tde' on a UK share drive with all required static fields 
                    2. I open a new workbook, add out.tde as a live data source and add calculations, set the default description, aggregation and number format, and reclassify some fields as dimensions (e.g. date field with format like 20160409 is picked up as a measure).
                    3. I save this as out_formatted.twb
                    4. I then connect to the server and publish this data source with 'include external files' selected, and no refresh schedule.

                     

                    This scenario doesn't include any automated way of refreshing. How would I best amend this to do so? Are you suggesting I should schedule the alteryx flow in step 1, and add a refresh schedule to step 4? (assuming the benefit here is that refresh from TDE from UK-US is going to be quicker than SQL connection)

                     

                    Thanks again!

                    • 7. Re: Refresh of server extracts slow from distant SQL Server connection
                      Tom W

                      I can't speak for Altryx. What I'm suggesting involves exporting the

                      connection. Take a look at that in the Tableau help.

                      • 8. Re: Refresh of server extracts slow from distant SQL Server connection
                        niall.ridge

                        While also looking at longer term solutions of bringing the UK data warehouse and tableau server closer together, we're still wrestling with a short to medium term solution for automatically refreshing to Tableau Server. As mentioned in my previous notes, I can use Alteryx Scheduler to automatically (and quickly) generate a basic TDE on a local drive or the Tableau Server.  I have a Tableau workbook pointing to this generated TDE, which adds all required metadata (calculations, hierarchies, default comments and aggregation etc). However, I want a way of publishing this improved version as a data source.  As a one-off, I can use Tableau Desktop to publish it, but is there a way to automate this?  I know TabCmd can be used to perform many tasks including publishing, but has anyone seen it used for a case like this?

                         

                        I've followed the link helpfully provided by Tom, but can't see how I'd use this without pointing the connection back to the TDE on a UK share drive.

                         

                        I've also posted this on the Alteryx forums (Scheduling a Tableau Server Data Source with metad... - Alteryx Community ), but if anyone has ideas for a solution based on the Tableau toolset, they would be gratefully received!

                        • 9. Re: Refresh of server extracts slow from distant SQL Server connection
                          Tom W

                          Here's what I think you need to do:

                          1. Create your connection in Tableau Desktop in London, add in your calculations, rename fields etc. Create an extract on the source.
                          2. Publish this source to the server (right click, publish to server).
                          3. Create reports against the source published to the server; The connection will now look like this:

                          To refresh it ongoing:

                          • First, open your Tableau server and download a copy of the source. It will save like TestSource.tdsx. Keep this file in London, it's now your 'master' connection file, you only need to do this once.
                          • If you open this tdsx file using a ZIP program you'll note you have a TDS file (where your schema and calcs are defined) and a tde file contained within the data\extracts subfolder.

                           

                          This is where you setup your automatic process; it should replace the tde in the zip file and rezip it (making sure to keep the extension tdsx) then use tabcmd to publish the connection back to the server and overwrite the existing one.

                          In theory you could have a folder on your network share or something in London called tdsxsourcefiles; you could unzip the TDSX into this folder and setup your process to replace the TDE in the subdir, zip the folder,rename it correctly, push it up using tabcmd. Any time you want to make changes to the Schema / Calcs etc, you just make the changes in Tableau desktop by opening the TDS file.

                          1 of 1 people found this helpful
                          • 10. Re: Refresh of server extracts slow from distant SQL Server connection
                            niall.ridge

                            Brilliant. I'd seen the TDE in the zip file, but you've filled in the gaps for me on the automation side. I'm about to be out of office for a few weeks but will test when back. Thanks!