5 Replies Latest reply on Mar 26, 2015 9:51 AM by Cristian Vasile

    What factors impact extract performance?


      We have an extract-heavy Tableau Server environment and extract performance is quickly becoming an issue.  I'm looking for some details on what (if anything) can be done to boost extract refresh performance outside of the obvious answers of decreasing the number of rows and columns included in the extract and "optimizing" the extract.


      As an example, there is an Oracle source that contains 30 million rows and 250 columns.  If I run the SQL that the extract refresh is generating (sourced from the Tableau log file) in a query tool like Toad, I get results in 10 minutes.  But, the actual refresh process on Tableau Server clocks in at 104 minutes, on average.


      Can anyone who is familiar with Tableau Server's extract process explain whether it is possible to speed up the actual "import" portion?  I'm fairly certain Tableau Server gets the query results in 10 minutes but the process of "streaming" and storing the data takes an excessive amount of time.


      Worker specs:

      • Intel Xeon CPU E5-2680 v2 @ 2.80 GHz (4 processors)
      • 24 GB RAM
      • 64-bit OS
        • 1. Re: What factors impact extract performance?
          Jeff Strauss

          Adding myself as a watcher...

          • 2. Re: What factors impact extract performance?
            Cristian Vasile



            Did you monitor I/O load during extract generation, and the temp folders used by Tableau Server's processes?
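To get a quick read on how much temp space a refresh consumes, here is a minimal stdlib-only Python sketch (the path argument is a placeholder; on a real server you would point it at the tabsvc temp directory):

```python
import shutil
import time

def sample_disk_usage(path, samples=3, interval=1.0):
    """Record (free, used) bytes for `path` at fixed intervals.

    Run this alongside an extract refresh to watch temp-folder
    consumption over time.
    """
    readings = []
    for _ in range(samples):
        usage = shutil.disk_usage(path)
        readings.append((usage.free, usage.used))
        time.sleep(interval)
    return readings

# Example: watch the current drive; substitute the real Tableau
# temp folder path on the server.
readings = sample_disk_usage(".", samples=2, interval=0.1)
```

Comparing the first and last readings during a refresh tells you how much scratch space the data engine actually used.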


            There is a thread questioning the rationale behind a RAM disk solution; a lot of settings are posted there. Take a look here: Would a RAM disk improve Server performance?


            I'll just copy/paste a few comments:


            Tableau Server uses the folder :\ProgramData\Tableau\Tableau Server\data\tabsvc\temp


            Tableau Data Extracts - Tips, Tricks and Best Practices | Tableau Software

            • In a Tableau Server environment, it’s important to make sure that the backgrounder has enough disk space to store existing Tableau extracts as well as refresh them and create new ones. A good rule of thumb is the size of the disk available to the backgrounder should be two to three times the size of the extracts that are expected to be stored on it.
            • Tabcmd (a command-line utility) can be used to refresh extracts, as well as to publish TDEs to Tableau Server
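As a sketch of driving refreshes from a script, here is a small Python helper that builds a `tabcmd refreshextracts` argument list (the datasource name, server URL, and username below are hypothetical; adjust the flags to your server's authentication setup):

```python
def tabcmd_refresh_command(datasource, server, username, synchronous=False):
    """Build an argument list for `tabcmd refreshextracts`.

    On a real server you would pass the result to subprocess.run()
    after authenticating with `tabcmd login`.
    """
    cmd = [
        "tabcmd", "refreshextracts",
        "--datasource", datasource,
        "--server", server,
        "--username", username,
    ]
    if synchronous:
        # Run the refresh immediately and wait, rather than queuing it.
        cmd.append("--synchronous")
    return cmd

cmd = tabcmd_refresh_command("Oracle Sales", "https://tableau.example.com", "admin")
```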


            I suppose that vizqlserver.exe and/or the data engine (tdeserver64.exe) will need temporary disk space to be able to push correct data back to clients.

            The list of all Tableau Server processes: http://onlinehelp.tableausoftware.com/current/server/en-us/help.htm#processes.htm%3FTocPath%3DAdministrator%2520Guide|Troubleshooting|Work%2520with%2520Log%2520Files|_____1


            data engine



            Stores data extracts and answers queries


            The data engine's workload is generated by requests from the VizQL Server process. It is the component that loads extracts into memory and performs queries against them. Memory consumption is primarily based on the size of the data extracts being loaded. The 64-bit binary is used as the default on 64-bit operating systems, even if 32-bit Tableau Server is installed. The data engine is multi-threaded to handle multiple requests at a time. Under high load it can consume CPU, I/O, and network resources, all of which can be a performance bottleneck. At high load, a single instance of the data engine can consume all CPU resources to process requests.


            VizQL Server


            Loads and renders views, computes and executes queries


            Consumes noticeable resources during view loading and interactive use from a web browser. Can be CPU bound, I/O bound, or network bound. Process load can only be created by browser-based interaction. Can run out of process memory.




            The Tableau stack uses ~20 folders to store logs!

            Server Log File Locations


            The math that tells us how much free space we need in the temp folder:


            Refreshing extracts

            If using extracts, consider the space needed by the Temp directory during an extract refresh. The Temp directory, which is where an extract is stored to during a refresh, may require up to the square of the final file size of the extract. For example, a 12 GB extract may take up 144 GB of disk space to complete the refresh.
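As a worked version of the quoted rule (worst-case temp space is up to the square of the final extract size), a one-line Python sketch:

```python
def worst_case_temp_gb(extract_size_gb):
    """Worst-case temp space (GB) during a refresh, per the quoted
    Tableau guidance: up to the square of the final extract size."""
    return extract_size_gb ** 2

# The 12 GB example from the quote above:
print(worst_case_temp_gb(12))  # 144
```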


            Understanding Disk Space Requirements for Tableau Server | Tableau Software


            26 GB of RAM is a lot of bits...

            In your case I would buy two good SSD disks (256 or 512 GB), since the prices are decent now, put them in RAID 1 (mirroring), and point the temp/tmp folders to that disk.



            Hope this helps.




            • 3. Re: What factors impact extract performance?
              Cristian Vasile



              You wrote:

              If I run the SQL that the extract refresh is generating (sourced from the Tableau log file) in a query tool like Toad, I get results in 10 minutes.  But, the actual refresh process on Tableau Server clocks in at 104 minutes, on average.


              You are comparing apples to oranges: what you see in TOAD are raw records, versus a .TDE, where data is stored in columns, indexed, and compressed. To create a .TDE, which is in fact a database, a lot of work is done in the background, so to decrease the time needed to create that .TDE you should carefully monitor I/O load, CPU, and memory usage during the process.
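To illustrate why building the .TDE costs more than fetching raw rows, here is a toy Python sketch of the two conceptual steps, pivoting rows into columns and then compressing each column. (This is only an illustration; Tableau's actual engine also sorts and indexes, and uses its own compression, not run-length encoding.)

```python
from itertools import groupby

def columnarize(rows):
    """Pivot row-oriented records into column lists,
    as a columnar store does conceptually."""
    return {key: [row[key] for row in rows] for key in rows[0]}

def run_length_encode(column):
    """Toy stand-in for columnar compression: adjacent equal
    values collapse to (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]

rows = [
    {"region": "East", "sales": 100},
    {"region": "East", "sales": 250},
    {"region": "West", "sales": 90},
]
columns = columnarize(rows)
encoded = run_length_encode(columns["region"])  # [("East", 2), ("West", 1)]
```

Each of these passes touches every value once more after the query returns, which is why the refresh takes longer than the raw SQL alone.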


              Another option is to offload this task to a dedicated machine, or to purchase ETL software able to export data as .TDE; see this thread: third parties tools able to export/import data to/from Tableau Data Engine




              • 4. Re: What factors impact extract performance?

                Yes, I concede that TOAD vs. Tableau Server extract is not a one-for-one comparison.  I only throw the TOAD information out there because it verifies that Tableau Server isn't creating some horrendous, unmanageable SQL on its own and it should also be roughly the amount of time it takes Tableau Server to return the same raw data in the early stages of the extract refresh.


                Aside from decreasing the amount of data, I'm searching for what we can do to speed up the "columnarization" that Tableau is doing.  We are good on all the basics (disk space, etc.) but it would be extremely beneficial to know whether there are any hardware or configuration changes that result in tangible benefits to extract refresh performance.


                I skimmed the RAM disk thread and it sounded like the consensus was that there wasn't enough of a performance boost to justify making the change.

                • 5. Re: What factors impact extract performance?
                  Cristian Vasile

                  The issue with the RAM Disk approach is that you need a LOT of RAM.


                  Explore the SSD path. There are three zero-cost applications able to evaluate your system and provide recommendations based on your I/O load and disk access patterns:

                  - ioturbine profiler http://get.fusionio.com/ioturbine-profiler

                  - hgst profiler http://www.hgst.com/software/HGST-profiler

                  - stortrends idata tool StorTrends® StorTrends iDATA Tool | All Flash Array & Hybrid Storage


                  Hope this helps.