
    Need a clarification on the Extract size

    Baskar Subbian


      Hi All,

      In one of our Tableau deployments we need to create really large extracts (2 billion records, which comes to 6 GB+), because the underlying database is slow. Is Tableau capable of handling such huge extracts, and if so, what are the performance implications?

       

      We tried to create the extract, but the process ran for more than 24 hours without finishing, so we terminated it. If Tableau can handle such huge extracts, what would the hardware sizing requirements be in terms of disk (do we really need an SSD in this case?), cores, and RAM?

       

      Our current configuration is below.

       

      8 processors

      40 GB RAM

      We are not using any flash drives or SSDs for our disks.

       

      Also looking for ideas from the experts on a better way to create such huge extracts.

       

      Thanks,

      Baskar.

        • 1. Re: Need a clarification on the Extract size
          panjala Srinath

          Hi Baskar Subbian,

           

          I had a similar situation where the extract size was around 6 GB. Your hardware specification looks pretty good.

           

          Tableau can handle large volumes of data; however, there are a few things you must consider while building the whole analysis.

           

          Instead of giving the pointers here, I am attaching a PDF document that will help you with this.

           

          Regards

          Srinath

          • 2. Re: Need a clarification on the Extract size
            Baskar Subbian

            Hi Srinath,

            Thanks for your response. My problem is extracting the data. My dashboards and reports work fine with a subset of the data, but I am not able to create an extract of the whole data set, so I cannot test how my reports perform against the full data.

             

            I have 3 years of data in my DB. When I try to do a full refresh, the CPU spikes on my DB server and the extract stops responding. My DB server likewise has 8 cores and 48 GB of RAM.

             

            Is there any way I can extract incrementally, quarter by quarter? I first extracted one quarter of data; when I then tried an incremental refresh, it fetched all of the remaining data as one chunk. And when I restrict to the next quarter through a range filter, Tableau treats it as a full refresh and extracts both quarters again, including the quarter (Q1) I had already extracted.

             

            Any suggestions would be a great help.

             

            Thanks,

            Baskar.

            • 3. Re: Need a clarification on the Extract size
              Dan Cory

              You can create a data source for each quarter's worth of data, then use "Append from Data Source" to add one quarter at a time; a scriptable sketch of the same idea is below.
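
              If you would rather not click through Desktop for every quarter, Tableau Desktop's command-line switches should do the same append from a script. This is only a rough sketch: the server URL, credentials, project, data source name, and staged CSV files are all placeholders, and it assumes each quarter's rows have been staged as a file first.

                  rem Run from the Tableau Desktop bin directory on Windows.
                  rem Assumes a data source "Sales History" holding Q1 is already
                  rem published; append each later quarter from a staged CSV:
                  tableau addfiletoextract --server https://tableau.example.com ^
                      --username admin --password ***** ^
                      --project "Default" --datasource "Sales History" ^
                      --file "C:\staging\sales_Q2.csv"

              Repeat the addfiletoextract call once per quarter until the extract covers all 3 years.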

               

              You didn't mention what the underlying database was. We may be able to help further if we know that.

               

              Dan

              • 4. Re: Need a clarification on the Extract size
                Ivan Monnier

                Hi Baskar,

                 

                I am currently using/generating larger extracts, ranging from 15 to 40 GB.

                 

                Of course, extract updates and generation take hours, but they are done at night or during the weekend.

                Our server is not far from yours, I think (8 cores, 64 GB RAM, 2.5 TB RAID HDD, no SSD yet).

                We DO NOT generate extracts on our desktops or laptops.

                We generate empty extracts with Tableau Desktop and update them on the server.

                Easy Empty Local Extracts | Tableau Software
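
                If you want to trigger those server-side refreshes from a script instead of (or in addition to) a schedule, tabcmd can queue the background job. A sketch, with our server URL, credentials, project, and data source name replaced by placeholders:

                    rem Log in to Tableau Server and queue a refresh of the published
                    rem (initially empty) extract; names below are placeholders.
                    tabcmd login -s https://tableau.example.com -u admin -p *****
                    tabcmd refreshextracts --datasource "Sales History" --project "Default"
                    rem On later runs, --incremental fetches only the new rows:
                    rem tabcmd refreshextracts --datasource "Sales History" --incremental
                    tabcmd logout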

                 

                Here is what I have done to manage such extracts while waiting for another architecture (I am in a very large and NOT flexible company).

                 

                • Be sure that the temp files are located on the largest drive; I saturated the system drive in the beginning (edit the TEMP/TMP environment variables, as in the sketch after this list)
                • Increase the backgrounder query limit, timeout, and extract loss parameters (I have set a timeout of 10 hours; the default is 2 hours). Durations are in seconds, put together in the sketch after this list:
                  • tabadmin set backgrounder.querylimit 36000
                  • tabadmin set backgrounder.timeout_length_in_seconds 36000
                  • tabadmin set extract.lost.holdoff 36300
                    • I have read that it is recommended to set this 5 minutes higher than backgrounder.querylimit
                  • tabadmin configure
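
                Put together, it looks roughly like this on the Tableau Server machine (run from an administrator command prompt during a restart window; D:\TableauTemp is a placeholder path):

                    rem Point the temp directories at the largest drive
                    rem (machine-wide environment variables):
                    setx TEMP D:\TableauTemp /M
                    setx TMP D:\TableauTemp /M

                    rem Raise the backgrounder/extract timeouts to 10 hours
                    rem (values are in seconds) and apply the configuration:
                    tabadmin stop
                    tabadmin set backgrounder.querylimit 36000
                    tabadmin set backgrounder.timeout_length_in_seconds 36000
                    tabadmin set extract.lost.holdoff 36300
                    tabadmin configure
                    tabadmin start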

                 

                The speed at which your database sends the data is very important. The timeout covers the whole process: getting the data and compressing it into a columnar format. If it takes 6 hours to read your 2 billion rows, Tableau Server will only have 4 hours left to compress them, which may not be enough.

                One thing we did was to upgrade the network connection between the database server and Tableau Server to 1 Gb/s.

                 

                One question though: my problem was that extract updates failed because of timeouts, and you did not mention a timeout. If I have misunderstood your problem, I apologise.

                 

                I hope this will help.

                 

                Best regards

                 

                Ivan Monnier

                • 5. Re: Need a clarification on the Extract size
                  Paulo Dantas

                  Hi Baskar,

                   

                  Was your question answered?

                   

                  If so, please mark the reply that answered it as correct.

                   

                  Cheers.