12 Replies Latest reply on Mar 15, 2013 2:06 PM by Cristian Vasile

    Estimate Extract Size

    Jeff Pressman

      Are there any rules of thumb for estimating extract size? In my case I need to create a TDE from a SQL Server table with 200 million rows x 10 columns, and I want to make sure I've got enough disk space.

        • 1. Re: Estimate Extract Size
          Dan Huff

          It is very difficult to estimate extract size because of how Tableau compresses the information. Some columns compress easily and therefore end up taking much less disk space than they would have originally. Others are more difficult to compress and can use as much disk space as the original.

           

          The best approach is to take a subset, see how large the file is, and then extrapolate from that. This may not be a 100% perfect way to estimate the size, but it should point you in the right direction.
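Dan's subset-and-extrapolate suggestion amounts to simple linear scaling; a minimal sketch (the sample numbers here are hypothetical, and compression rarely scales perfectly linearly):

```python
def estimate_extract_mb(sample_rows: int, sample_mb: float, total_rows: int) -> float:
    """Linearly extrapolate extract size from a sample extract.

    Compression ratios depend on column cardinality, so treat the
    result as a rough ballpark, not a guarantee.
    """
    return sample_mb * (total_rows / sample_rows)

# Hypothetical: a 10M-row sample produced a 20 MB TDE; scale to 200M rows.
print(estimate_extract_mb(10_000_000, 20, 200_000_000))  # -> 400.0
```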

           

          Also, another important thing to consider is whether you and your other users actually need to hit the entire data set of 200 million rows. Are there subsets people are interested in? If so, making a smaller extract to match your users' needs may greatly reduce the disk space required.

           

          Hope this helps,

           

          Dan

          2 of 2 people found this helpful
          • 2. Re: Estimate Extract Size
            Jeff Pressman

            Thanks Dan,

             

            I appreciate the response.

             

             In my case I need the entire data set. I did take a sample of 12 million rows, which created a 25 MB TDE. I am now creating a full extract on the server and will be interested to see how close the full TDE size comes to a proportional scale-up of the sample. I did waste a couple of hours on an initial full refresh due to the default query timeout, so if you are working with large extracts, don't forget to raise your backgrounder.querylimit setting!

             

            I'll post back with an update once the full TDE is created.

            1 of 1 people found this helpful
            • 3. Re: Estimate Extract Size
              Cristian Vasile

              Jeff,

               

               I think it will be less than 5,000 MB, a reasonable amount of disk space in your case. I ran an extract with 111 million records and 29 fields, and its size on disk is 3,920 MB.

               

              Regards,

              Cristian.
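Treating Cristian's observed extract as a per-cell baseline gives a quick back-of-envelope check for Jeff's table. This assumes size scales roughly with rows x columns, which real compression usually doesn't, so it is only an order-of-magnitude sketch:

```python
# Cristian's observed extract: 111M records x 29 fields -> 3,920 MB on disk
mb_per_cell = 3920 / (111_000_000 * 29)

# Jeff's table: 200M rows x 10 columns
estimate_mb = mb_per_cell * 200_000_000 * 10
print(f"{estimate_mb:.0f} MB")  # roughly 2,400 MB, consistent with the <5,000 MB guess
```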

              • 4. Re: Estimate Extract Size
                Jeff Pressman

                 I think you mean 500,000 KB, which is a reasonable amount of disk space. Unfortunately, my backgrounder process timed out eight hours in. I can certainly increase the timeout; however, I was surprised, given that the 12-million-row sample took 30 minutes over a VPN connection and I ran the full extract refresh directly on the server. Admittedly, the server environment and the source database are not optimized, so I'll be focusing on that. I'll also send the logs to support to see if they have any suggestions.

                • 5. Re: Estimate Extract Size
                  Cristian Vasile

                  Jeff,

                   

                   Indeed, those are MB, not KB; my fault.

                   Could you try importing in batches, if your data can be divided into chunks by date (month/year, for example)? I am sure the support team will give you excellent suggestions for overcoming the issue.

                   

                  Regards,

                  Cristian.
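Cristian's batching idea can be sketched as a loop over month boundaries that drives one query per chunk; the table and column names below are hypothetical:

```python
from datetime import date

def month_ranges(start: date, end: date):
    """Yield (first_of_month, first_of_next_month) pairs covering [start, end)."""
    y, m = start.year, start.month
    while date(y, m, 1) < end:
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield date(y, m, 1), date(ny, nm, 1)
        y, m = ny, nm

# One extract refresh (or one load into a staging table) per monthly chunk:
for lo, hi in month_ranges(date(2012, 11, 1), date(2013, 3, 1)):
    print(f"SELECT * FROM fact_table "
          f"WHERE order_date >= '{lo}' AND order_date < '{hi}'")
```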

                  • 6. Re: Estimate Extract Size
                    Jeff Pressman

                     I ended up allocating additional RAM and optimizing the SQL Server data source, and was able to extract a fact table of 192+ million rows plus 5 dimension tables in 1 hour and 45 minutes.

                     

                     Interestingly, the resulting TDE appears to be 25 MB on disk, which leaves me more than a bit perplexed.

                    • 7. Re: Estimate Extract Size
                      Cristian Vasile

                      Jeff,

                       

                       25 MB seems way too small. Are you sure you measured the latest TDE file? It's the same size as the one you reported for 12 million records...

                       

                      Regards,

                      Cristian.

                      • 8. Re: Estimate Extract Size
                        Jeff Pressman

                         My thoughts exactly. I am looking at the space usage report in Tableau Server (v8 beta) as well as the file size reported by Windows for "C:\ProgramData\Tableau\Tableau Server\data\tabsvc\dataengine\folder_<UID>\upload_{GUID}\<filename>.tde".

                         

                        When I open the view in the browser, I am seeing all data.

                        • 9. Re: Estimate Extract Size
                          Cristian Vasile

                          Jeff,

                           

                           In this case your vizzes should simply fly, like an F-16 at Mach 2.

                           

                          regards,

                          Cristian.

                          • 10. Re: Estimate Extract Size
                            Toby Erkson

                             Jeff, did you do any aggregation on the extract? (Extract Data... >> Filter, Aggregation, or Hide All Unused Fields.) I would think that could help reduce the extract size. I dunno, just following y'all's conversation.

                            • 11. Re: Estimate Extract Size
                              Jeff Pressman

                               Hi Toby - no, nothing that I can see. I even went back and republished the extract to the server, added it to a new workbook, and can still browse all the data. It is possible that I did something different when taking the sample that affected its size, but I find it amazing that a nearly 200-million-row extract is compressed to 25 MB - granted, the fact data is tiny ints and dates, but still.
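For what it's worth, 25 MB is less surprising once you see how well low-cardinality columns like tiny ints and dates compress. A rough illustration using general-purpose zlib compression (Tableau's columnar format uses its own encodings, so this only shows the order of magnitude):

```python
import array
import random
import zlib

random.seed(0)

# One million date keys spread over 365 days, stored sorted as in a fact
# table clustered by date: long runs of identical values compress superbly.
days = sorted(random.randrange(365) for _ in range(1_000_000))
raw = array.array("H", days).tobytes()           # 2,000,000 bytes uncompressed
packed = zlib.compress(raw, 9)
print(f"{len(raw):,} -> {len(packed):,} bytes")  # orders of magnitude smaller
```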

                               

                              Cristian - are you saying the vizzes should fly because the extract is small or because of v8 or some other reason?

                              • 12. Re: Estimate Extract Size
                                Cristian Vasile

                                Hello Jeff,

                                 

                                 They should be generated very fast because:

                                 a) A small TDE means the hard disks perform far fewer physical head movements, and the OS can buffer small chunks of data more effectively. A 25 MB file is a piece of cake for any modern OS.

                                 

                                 b) The v8 VizQL engine has been improved; performance versus v7 is generally superior. Check this link posted by Mark Jakson: http://community.tableau.com/thread/123481

                                 

                                Regards,

                                Cristian.

                                1 of 1 people found this helpful