6 Replies Latest reply on May 22, 2015 11:52 AM by harrison.milinski

    Enhancing Performance

    harrison.milinski

      Hello all,

       

      I have a workbook (attached) that is querying and putting together visualizations rather slowly.  I've tried what seem to be some common fixes: hiding unused fields, running on an extract, optimizing extract, etc. - but these have only helped marginally.  I've even switched from a flat file connection to using Hadoop Hive (any tips or best practices there are also welcome). 

       

      I've limited the observations for the purpose of attaching, but normally the workbook holds about 14 million observations, and I'd ultimately like it to have even more, which is why I am focused on speeding things up. I've heard the suggestion to pre-aggregate data, but that sort of goes against the mission of this workbook.  I'd like to keep everything at the most granular level if possible.  My conjecture is that it is simply the number of views/filters I have here, combined with the complexity of some of the calculated fields I am displaying, that is slowing things down.

       

      Any help or suggestions are much appreciated. Thanks!

       

      -H

        • 1. Re: Enhancing Performance
          harrison.milinski

          Any help would be much appreciated!

           

          -H

          • 2. Re: Enhancing Performance
            Toby Erkson
            1. What version of Tableau? 
            2. Are you concerned only with Desktop or is the report being viewed on Tableau Server?
            3. Have you run a performance recording?
            • 3. Re: Enhancing Performance
              harrison.milinski

              Thanks for the reply.

               

              I am running Tableau 8.3 and am currently only concerned with Desktop.

               

              Thanks for the link to performance recording. I have not tried this.  What would you recommend as a next step once I have a .twbx of the performance recording?  Post it here or elsewhere?

              • 4. Re: Enhancing Performance
                Toby Erkson

                You can post it here as there may be someone who can help out.  Do you know how to read it?

                • 5. Re: Enhancing Performance
                  Jeff Strauss

                  I popped open the workbook on 8.3 and see the slowness that you are concerned with, and have the following suggestions / recommendations.

                   

                  - You have two datasets each with 7 million rows and are trying to blend the two of them together.  There's a lot of overhead associated with doing this on 2 large datasets.  Do you really need both of them?

                   

                  - Can you add a primary context filter?  It's worth looking this up.  The high-level concept is that the data is first filtered down, say from 7 million to 100,000 rows, into a temp table, and then all subsequent activities (e.g. filtering, blending) hit this temp table.

                   

                  - You have a lot of calculations that use table calcs (e.g. WINDOW_SUM, TOTAL) and parameters.  These calcs are not materialized within the extract and therefore take time to calculate as part of rendering the view.  Use caution here, but oftentimes there's not much you can do if this is what the biz requirements are.
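
                  The context filter idea in the second bullet can be sketched outside Tableau.  This is only a rough illustration using Python's built-in sqlite3 module, not how Tableau implements it; the table name obs, its columns, and the sample rows are all made up for the example.

```python
import sqlite3

# Hypothetical stand-in for the large data source (names and rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER, region TEXT, gender TEXT, amount REAL)")
rows = [
    (1, "East", "M", 10.0),
    (2, "East", "F", 20.0),
    (3, "West", "M", 30.0),
    (4, "West", "F", 40.0),
    (5, "East", "M", 50.0),
]
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)

# "Context filter": materialize the reduced row set once into a temp table.
conn.execute("CREATE TEMP TABLE ctx AS SELECT * FROM obs WHERE region = 'East'")

# Later filters and aggregations read only the smaller temp table.
context_row_count = conn.execute("SELECT COUNT(*) FROM ctx").fetchone()[0]
male_total = conn.execute(
    "SELECT SUM(amount) FROM ctx WHERE gender = 'M'"
).fetchone()[0]
```

                  The payoff depends on how much the first filter shrinks the data; if it only removes a small fraction of the rows, building the temp table can cost more than it saves.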

                  • 6. Re: Enhancing Performance
                    harrison.milinski

                    Thank you for the feedback.

                     

                    I have removed the secondary data source.  It was only used in one of the views anyway.

                     

                    After also trying to set up a context filter (not sure if this helped or not), I recorded my performance, and the culprit would appear to be the table calculations.  The events taking the majority of the time are 4 labeled 'computing table calculations', one from each worksheet, followed by 4 labeled 'executing query'.

                     

                    I have attached the performance recording.  What I recorded was adjusting the gender filter to show both genders rather than just Males.

                     

                    I think my problem now becomes how to display the measures I want without having to rely on nested table calculations.  But I will leave this open for now in case anyone has another suggestion to speed up performance...
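
                    One possible direction, sketched here only as an assumption and not as something tested against this workbook: a share-of-total measure can sometimes be computed by the data source as an ordinary aggregate instead of as a table calculation over the rendered rows.  The example uses Python's built-in sqlite3 module, and the table obs and its columns are hypothetical.

```python
import sqlite3

# Hypothetical data; the real workbook's fields are not known here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (gender TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("M", 30.0), ("F", 40.0), ("M", 30.0)],
)

# Share of total per gender, computed in the query itself:
# the grand total comes from a scalar subquery, so no row-by-row
# post-processing of the result set is needed.
shares = dict(
    conn.execute(
        "SELECT gender, SUM(amount) / (SELECT SUM(amount) FROM obs) "
        "FROM obs GROUP BY gender"
    ).fetchall()
)
```

                    Whether this applies depends on what the nested calculations actually do; measures that depend on the layout of the view may not translate to a plain aggregate.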

                     

                    -H

                     

                    EDIT:  I am currently exploring the potential use of Apache Drill with the goal of it reducing query time.  Anything I should look out for here? Or can I safely expect that Drill should reduce the event time of each of the 'executing query' events?