8 Replies Latest reply on Dec 8, 2016 4:33 AM by Vedant Chaturvedi

    Performance impact after joining 2 files

    Vedant Chaturvedi

      Hi All,

       

      We are facing performance issues after joining 2 text files and using "Extract" option.

      There is a significant difference in the performance numbers with and without the join.

       

      Any explanations as to why the performance is being impacted even after we have created an extract?

      Our understanding is that once extract is created, it shouldn't matter. Please correct if this is wrong.

       

      Thanks!

      Vedant

        • 1. Re: Performance impact after joining 2 files
          Chris Dickson

          Hi Vedant,

           

          an extract can improve the performance of a slow dataset that is true, however there are a number of factors to take into account for performance.

           

          One question that quickly comes to mind based on your post is -

          Is your dataset faster as a live connection to these two text files, or are you only comparing extract without join vs extract with join?

           

          Things to take into account

          1. How big are these text files, over a certain size tableau extracts do slow down, but this is in the magnitude of GB so unless your text files are gigantic this is unlikely a problem.

          2. What sort of join are you doing? sometimes doing a left or inner join on a one to many relationship can take a performant large data set and make it a very unperformant extract. i.e. i have 2 text files the first with 20 columns and a million rows, it performs fine as an extract, i need to bring in 1 extra dimension so join to a table but the join is 1 to many so my joined data set is 21 columns but is now 9 million rows and very unperformant.

           

          Sometimes if you are doing the join not to bring in a dimension but to add in a new measure you may want to consider not joining but blending instead, it can be faster as it will aggregate the second dataset as it bring the data across meaning it wont impact the performance of the overall viz as much as creating many duplicate rows through a left join.

           

          Hope this helps, if you want more specific help we will need to get much more specific about your two text files.

          1 of 1 people found this helpful
          • 2. Re: Performance impact after joining 2 files
            Vedant Chaturvedi

            Hi Chris,

             

            Thanks for the quick response.

             

            Using a live connection also didn't have an impact on time.

             

            Answers to your questions:

            1. One file has about 3 million records and the other file has 1000.

            2. We are using a left-outer join with the smaller dataset being the 1st(left). 1 file has 2 columns and the other has 20-30 columns.

             

            We are bringing in a dimension column using the join.

             

            Any idea based on the info, what could be cause here?

             

            Thanks!

            Vedant

            • 3. Re: Performance impact after joining 2 files
              Chris Dickson

              do you still have 3 million rows after you have performed the join (i.e 1-1 or 1-many)?

              • 4. Re: Performance impact after joining 2 files
                Vedant Chaturvedi

                Yes we still have 3 million records.

                • 5. Re: Performance impact after joining 2 files
                  Chris Dickson

                  It sounds like you are comparing the performance of 1000 rows vs 3 million, there will definitely be a difference between these two, but unless your data is very large in each column i wouldn't expect 3 million rows to bother a tableau extract that much.

                   

                  To improve the performance of extracts i would make sure you only keep fields you need. (Hide All Unused Fields) also if your dataset has a time period that is at a lower level than you need to report at then tick 'Aggregate data for visible dimensions' and Roll up dates to the level your report is meant to work at.

                   

                  The final thing i would suggest is once you have made you extract and changed fields data types or moved them from measure to dimension or vice versa then right click on the extract and 'Optimize', this can sometimes have a marginal impact for you.

                  Screenshot 2016-12-06 14.35.31.pngScreenshot 2016-12-06 14.40.07.png

                  Beyond these tips it really comes down to hands on with the data files to understand what might be slowing them down.

                   

                  All the above assumes you have enough processing power on your machine to utilise the in memory elements of a data extract.

                  • 6. Re: Performance impact after joining 2 files
                    Vedant Chaturvedi

                    We are comparing 3 million to 3 million (one is joined while the other isn't)

                     

                    We can't use "Aggregate data for visible dimension" as it represents the wrong data after aggregation due calculations involved.

                    Optimize has already been used.

                    • 7. Re: Performance impact after joining 2 files
                      Chris Dickson

                      Hi Vedant,

                       

                      it is an interesting issue, and i would love to get my hands on the datafiles causing you an issue, as it will be one of those rare issues that us nerds love to untangle. In short it shouldn't be an issue, the only thing i could possibly think of is that the dimension you are bringing in is a huge text field that tableau can't index and therefore slows down any select statements using it.

                       

                      I'm afraid you have reached the end of my ability to help - sorry.

                       

                      I'm not sure if you know about this but just in case -

                      The aggregations are controlled by the ones you tell them to use, i.e. in the data pane you right click on each dimension and set the default aggregation to min/max/sum/avg etc. then the extract will use this aggregation when rounding up in the extract process.

                      Screenshot 2016-12-06 15.46.38.png

                      • 8. Re: Performance impact after joining 2 files
                        Vedant Chaturvedi

                        Hi Chris,

                         

                        Apologies for the delayed response.

                        Unfortunately I can't share the data files. But, I am bringing in date field from the other file(only the missing dates to create a calendar view).

                        This is small field and shouldn't have indexing issue.

                         

                        Also, does aggregation have issues with and without joined data-sets? It seems weird if true as the same aggregations are being used in both the scenarios.

                         

                        Thanks for all the help!!

                         

                        Regards,

                        Vedant