10 Replies Latest reply on May 3, 2018 11:29 PM by Nicole Edmonds

    Data Prep - Output slow?

    Nicole Edmonds

      I am super excited about the data prep tool!  I really like the idea of getting out from under the custom SQL I currently use in dashboards and using the data prep tool instead.  That said, when I run a flow to produce the output, it is much slower than creating a data extract directly from the workbook using the SQL queries.  Even if I use more custom SQL in the prep, do fewer clean-up steps, aggregate and output, the flow takes just as long to run.

       

      Because this uses proprietary data from my work, it's a bit difficult to recreate as a sample, since we're talking about 1.4 million rows.  I was curious to see whether others have started to use it in their business scenarios and are experiencing similar performance-related issues.

       

      Some information about the workflows-

      Connecting to Redshift data, 5 different tables

      I'm using joins, some aggregates, joins to those aggregates and then output. 

        • 1. Re: Data Prep - Output slow?
          kumar bharat

          Hi Nicole,

          In your situation the data volume is huge.

          I suggest you use Tableau extracts and apply more filters after you are done preparing the data with the data prep tool.

          The data prep tool only remodels the data; extracts are faster.

          If you pre-aggregate your extracts and use them in the right way, your issue should be solved.

           

          https://community.tableau.com/ideas/3283

           

          BR,

          bharat

          • 2. Re: Data Prep - Output slow?
            Nicole Edmonds

            I am not sure I understand what you're saying here. Do you mean create the extracts *before* I load the data into the Prep tool?  The output of the data prep is an extract, which is what I am doing.  My chief complaint is that using custom SQL and extracts in Desktop is much faster than what I am seeing in the data prep tool.

            • 3. Re: Data Prep - Output slow?
              mortenbodaugaard.jrgensen

              That is because performing data transformations at the database layer (Redshift) almost always performs better than doing them in any other tool.

              If you expect Tableau Prep to perform better than SQL, you are in for a surprise.

              What Kumar is suggesting is essentially to copy the data from Redshift to Hyper before doing the Prep work. Whether that is faster has to be determined case by case, because it depends on many variables.

              • 4. Re: Data Prep - Output slow?
                kumar bharat

                Hi Nicole,

                Ideally, whichever approach runs the dashboard and renders the data faster is the right approach. The data prep tool is one of the options for remodeling the data in a way Tableau understands, but if custom SQL gives you faster and more efficient results, then I would suggest going with the custom SQL approach.

                You can use performance recording on Tableau Server to measure how long your Tableau workbook takes with each approach.

                Hope it helps.

                BR,

                bharat

                • 5. Re: Data Prep - Output slow?
                  Kate McCalley

                  Definitely experiencing a similar issue. I'm trying to create a union of 4 very large Excel files from different time periods, and not only is it taking an excessive amount of time, it has also randomly excluded data from large periods of time. Not sure why this is happening, but it's frustrating that, given how long the flow takes to run, I only discover the volume of missing rows once it has finally finished.
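                  One way to catch missing rows right after a run is to compare row counts per time period between the inputs and the output. The following is only a minimal sketch in Python, not anything built into the prep tool: the function name and the period labels are hypothetical, and it assumes you can extract one period label (for example a year-month string) per row from both the input files and the flow output.

```python
from collections import Counter

def missing_rows_by_period(input_periods, output_periods):
    """Return {period: number of rows present in the inputs but absent from the output}."""
    expected = Counter(input_periods)
    actual = Counter(output_periods)
    # A Counter returns 0 for periods that never appear in the output.
    return {p: n - actual[p] for p, n in expected.items() if actual[p] < n}

# Hypothetical period labels, one per row (not real data).
input_rows = ["2017-01", "2017-01", "2017-02", "2017-03"]
output_rows = ["2017-01", "2017-01", "2017-03"]

print(missing_rows_by_period(input_rows, output_rows))  # {'2017-02': 1}
```

                  An empty result means every period has at least as many rows in the output as in the inputs.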

                  • 6. Re: Data Prep - Output slow?
                    kumar bharat

                    Hi Nicole,

                    I suggest the following:

                    • Performance check

                    Enable the performance recording option on Tableau Server and check the existing dashboard's performance. It shows where the time is being spent, so you know which issues to fix.

                    • After the performance check, start tweaking the existing dashboard:
                    1. Start right at the DB connection: apply database filters, then create the right joins, pulling only the necessary fields
                    2. Pull in a limited number of records initially, after applying the joins
                    3. Redesign the dashboard, applying best practices and filters such as context filters
                    4. Remove unused fields
                    5. Use extracts if the data volume is large

                    Hope  it helps.

                    BR,

                    bharat

                    • 7. Re: Data Prep - Output slow?
                      Nicole Edmonds

                      Thanks everyone, this is helpful.  I guess the expectation is that Data Prep could replace a lot of the ETL processing that happens outside of Tableau before the data is used in the dashboards.  I'll have to test a few more scenarios to see if it is worth modifying what we already do elsewhere today.

                      • 8. Re: Data Prep - Output slow?
                        Simon Runc

                        Hi Nicole,

                         

                        Just to throw my hat into the ring...I've only been using Data Prep for a few days, but have noticed a few things with regard to speed (in certain circumstances).

                         

                        I had 52 CSVs of 220MB each which I wanted to union together (and do a little bit of clean-up on)...in the end I'd have 135M rows of data (and I've done no joins).

                         

                        Unioning the 52 files, using the built-in fuzzy grouping and outputting a .hyper file took 15 mins

                        Unioning the 52 files, using a RegEx formula for the grouping and outputting a .hyper file took 5 mins (I thought this was very fast...amazing in fact)

                        Unioning the 52 files, using a RegEx formula for the grouping and outputting a .csv file took 65 mins

                         

                        I have a laptop with 32GB of RAM (which helps!)

                         

                        So it does seem that the type of "clean up" we use has an effect, as does (less surprisingly) the output file type.
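                        To put the three timings side by side, here is a minimal Python sketch of the arithmetic. It assumes all three runs produced the same 135M rows; the run labels and the function name are just illustrative.

```python
ROWS = 135_000_000  # assumed identical output size for every run

# Wall-clock minutes for each run, as reported above.
runs = {
    "fuzzy grouping -> .hyper": 15,
    "RegEx -> .hyper": 5,
    "RegEx -> .csv": 65,
}

def summarize(runs, rows):
    """Return slowdown versus the fastest run and rows per second for each run."""
    fastest = min(runs.values())
    return {
        name: {
            "slowdown": minutes / fastest,
            "rows_per_sec": rows / (minutes * 60),
        }
        for name, minutes in runs.items()
    }

summary = summarize(runs, ROWS)
for name, stats in summary.items():
    print(f"{name}: {stats['slowdown']:.0f}x slowest-to-fastest ratio, "
          f"{stats['rows_per_sec']:,.0f} rows/sec")
```

                        On these numbers, the fuzzy grouping run is 3x slower than the RegEx run to .hyper, and the .csv output is 13x slower.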

                         

                        Thought it might be helpful to know.

                        • 9. Re: Data Prep - Output slow?
                          mortenbodaugaard.jrgensen

                          Great insight. It does not surprise me that it takes longer to output to a .csv file than to .hyper, since nothing gets compressed.

                          • 10. Re: Data Prep - Output slow?
                            Nicole Edmonds

                            In my scenarios, I was outputting to hyper in data prep, which was taking about 15-20 minutes.  I may try a pre-aggregated approach on the data warehouse side, and see whether all my unions, clean-up, etc. in data prep then produce a faster output.   I won't be able to use data prep effectively within my company without the scheduling capability, but that gives me some time to test and figure out the best course of action.  (And theoretically, scheduling is coming, and yes, I up-voted the idea already.)
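                            As a rough illustration of what a pre-aggregated approach on the warehouse side could look like, here is a minimal sketch that aggregates the detail table before joining it, so the join and everything downstream handle fewer rows. It uses Python's built-in sqlite3 purely as a stand-in for Redshift, and the `customers` and `orders` tables and their columns are hypothetical, not the actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'East'), (2, 'West');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0);
""")

# Aggregate orders per customer first, then join the much smaller
# result to customers, instead of joining row-level detail.
PREAGG_SQL = """
SELECT c.region,
       SUM(o.total_amount) AS total_amount,
       SUM(o.order_count)  AS order_count
FROM (
    SELECT customer_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY customer_id
) AS o
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.region
ORDER BY c.region
"""

rows = conn.execute(PREAGG_SQL).fetchall()
print(rows)  # [('East', 12.0, 2), ('West', 3.0, 1)]
```

                            Whether this actually shortens the flow depends on how much the aggregation reduces the row count before the data leaves the warehouse.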