10 Replies Latest reply on Jun 28, 2017 7:48 AM by Larry Hill

    WDC slow processing of data - String type specific

    Jordan Lee

      We are loading daily time series data, with around 1,000 values per day. We have a String column that holds a distinct value for each of those daily rows (so about 1,000 unique values). There are around 30 days of this data, so around 30k rows in total.

       

      Tableau Desktop takes around 8 minutes to process this data. This is after we have called appendRows from the WDC SDK, i.e. after our code has finished with it. While it is loading, the tabprotosrv.exe process consumes 100% of one CPU core for the full duration of the processing, which leads me to believe all of this processing happens within Tableau Desktop itself.

       

      With the same 30k rows, if we exclude this String column but keep the rest of the data the same (still 1,000 values per day for 30 days), the data processes in around 5 seconds.

       

      So whatever Tableau is doing with the String values (indexing? analyzing?) takes a huge amount of time. If the String column contains only a small set of values (say 50), it also loads in under 10 seconds. So the slowdown only happens when there is a large number of unique values per day.
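To make the three timings above easy to reproduce without the real feed, here is a minimal TypeScript sketch that generates synthetic rows of the same size (30 days of 1,000 rows each) with a String column whose cardinality can be set to 1,000 or 50 distinct names, or dropped entirely. The names `makeRows`, `CurveRow` and `ReproOptions`, the snake_case column ids, and the generated curve names and values are all hypothetical; the layout only loosely follows the sample CSV posted in reply 4.

```typescript
// Hypothetical row shape, loosely modelled on the sample CSV in the thread.
interface CurveRow {
  curve_name?: string;      // the high-cardinality String column (omitted when excluded)
  curve_type: string;
  effective_date: string;   // YYYY-MM-DD
  granularity: string;
  number_of_records: number;
  curve_value: number;
}

interface ReproOptions {
  days: number;             // number of daily snapshots (30 in the post)
  valuesPerDay: number;     // rows per day (about 1,000 in the post)
  distinctNames: number;    // how many distinct strings the name column cycles through
  includeName: boolean;     // false reproduces the "column excluded" test
}

function makeRows(opts: ReproOptions): CurveRow[] {
  const rows: CurveRow[] = [];
  const start = Date.UTC(2017, 0, 1);
  const msPerDay = 24 * 60 * 60 * 1000;
  for (let d = 0; d < opts.days; d++) {
    const date = new Date(start + d * msPerDay).toISOString().slice(0, 10);
    for (let i = 0; i < opts.valuesPerDay; i++) {
      const row: CurveRow = {
        curve_type: "Price",
        effective_date: date,
        granularity: "Daily",
        number_of_records: 1,
        // Deterministic placeholder values, not real prices.
        curve_value: 3 + ((d * opts.valuesPerDay + i) % 400) / 100,
      };
      if (opts.includeName) {
        row.curve_name = `Curve ${i % opts.distinctNames} Price`;
      }
      rows.push(row);
    }
  }
  return rows;
}

const full = makeRows({ days: 30, valuesPerDay: 1000, distinctNames: 1000, includeName: true });
const noName = makeRows({ days: 30, valuesPerDay: 1000, distinctNames: 1000, includeName: false });
const lowCard = makeRows({ days: 30, valuesPerDay: 1000, distinctNames: 50, includeName: true });
```

All three variants keep the row count at 30,000, so feeding them through the same connector would, under these assumptions, isolate the effect of the String column's cardinality.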

       

      I have tried this on Tableau Desktop 10.1 and 10.2. If I import the same data as a CSV, Tableau processes it relatively quickly, so it seems to be a problem specific to the WDC.

       

      Is this expected behaviour? I can't imagine that taking 8 minutes to load 30k rows is normal.

        • 1. Re: WDC slow processing of data - String type specific
          Patrick A Van Der Hyde

          Hello Jordan,

           

          Have you tried doing this with performance recording enabled? My guess is that this is related to Tableau waiting while the content is transmitted over a network, but the performance recording tools should give a better understanding of what's going on under the hood.

           

          Does anyone else in this group have experience examining what's going on under the hood of Web Data Connectors?

           

          Patrick 

          • 2. Re: WDC slow processing of data - String type specific
            Jordan Lee

            Again,

             

            30k rows of data without the string column, 5 seconds.

            30k rows of data with the string column, 6 to 8 minutes, while it uses 100% of one CPU core.

             

            So that seems to quite clearly rule out a network issue, and to show that Tableau is actually processing and not just waiting.

             

            I ran it with the performance recorder, and all it showed was 7 minutes of 'executing query', with no other useful details that I could see.

            • 3. Re: WDC slow processing of data - String type specific
              Yuriy Fal

              Hi Jordan,

               

              I guess that the Tableau Data Engine is spending its resources on sorting and trying to compress your string column, with little success because of the unique values.

               

              Maybe you'd be able to skip this column altogether, or generate a numeric sequence (or token) instead.
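The numeric-token idea is essentially dictionary encoding. A minimal TypeScript sketch, assuming the names arrive as a plain array of strings: each distinct value gets a small integer id, and each string is stored once in a lookup list. The names `dictionaryEncode` and `Encoded` are hypothetical, and whether Tableau loads an integer column plus a separate lookup table any faster is an untested assumption, not something established in this thread.

```typescript
// Result of dictionary-encoding one column.
interface Encoded {
  ids: number[];        // one integer id per input row, in input order
  dictionary: string[]; // dictionary[id] gives back the original string
}

// Replace each string with a small integer id, assigned in first-seen order.
function dictionaryEncode(values: string[]): Encoded {
  const idOf = new Map<string, number>();
  const dictionary: string[] = [];
  const ids: number[] = [];
  for (const v of values) {
    let id = idOf.get(v);
    if (id === undefined) {
      id = dictionary.length;
      idOf.set(v, id);
      dictionary.push(v);
    }
    ids.push(id);
  }
  return { ids, dictionary };
}

const enc = dictionaryEncode(["TGP-Z5 200L Price", "Transco Zn3 Louisiana Price", "TGP-Z5 200L Price"]);
// enc.ids        → [0, 1, 0]
// enc.dictionary → ["TGP-Z5 200L Price", "Transco Zn3 Louisiana Price"]
```

The meaningful names are not thrown away: under this assumption the `dictionary` array would travel as a small second table (about 1,000 rows) keyed by id, so the 30k-row table only carries integers.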

               

              Just my 2c, 'cause I'm not a WDC guy at all.

               

              Yours,

              Yuri

              • 4. Re: WDC slow processing of data - String type specific
                Jordan Lee

                It's a meaningful column; using a numeric sequence instead wouldn't make much sense.

                 

                Here is a sample of the data (in CSV form):

                 

                Curve Name,Curve Type,Effective Date,Granularity,Number of Records,Curve Value
                TGP-Z4 Sta-219 Price,Price,2017-01-01,Daily,1,3.222
                TGP-Z4 Sta-313 Price,Price,2017-01-01,Daily,1,3.1398
                TGP-Z5 200L Price,Price,2017-01-01,Daily,1,4.65
                TGP-Z6 200L Price,Price,2017-01-01,Daily,1,6.1186
                Transco Leidy Line receipts New Jersey Price,Price,2017-01-01,Daily,1,3.1461
                Transco Zn3 Louisiana Price,Price,2017-01-01,Daily,1,3.5214
                Transco Zn4 Alabama Price,Price,2017-01-01,Daily,1,3.0713
                Transco Zn6 NY New York Price,Price,2017-01-01,Daily,1,3.2977
                Transco-30 Texas Price,Price,2017-01-01,Daily,1,3.477

                 

                I am talking about the first column there. There are about 1,000 of those names per day, for 30k rows in total.
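For reference, a minimal TypeScript sketch that turns a sample like the one above into row objects of the kind handed to `appendRows`, keyed by identifier-style column ids. It assumes no quoted fields or embedded commas (true for the posted sample, not for CSV in general); the helper names `parseSample` and `toColumnId` and the snake_case ids are hypothetical, not anything the original connector is known to use.

```typescript
// Row objects keyed by column id, as they might be passed to table.appendRows.
type Row = Record<string, string | number>;

// "Curve Name" -> "curve_name": lower-case, runs of non-alphanumerics become "_".
function toColumnId(header: string): string {
  return header
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Naive parser: assumes no quoted fields or embedded commas.
function parseSample(csv: string): Row[] {
  const lines = csv
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  const header = lines[0];
  if (header === undefined) return [];
  const ids = header.split(",").map(toColumnId);
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Row = {};
    ids.forEach((id, i) => {
      const cell = (cells[i] ?? "").trim();
      const n = Number(cell);
      // Purely numeric cells ("1", "3.222") become numbers; dates and names stay strings.
      row[id] = cell !== "" && !Number.isNaN(n) ? n : cell;
    });
    return row;
  });
}

const sample = `Curve Name,Curve Type,Effective Date,Granularity,Number of Records,Curve Value
TGP-Z4 Sta-219 Price,Price,2017-01-01,Daily,1,3.222
Transco Zn3 Louisiana Price,Price,2017-01-01,Daily,1,3.5214`;
const parsed = parseSample(sample);
// parsed[0] → { curve_name: "TGP-Z4 Sta-219 Price", curve_type: "Price", effective_date: "2017-01-01",
//               granularity: "Daily", number_of_records: 1, curve_value: 3.222 }
```

Only the first column is high-cardinality here; the other five repeat heavily or are numeric, which matches the observation that dropping that one column is what changes the load time.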

                • 5. Re: WDC slow processing of data - String type specific
                  Larry Hill

                  Jordan,

                   

                  Large string manipulations are a bit tricky in Tableau. They can be inefficient, as they are processed in Tableau (as you already know). Thinking outside the box, is there another approach to procuring the data? Perhaps an ETL process into a cloud data source (Redshift, SQL Server on Azure, etc.). Using a tool such as Informatica Cloud to run scheduled ETL processes into a cloud data source, then pulling that into Tableau, is a much more enterprise-ready, scalable solution.

                  • 6. Re: WDC slow processing of data - String type specific
                    Jordan Lee

                    I'm not sure what running the ETL in the cloud has to do with this. The problem here happens entirely after transformation, only when the data is being loaded into Tableau.

                     

                    In the data example I posted above, I have 1,000 unique locations that I want to be able to visualize in Tableau. Whether that data is pulled in from a WDC or a cloud data source, there is no difference in scalability. This just seems like a bug specific to the WDC, where the string processing does something much more intensive than it should.

                    • 7. Re: WDC slow processing of data - String type specific
                      Larry Hill

                      Understood. The point is that since this is a bug, you might want to consider a workaround. The WDC is essentially the Extract API with a JavaScript wrapper. You are pulling the data in and then applying some string manipulations to it in Tableau. The performance might still be poor without the WDC if the data were local (have you tested that?). Until the Tableau product team can resolve this issue, moving this processing upstream into an ETL process eliminates the need for Tableau to run through the string calculations itself.

                       

                      Here's a link to a great blog post by InterWorks on performance: https://www.interworks.com/blog/bfair/2015/02/23/tableau-performance-checklist

                      Tableau Online Help: Create Efficient Calculations

                      • 8. Re: WDC slow processing of data - String type specific
                        Larry Hill

                        Do you have a sample of the calc you are doing against this data?

                        • 9. Re: WDC slow processing of data - String type specific
                          Jordan Lee

                          There are no calculations happening here. I am not performing any string manipulations. The 8 minutes of processing happen just while importing the data from the WDC as a new data source. This is before the data preview appears and before the data is used in a sheet.

                           

                          As I mentioned in my original post, this is after we call appendRows, so we already have ALL of the prepared data at that point. All we want to do is get it into Tableau, and it takes 8 minutes just to get the raw data in.
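For context, in the WDC 2.x SDK a connector's `getData(table, doneCallback)` hands rows to Tableau through `table.appendRows(...)` and then calls `doneCallback()`; the 8 minutes described here start after that hand-off. `appendRows` can be called more than once per table, so large result sets are often passed in slices. The TypeScript sketch below models only that one call through a hypothetical `AppendTarget` interface; the helper name `appendInChunks` and the 5,000-row chunk size are assumptions, and nothing in this thread shows that chunking changes how long Tableau spends processing the String column afterwards.

```typescript
// Hypothetical stand-in for the WDC table object: only appendRows is modelled here.
interface AppendTarget<T> {
  appendRows(rows: T[]): void;
}

// Hand rows to the table in fixed-size slices; returns how many appendRows calls were made.
function appendInChunks<T>(table: AppendTarget<T>, rows: T[], chunkSize = 5000): number {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  let calls = 0;
  for (let start = 0; start < rows.length; start += chunkSize) {
    table.appendRows(rows.slice(start, start + chunkSize)); // the last slice may be shorter
    calls++;
  }
  return calls;
}

// Quick check with a fake table that just records slice sizes.
const sizes: number[] = [];
const fakeTable: AppendTarget<{ id: number }> = { appendRows: (r) => { sizes.push(r.length); } };
const rowsToSend = Array.from({ length: 12000 }, (_, i) => ({ id: i }));
const callCount = appendInChunks(fakeTable, rowsToSend);
// callCount → 3, sizes → [5000, 5000, 2000]
```

Inside a real connector this would sit between building the rows and calling `doneCallback()`; timing with and without it would show whether the slowdown depends on how the rows are handed over at all.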

                          • 10. Re: WDC slow processing of data - String type specific
                            Larry Hill

                            OK, gotcha. I misunderstood the original question. This seems to be an issue on the WDC end. I sent you an email directly with some options; you can disregard the couple that refer to rewriting the string (that step doesn't exist).

                             

                            This might be a great resource for you: GitHub - tableau/webdataconnector: Bring the data you care about into Tableau

                             

                            There are members there who do this day in and day out; they might have run across this issue and found a workaround.