Have you tried doing this with the Performance monitor enabled? I'm guessing this is related to tableau waiting as the content is transmitted over a network but the performance monitoring tools should provide a bit more understanding of what's going on under the hood.
Anyone else in this group work with examining what's going on under the hood for Web Data Connectors?
30k rows of data without the string column, 5 seconds.
30k rows of data with the string column, 6-8 minutes, while it uses 100% of one cpu core.
So that seems to quite clearly rule out a network issue, and shows that Tableau is actually processing rather than just waiting.
I ran it with the performance recorder and all I got was that it spent 7 minutes in 'Executing Query', without any other useful detail that I could see.
My guess is that the Tableau Data Engine is spending those resources sorting and trying to compress your string column, with little success because of the unique values. Maybe you could skip this column altogether, or generate a numeric sequence (or token) instead. Just my 2c, 'cause I'm not a WDC guy at all.
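E.g., a rough sketch of what the token idea could look like in WDC JavaScript (the field names here are hypothetical, just for illustration):

```javascript
// Sketch only: replace the high-cardinality string with an integer token.
// The mapping can be kept (or exported separately) to recover names later.
var curveIds = {};   // curve name -> integer token
var nextId = 1;

function tokenize(name) {
    if (!(name in curveIds)) {
        curveIds[name] = nextId++;
    }
    return curveIds[name];
}

// Inside getData, emit the token instead of the raw string, e.g.:
// tableData.push({ curveId: tokenize(row.curveName), curveValue: row.curveValue });
```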
It's a meaningful column, using a numeric sequence instead wouldn't make much sense.
Here is a sample of the data (in CSV form):
Curve Name,Curve Type,Effective Date,Granularity,Number of Records,Curve Value
TGP-Z4 Sta-219 Price,Price,2017-01-01,Daily,1,3.222
TGP-Z4 Sta-313 Price,Price,2017-01-01,Daily,1,3.1398
TGP-Z5 200L Price,Price,2017-01-01,Daily,1,4.65
TGP-Z6 200L Price,Price,2017-01-01,Daily,1,6.1186
Transco Leidy Line receipts New Jersey Price,Price,2017-01-01,Daily,1,3.1461
Transco Zn3 Louisiana Price,Price,2017-01-01,Daily,1,3.5214
Transco Zn4 Alabama Price,Price,2017-01-01,Daily,1,3.0713
Transco Zn6 NY New York Price,Price,2017-01-01,Daily,1,3.2977
Transco-30 Texas Price,Price,2017-01-01,Daily,1,3.477
I am talking about the first column there. About 1,000 of those names per day for 30k total rows.
Large string manipulations are a bit tricky in Tableau; they can be inefficient, since they are processed in Tableau (as you already know). Thinking outside the box, is there another approach to procuring the data? Perhaps an ETL process into a cloud data source (Redshift, SQL Server on Azure, etc.). Using a tool such as Informatica Cloud to run regular ETL processes on a schedule into a cloud data source, then pulling into Tableau, is a much more enterprise-ready, scalable solution.
I'm not sure what processing the ETL in the cloud has to do with this. The problem here happens entirely post-transformation, at the point where the data is being loaded into Tableau.
In the data example I posted above I would have 1,000 unique locations that I want to be able to visualize in Tableau. Whether the data is pulled in from a WDC or a cloud data source, there isn't a difference in scalability. This just seems like a bug specific to WDCs, where this string processing is doing something much more intensive than it should be.
Here's a link to a great blog by Interworks with regards to performance: https://www.interworks.com/blog/bfair/2015/02/23/tableau-performance-checklist
Tableau Online Help: Create Efficient Calculations
Do you have a sample of the calc you are doing against this data?
There are no calculations happening here. I am not performing any string manipulations. The 8 minutes of processing occurs just when importing the data from the WDC as a new data source. This is before being able to see the preview of the data and before using the data in a sheet.
Like I mentioned in my original post, this is after we call appendRows. So we have ALL of the prepared data at that point already. All we want to do is get it into Tableau. And it takes 8 minutes just to get the raw data in.
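To be concrete, the appendRows flow boils down to roughly this shape (sketched here with a hypothetical appendInChunks helper; the WDC API does allow appendRows to be called repeatedly before doneCallback):

```javascript
// Sketch of feeding already-prepared rows to Tableau in batches
// instead of one giant appendRows call. `chunkSize` is arbitrary.
function appendInChunks(table, rows, chunkSize) {
    for (var i = 0; i < rows.length; i += chunkSize) {
        table.appendRows(rows.slice(i, i + chunkSize));
    }
}

// Inside the connector, something like:
// myConnector.getData = function (table, doneCallback) {
//     appendInChunks(table, preparedRows, 5000); // preparedRows: assumption
//     doneCallback();
// };
```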
Ok, gotcha. I misunderstood the original question. This seems to be an issue on the WDC side. I sent you an email directly with options; you can disregard the couple of them that refer to re-writing the string (that doesn't apply here).
This might be a great resource for you: GitHub - tableau/webdataconnector: Bring the data you care about into Tableau
There are members who do this day in and day out and might have run across this issue and have a workaround.