Have you tried optimizing your extract?
Edit: reading comprehension classes for me...
I've found that extract performance degrades significantly when there are a lot of calculated fields.
I have one CSV file: about 1 million rows, about 25 original columns, and 40 calculated fields in total. I ran the following tests:
(1) Original CSV file, without calculated fields: extraction takes 10 seconds.
(2) With 40 calculated fields: extraction takes 5 minutes.
Extraction time is critical to my job, so I'm still using Tableau Version 4.1, which extracts faster.
I'm not sure whether the upcoming Tableau 8 will solve this issue (a lot of calculated fields).
4? Are you sure?
Currently Tableau favors building extracts that perform better once completed, even if that requires extra work up-front. This means that sufficiently complex calculated fields will be much faster in the extract because they are materialized, having already been computed by the remote database. I don't believe there's an easy way to suppress this behavior. However, there is a workaround you can try; it is admittedly limited but may help in specific circumstances: create the extract from a connection that has no (or few) calculated fields, and once the extract has been created, add your calculated fields back and then optimize the extract.
I hope this helps.
A trick which I have used to stop calculations from being materialised is to include something in the calculation which Tableau recognises as being inherently variable, and so will need to be calculated on the fly each time.
For a numeric calculation, for example, you can add the value of an expression which you know will always evaluate to zero but Tableau can't guarantee that. You could add the value of a parameter that you always leave set to zero, or add (NOW() - NOW()), which will always be zero but contains NOW(), so Tableau assumes it will vary.
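To make that concrete, a numeric calculated field might look something like this (the field names here are just placeholders, not from the original posts):

```
// Hypothetical calculated field "Profit Ratio".
// The (NOW() - NOW()) term always evaluates to zero, but because it
// contains NOW(), Tableau treats the whole expression as volatile and
// will not materialise it during extract creation.
([Sales] - [Cost]) / [Sales] + (NOW() - NOW())
```

The result is numerically identical to the plain calculation; only the materialisation behaviour changes.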
I think that would achieve what you want, though it is just possible that the expression might still get sent to the database. If so, then doing as Robert suggests is probably the way to go. I typically just hide and unhide the expressions I don't want included rather than actually removing them from the connection.
Thank you very much!
I added (NOW() - NOW()) to all of the calculated fields in my test workbook (the 1-million-row CSV file). The extraction time is the same as with no calculated fields: 10 seconds! With 40 calculated fields, extraction had taken 5 minutes.
I suggest Tableau give users an option whether to create calculated fields in the TDE file.
> I suggest Tableau give users an option whether to create calculated fields in the TDE file.
I lobbied for that some time ago, but there is always a trade-off between simplicity and ease of use on one hand and new features on the other. Simplicity won.
This proved to be a very effective workaround for me. Thanks!
I had a lot of calculated fields that returned strings, so I added "IF (NOW() - NOW()) == 0 THEN A ELSE A END" constructs everywhere, and the extract time dropped from more than a day to a few seconds.
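Spelled out, the string-field version of the trick looks roughly like this (the inner expression is a made-up example, where A stands for the original string calculation):

```
// Original string calculation (hypothetical):
//   IF [Sales] > 1000 THEN "High" ELSE "Low" END
// Wrapped version: both branches return the same value, but the NOW()
// comparison stops Tableau from materialising the field at extract time.
IF (NOW() - NOW()) == 0 THEN
    IF [Sales] > 1000 THEN "High" ELSE "Low" END
ELSE
    IF [Sales] > 1000 THEN "High" ELSE "Low" END
END
```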
The default behavior of Tableau inhibits the use of complex calculations. The problem is compounded by the fact that the generated SQL is scarcely optimized: redundant, and full of unnecessary type conversions that bypass any Oracle index except specially constructed function-based ones.
I think this behavior is a bug, given that the SQL executed by Tableau is different from the one you specify in a "custom SQL" data source.
Great that it proved so effective.
Your construct has inspired another thought for me. Robert's idea was to remove the calculated fields before creating the extract, then add them again and optimise the extract, which has the effect of using the data engine rather than the underlying database to evaluate the calculation values to be materialised. Building on that, I think it will be possible to make the process work very simply.
I'm planning on experimenting with this to make sure it works, but what I'm thinking is to define a single calculated field which uses one of the constructs which suppresses materialisation and reference that from all other calculated fields. Create the extract, then change the single calculation to return a stable value. Then optimise. I think that should work, if Tableau is analysing the dependencies between calculations correctly (which it must be).
The single calculation could be defined as: "IF (NOW() > 0) THEN 0 ELSE 0 END" with the result simply added to all numeric calculations. After creating the extract, change that to "IF (1 > 0) THEN 0 ELSE 0 END" and optimise to get all the calculations materialised at data engine speeds.
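As a sketch, the whole workflow might look like this (field names are mine, and this is an untested outline rather than a verified recipe):

```
// Step 1 - before creating the extract, define a single switch field,
// say [Zero Switch], using a construct Tableau treats as volatile:
//   IF (NOW() > 0) THEN 0 ELSE 0 END
// and add it to every numeric calculation, e.g.:
//   [Margin] = [Sales] - [Cost] + [Zero Switch]
// Because every calculation depends on the volatile switch, nothing is
// materialised while the extract is being created.
//
// Step 2 - once the extract exists, redefine [Zero Switch] as stable:
//   IF (1 > 0) THEN 0 ELSE 0 END
// then optimise the extract, so all the calculations are materialised
// by the data engine at data-engine speeds.
```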
I think the fact that this trick reduced extract creation from over a day to a few seconds provides a very compelling argument for exposing control over when and where the materialisation calculation is done. There could even be an option to defer materialisation until after the extract has been created as a matter of course, so that you don't need to go back and do that as a separate action (which for large extracts would save on re-packaging time). That would in principle allow the use of the fast data loader for text file data sources with calculations defined, too. Lots of scope for big wins here, I think.
Am I missing something fundamental, here, Robert?
I really appreciate Rich's help.
Before I got Rich's workaround, I just used Tableau Version 4.1, because Tableau 7.0 significantly degraded my productivity.
My workbook template connects to 8 data sources (CSV files), each with a lot of calculated fields. If extraction for each data source takes as long as 5 minutes, the total extraction time could be 40 minutes, and it could take as long as 1 hour from extraction to PDF printing. On a day with 10 reports to generate, that's 10 hours! Now, using Rich's workaround, it takes at most 20 minutes to generate one report with Tableau 7.
Tableau should seriously consider this performance issue (for users working on text files with a lot of calculated fields).
I've tried Richard's suggestion and it does indeed work.
I've added a measure to my workbook with a construct like "if (now()-now())>0 then 0 else 0 end" and added that to every calculation. That prevented Tableau from materializing the calculations, keeping data extract time within reason.
Then I modified the custom field to a constant, making it possible to optimize every calculation.
I'm quite happy with this workaround, but I still think that Tableau should correct this bug.
When I specify a custom SQL data source, I have every right to expect that that's what will be sent to my DB, not something else.