Are you able to capture the SQL out of the logs or a performance recording, and then run an explain plan on your database? Also, what is the size/cardinality of your dimension table?
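If it helps, here is a minimal sketch of pulling the generated SQL out of Tableau Desktop's log file. It assumes the recent log format (one JSON object per line under `My Tableau Repository/Logs/log.txt`, with the event name under `"k"` and the SQL text under `"v"."query"` for query events) — check a few lines of your own log first, since older versions differ.

```python
import json

def extract_queries(lines):
    """Collect SQL text from Tableau log lines.

    Assumes one JSON object per line, with the event name under "k"
    and the SQL under "v"."query" for query events (verify against
    your Tableau version's log.txt before relying on this).
    """
    queries = []
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip non-JSON lines
        if event.get("k") in ("begin-query", "end-query"):
            query = event.get("v", {}).get("query")
            if query:
                queries.append(query)
    return queries

# Typical use: open log.txt, extract the queries, then prefix each
# with EXPLAIN and run it against your database to see the plan.
```

Once you have the statements, running them through `EXPLAIN` on the Spark SQL side should show whether the dimension scan is the bottleneck.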
I think you might like to influence Tableau to improve this by up-voting most or all of the ideas listed below:
You shouldn't blame Tableau, which is a brilliant piece of software. Try to improve your data pipeline by adding Alteryx Designer as a data processing engine in front of Tableau; see this video.
Remember that Tableau is fundamentally an excellent data visualization tool, but in your particular case (Spark) with big data sets, one more processing stage will help your team a lot.
This topic gave me the solution:
The first answer explains how to use labels from an external source to mass-assign aliases for a specific dimension linked to that source. It's not intuitive at all, but it works very well!
I have one last performance issue I would like to resolve in order to get a good integration between Tableau and Spark SQL: when I create a filter, Tableau performs a full scan of the data to find the distinct values that make up the filter. After that, the filter values are cached and the query is not issued again while I'm using the filter.
However, if I close and reopen Tableau, the cache is cleared and all of these queries run again on startup (which takes something like 10 minutes in my case).
Is there a way to keep the cache when closing Tableau, or would another approach do the trick?