While I'm not familiar with Cloudera, you could try the Performance Recorder. If you are Live it will a) show you which query is taking how long (this way you can determine if it's one calculation, or several and b) you should be able to see the query being sent. The below GIF (for a different post) shows how I can see the query time, render time and the query for a Live SQL server connection.
Often it's a particular thing that the database struggles with. for example we use (among other things) EXASol, where it is super fast at things like COUNTD, but is relatively slow when it comes to date manipulation (where you just get use to using all the exotic date handling Tableau has). We could see the query being sent was doing multiple casts and calculations to do things like (DATETRUNC and DATEDIFF), so were able to work around that.
Hope that helps
1 of 1 people found this helpful
A few additions to what Simon wrote:
1. Run your custom query directly against Impala. Tableau is never going to run faster than however long it takes Impala run the query, so if the query is slow in the first place you'll need to work on that or get a friendly DBA to help out.
2. If by "custom query" you mean Custom SQL then get rid of that and instead in Tableau use a standard connection to the table with a data source filter. The reason why is that Tableau always executes the entire Custom SQL so if you've got 150 columns in the Custom SQL and are just showing something like State & SUM(Sales) Tableau is still going to pull all 150 columns and then aggregate over that instead of the much more efficient SELECT State, SUM(Sales) FROM table GROUP BY State.
3. The best resource on Tableau performance is Best Practices For Designing Efficient Tableau Workbooks , besides having instructions on how to use the performance recorder and to avoid Custom SQL it's chock-full of other tips & techniques.
A question and i am more interested in exactly what happens when a load a dashboard.
Say my tableau workbook published on the server has 5 different dashboards say d1, d2, d3, d4 and d5.
Now following are my questions. I really want to understand how tableau functions w.r.t loading the data and then working with it.
Note: The dashboard is using a custom SQL query and the live connection is a must requirement.
1.) Whenever i click the tab of a particular dashboard (say d4) , it takes some time to load. I believe that every time i hit a different dashboard tab, the following happens : Database connection, query execution, layout establishment etc. Please correct me if i am wrong.
And now if i move to d5 from d4, the same processes (database connection, query execution, layout establishment) takes place.
Then again if i move back to d4 from d5, the same processes(db connection, query execution, layout establishment etc) will take place. Is that correct?
**My movement is like this : d4 --> d5 --> d4
2.) Say my custom SQL query has about 55 variables in the select clause and a where clause applied on just one column. Now my dashboard d4 uses only 10 of the 55 variables in the custom SQL query. Is Tableau not smart enough to pull only those 10 variables out of those 55 variables when the custom SQL query runs or it loads the entire data which is part of the custom SQL query and then picks up the required variables before starting it's calculations?
3.) How many times would the custom SQ query run and under what conditions while exploring the dashboard?
1) Tableau has multiple levels of caching (which are also subject to Tableau Server configuration settings) so it may not need to go through all of those steps each time.
2) Because custom SQL code can be arbitrarily complex (running subqueries, complex join criteria, database-specific functions, etc.) Tableau does not “peek” inside the custom SQL to prune the query.
3) For each initial load or refresh of a view Tableau will issue 0 or more queries that hit the DB. Part of this is due to caching, there’s also query batching & query pruning (more on those in a second) that Tableau will do to reduce the number of queries that are issued to the DB. The “raw” number of queries that Tableau will issue is determined by how each view is built including the level of detail (dimensions) used for each worksheet and marks card in multiple axis views, whether relevant values is being used on quick filters, some totals & grand totals including some total reference lines, etc.
Also note that Tableau will attempt to run queries in parallel whenever feasible. So given that the Custom SQL is run in its entirety every time it’s possible that a dashboard could kick off simultaneous queries to the DB that due to the Custom SQL cause the database itself to run into memory & caching limits as it tries to satisfy all of those queries at once. So this is another reason to avoid Custom SQL.
Tableau does not work like a traditional batch reporting tool where (most likely) the exact same query (maybe with updated parameters like dates) is run every time the report is run. Instead Tableau’s VizQL technology generates queries on the fly based on what the user is doing in that moment.
Let me write this again in a different way: If the “simple select statement” you described in your original post is 1 or more tables with basic joins and a where clause there is no performance gain to using custom SQL and most likely the custom SQL will trash performance. So instead use Tableau’s native functionality to create any joins, use a data source filter, and potentially turn on “assume referential integrity". This would enable to Tableau to do query pruning (where it only selects the table(s) & field(s) that are necessary for a view) and more likely get query batching (where multiple potential queries are combined into one).