We are facing performance issues with our Tableau workbooks and are trying to find the cause. I'm hoping to get some insights from this forum.
Due to restrictions, I cannot publish the workbook here. I will share as much information as I can, along with what I have found so far while investigating the issue.
I have also raised a case with Tableau and am hoping to get some answers from them as well. I will update this thread once I have more information.
Tableau version: 10.4
Data Source: Cloudera Impala
Connection Type: Live. We have also tested with an Extract and the findings are listed below, but we need to keep the connection Live.
Tables: Only one table is used directly in Tableau. No joins.
Partitioning: The database table is partitioned on a date field, stored as an integer in YYYYMMDD format because Impala does not support the Date data type.
Data Source Filter: In Tableau we have a data source filter on the date field that restricts the data to a single day's records. The user selects the date from a Date parameter.
Number of Rows: Roughly 5k per day.
Number of Dashboards: 5. The main dashboard has around 9-10 sheets and uses Actions to navigate to the other dashboards, which have around 1-2 sheets each. The dashboards are published as tabs on the server.
Quick Filters: Just 1 quick filter with 8 distinct values.
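For readers less familiar with this kind of setup, the date-to-partition mapping described above can be sketched as follows. This is a minimal illustration, assuming the partition column holds plain integers such as 20180307 and that the selected date is converted to that integer before it is compared with the column. The function names are hypothetical and not part of Tableau or Impala.

```python
from datetime import date


def partition_key(d: date) -> int:
    """Hypothetical helper: convert a calendar date to the integer YYYYMMDD partition value."""
    return d.year * 10000 + d.month * 100 + d.day


def partition_key_to_date(key: int) -> date:
    """Hypothetical helper: convert an integer YYYYMMDD partition value back to a date."""
    return date(key // 10000, (key // 100) % 100, key % 100)


# Example: the date a user picks in the parameter maps to one partition value.
print(partition_key(date(2018, 3, 7)))  # 20180307
```

The point of converting the parameter, rather than the column, is that a filter comparing the raw integer column to a constant is generally what lets Impala prune to a single partition. A filter that converts the column to a date on every row may not be pruned the same way. The same arithmetic can be written in a Tableau calculated field using YEAR, MONTH and DAY on the parameter.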
When the dashboard is launched from the server, the initial load takes a long time: 25-30 seconds. Once the initial load is done, subsequent actions are quite fast. For example, changing the date updates the dashboard in about 2-3 seconds, and navigating to other dashboards also takes 2-3 seconds.
We are trying to identify the cause of the initial load time and reduce it.
We have been using Tableau's Performance Recording to understand what could be causing the long initial load. Our observations are:
1. The "Connecting to the Data Source" event takes about 6-10 seconds on average, and sometimes longer.
2. There are multiple "Connecting to the Data Source" events. The first takes about 6-10 seconds, others take 1-3 seconds, and many more take less than 1 second. We are not sure why there are multiple events when all the dashboards and worksheets use the same connection in the workbook.
3. There is a "Building View" event for the Main Dashboard that takes about 12-16 seconds. When navigating from the Main Dashboard to the other dashboards, the subsequent "Building View" events take about 6-8 seconds.
4. Each visualization query executes in under 1 second, taking between 0.3 and 0.9 seconds depending on its complexity. This is confirmed both by the Performance Recording and by Impala's own query information.
5. Most of the queries execute in parallel, as seen in the Performance Recording.
6. With an extract, everything is much faster: Connecting to Data Source takes 1 second (with only one event), Building View takes under 1 second, and Executing Query takes under 0.4 seconds.
7. We also published the workbook with "Publish as tabs" unselected, but did not see much difference in performance.
Based on these findings, we are trying to reduce both the time taken to establish the connection to Impala and the time taken for Building View.
I'm not sure why Building View takes so much less time with an extract, as I assumed Tableau would have to render the same view regardless of whether the source is live or an extract.
Any thoughts or pointers would be highly appreciated.
Please let me know if any other information would be useful.