Hi Tableau Users,
This is regarding an issue we are facing in production. Apologies for such a long post!
We are using Tableau as a reporting tool in one of our production projects and are facing an extreme slowness issue.
Can you please look into the scenario below and suggest some performance best practices for handling such a huge amount of data in Tableau? Please find the details below:
We are connecting from Tableau to Impala on a MapR distribution using the MapR Impala ODBC driver. Since Tableau does not have a native connector for MapR Impala, we are using a generic ODBC driver downloaded from:
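For anyone reproducing the setup outside Tableau, a minimal sketch of the ODBC connection string we pass to the generic driver looks like the following (the DSN name, host, and port here are placeholders, not our real values):

```python
# Sketch of assembling an ODBC connection string for Impala, mirroring what
# Tableau's "Other Databases (ODBC)" connector does under the hood.
# All values below are hypothetical placeholders.

def build_impala_odbc_conn_str(dsn="MapR-Impala", host="impala.example.com", port=21050):
    """Assemble a DSN-based ODBC connection string for the Impala driver."""
    return f"DSN={dsn};HOST={host};PORT={port}"

conn_str = build_impala_odbc_conn_str()
# With a driver manager installed you could then connect via e.g. pyodbc:
#   conn = pyodbc.connect(conn_str, autocommit=True)
```

This is only to illustrate the plumbing; Tableau manages the actual connection itself via the DSN configured on the machine.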
The table in question currently has 3,006,761,616 records for 21 days and is growing. The source table for those 21 days holds 682.20 GB of data, which we have aggregated down to 143 GB. All calculations are performed at the database end, and we have removed the columns and rows that are not used in the report.
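To be concrete about "calculations performed at the database end", the shape of query we push down to Impala is roughly the following (table and column names here are made up for illustration, not our real schema):

```python
# Sketch of the kind of pre-aggregated, date-filtered query we push to
# Impala so Tableau only pulls back a handful of summary rows.
# Table and column names are hypothetical.

def last_n_days_agg_query(table="agg_events", date_col="event_date", n=7):
    """Build an Impala SQL string aggregating the last n days on the server side."""
    return (
        f"SELECT {date_col}, COUNT(*) AS events, SUM(bytes) AS total_bytes "
        f"FROM {table} "
        f"WHERE {date_col} >= date_sub(now(), {n}) "
        f"GROUP BY {date_col}"
    )

sql = last_n_days_agg_query(n=7)
```

The point is that the GROUP BY and the date filter both run in Impala; Tableau's live connection then only has to render the aggregated result.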
Tableau reports off this table. We are using a reporting cluster (3 nodes) for generating the reports, and the table has already been partitioned. Please note that since the data is growing at a large scale, we are using a live connection to the table in Tableau.
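For context on the partitioning, the table is laid out along the lines of the following DDL (a hypothetical sketch, not our exact schema), so that a "Last 7 Days" filter only scans 7 day-partitions instead of the full 21-day table:

```python
# Hypothetical Impala DDL illustrating day-level partitioning.
# Held as a Python string purely for illustration.
PARTITIONED_DDL = """
CREATE TABLE agg_events (
  events BIGINT,
  total_bytes BIGINT
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
"""
```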
We did try to create a simple chart in Tableau for a time period of "Last 7 Days", and it takes ages to refresh. (Refer Pic 1)
When we check in Impala, we can see that the query takes more than 3.5 minutes to execute. (Refer Pic 2)
As a result, Tableau renders the graph in almost 4-5 minutes. Please note that only one worksheet takes this long, and we are not sure whether a performance recording would help in this regard.
The customer has highlighted this issue as "Tableau is extremely slow".
The only suggestion we have received from Tableau is:
Try taking the data in-memory by using extracts (Hyper); this may be faster, given that it takes over 3 minutes for MapR to execute the query.
However, the customer wants Tableau to show data on a daily basis for the last 30 days. We estimate that for a single month the dataset can be around ~250 to 280 GB. Scheduling the extracts for both incremental refreshes and full refreshes (to remove data older than 30 days from the extract) would itself be a time-consuming, resource-heavy task, and I am not sure whether handling around 250 GB in an extract would be of much help!
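The rolling 30-day requirement is itself easy to state in code; the hard part is that keeping an extract aligned to it needs both incremental loads at the leading edge and trimming at the trailing edge. A minimal sketch of the window arithmetic (dates here are only examples):

```python
# Sketch of computing the bounds of a rolling N-day reporting window,
# i.e. which days an extract would need to add and which it must drop.
from datetime import date, timedelta

def rolling_window_bounds(today=None, days=30):
    """Return (start, end) dates for a rolling N-day window ending today."""
    today = today or date.today()
    return today - timedelta(days=days - 1), today

# Example: a 30-day window ending 2018-06-30 starts on 2018-06-01.
start, end = rolling_window_bounds(date(2018, 6, 30))
```

Each daily refresh would append the new `end` day and delete everything before the new `start` day, which for an extract of this size means a full refresh is effectively required just to trim old data.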
Also, this is a single-node Tableau installation, which can cause slowness when creating extracts.
Now, since we are talking about huge volumes to handle in Tableau, we would like to know:
a) Is Tableau suitable for handling this huge amount of data in MapR Impala?
b) Is increasing the number of Impala cluster nodes the only way to handle such a huge volume?
c) I did find a Tableau Conference session with a demo in which Jethro was used to connect to Hive and report in Tableau. Can you please let me know the industry standard for handling such huge datasets in Tableau? Are there any other tools like Jethro which can speed up query execution? Has anyone tested this amount of data with Jethro?
Apologies once again for such a long post.
Thanks & Regards,