This is a request for guidance on the pros and cons of connecting to a published data source on Tableau Server.
I've done quite a bit of research and still haven't found the answers I'm looking for, so I'm turning to the forum for guidance. Even my Tableau support contact is struggling to answer this question.
I'll try to be as exact and specific as possible. Let me describe the server environment I'm working in, then ask the question.
- Tableau Server 9.x
- A published data source on Tableau Server called "WibbleData". This published data source is updated by a scheduled extract refresh from my internal Oracle DB server.
- Workbook users access Tableau Server to view workbooks that workbook designers have published there.
Note: I have several hundred workbook users and a handful of designers. No users or designers have access to my Oracle DB, so I publish the data as a Tableau data source. Due to constraints, the published data source is updated via a scheduled extract refresh.
My question concerns the performance of workbooks published to Tableau Server, depending on the type of connection between the workbook and the published data source "WibbleData". A workbook can either have a live connection to the published data source or use an extract of it. What are the performance pros and cons of each?
My belief is that if I have two identical workbooks that use all the data from the published data source, there should be no difference in performance between the LIVE and EXTRACT workbooks. Is that correct?
If a workbook uses only a subset of the data, does performance differ between the LIVE and EXTRACT variants of that workbook connected to the same published data source? And does the answer change if that subset is defined by filters applied on the connection to the published data source?
Why the question?
The challenge is that my data is complex and carries a significant processing overhead, which is why I share it via a published data source. Teams of people use this data in their own workbooks, creating extracts that they then share via the server. When the published data source is updated, all the workbook owners try to get their own extracts updated immediately, which is extremely complex to coordinate. It would be easier to have a live connection from each workbook to the published data source, but I am not clear on the performance overhead of doing so or on the factors that affect it.
I hope this makes sense.