    Using extracts for published workbooks

    Mike Penney

      I am somewhat confused about the use of extracts with published workbooks. I intend to publish my workbook with an extract file as the data source, and the data file is to be refreshed hourly.


      First, do I understand the following correctly?

      1) When I opt to "extract data" from within an active workbook and to "Use" that extract, I am effectively packaging the workbook with that extract. So I should expect that when I publish the workbook, the extract will accompany it to the server and will remain as the interim data source. (I am not talking about Tableau Public)

           a) However, I understand that the "Data source" (SQL database) from which the original report was developed remains the root source of the data, and that the extract is refreshed from that root data source at intervals which I specify at the time of publishing.


      2) If I "Publish" a data source, I am just creating an available portal to that data source, including any metadata that I might add. I can enable others to share that data source, and I can refresh the published source on a schedule in a fashion similar to an extract.

           a) But it is not necessary to configure my published report to use that published data source if I have published my report with an extract.

                (This is where I might be wrong - I believe I read somewhere that the workbook data source might have to be re-specified?)


      So if I understand correctly, to configure a published report to use an extract with a refresh schedule, all I have to do is:

      1) Extract the data using DATA / EXTRACT DATA (Tableau automatically configures the report to "Use" the extract)

      2) Publish my workbook, selecting a refresh schedule for the packaged extract


      But when I do this, my workbook initially appears to have data and the quick filters work normally. After the first refresh cycle, however, my data disappears and the workbook views go blank. I still see my quick filters, titles, etc., but no data. I do not believe that a user filter is filtering out the data, because I can see it after publishing and before the first refresh.


      Any ideas?

          Russell Christopher

          Hi Michael.


          1. Correct.


          1a. Yes.


          2. Yes - unless your Published data source includes an extract, at which point your data source is a "portal" to the extract rather than the original data source.


          2a. Your report must use the Published Data Source as its source if you want to leverage the published "Data Server" data source rather than the local data source which is embedded in your workbook. You must add the published data source to your workbook and then tell the workbook to use the Data Server data source as its connection instead of the original one.


          A published report may use a "local" data source / extract which is refreshed, or it may use a "shared" Data Server extract, which is also refreshed. The advantage of the latter scenario is that multiple workbooks can share a Data Server data source / extract. If you publish a workbook which contains a "local" data source / extract, then that workbook is the only one which can leverage it.
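To make the "local" versus "shared" distinction concrete, here is a minimal sketch in plain Python. It is only a toy model, not Tableau code: the `Extract` and `Workbook` classes and the sample rows are hypothetical names invented for illustration. It shows that workbooks pointing at one shared extract all see a single refresh, while workbooks with embedded extracts each need their own.

```python
from dataclasses import dataclass, field


@dataclass
class Extract:
    """Toy stand-in for an extract: a snapshot of rows plus a refresh counter."""
    rows: list = field(default_factory=list)
    refresh_count: int = 0

    def refresh(self, source_rows):
        # Copy the current source rows into the snapshot.
        self.rows = list(source_rows)
        self.refresh_count += 1


@dataclass
class Workbook:
    """Toy stand-in for a workbook that reads from exactly one extract."""
    name: str
    extract: Extract


source = [1, 2, 3]  # hypothetical rows in the root database

# "Shared" case: two workbooks reference the same published extract.
shared = Extract()
wb_a = Workbook("A", shared)
wb_b = Workbook("B", shared)

# "Local" case: each workbook carries its own embedded extract.
wb_c = Workbook("C", Extract())
wb_d = Workbook("D", Extract())

# One refresh of the shared extract serves both A and B.
shared.refresh(source)

# The local extracts must each be refreshed separately.
wb_c.extract.refresh(source)
wb_d.extract.refresh(source)

shared_refreshes = shared.refresh_count
local_refreshes = wb_c.extract.refresh_count + wb_d.extract.refresh_count
print(shared_refreshes, local_refreshes)
```

In this model the shared setup does half the refresh work for the same two workbooks, which is the efficiency argument for a Data Server data source.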


          The problem you're actually reporting may or may not have anything to do with the type of data source your report uses. Are you sure the extract refresh is working? Are you sure the extract refresh doesn't include a filter which knocks out the rows you wish to view? What happens if you download the workbook and view it in Desktop again? Maybe you can explore a downloaded copy without the user filter in place and see what's there...
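As a rough illustration of the filter question above, the following plain-Python sketch shows one way an extract filter and a user filter could combine to leave a view empty. It is a toy model under stated assumptions, not Tableau's actual behavior and not a diagnosis of this workbook: the rows, the `region` field, and both filter functions are hypothetical.

```python
# Hypothetical rows in the source database (illustrative data only).
rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 250},
    {"region": "East", "sales": 75},
]


def refresh_extract(source_rows, extract_filter):
    """Rebuild the extract, keeping only rows that pass the extract filter."""
    return [r for r in source_rows if extract_filter(r)]


def view_rows(extract_rows, user_filter):
    """Rows a user sees: the user filter is applied on top of the extract."""
    return [r for r in extract_rows if user_filter(r)]


def extract_filter(row):
    # Assumed extract filter: only West rows are written to the extract.
    return row["region"] == "West"


def user_filter(row):
    # Assumed user filter: this user is only allowed to see East rows.
    return row["region"] == "East"


def keep_all(row):
    # No extract filter at all.
    return True


filtered_extract = refresh_extract(rows, extract_filter)
unfiltered_extract = refresh_extract(rows, keep_all)

# The two filters have no rows in common, so the view is blank.
blank_view = view_rows(filtered_extract, user_filter)

# With an unfiltered extract, the same user filter still returns rows.
working_view = view_rows(unfiltered_extract, user_filter)

print(len(blank_view), len(working_view))
```

Downloading the workbook and removing the user filter, as suggested above, corresponds to inspecting `filtered_extract` directly to see which rows actually survived the refresh.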

            Mike Penney

            Russell, thanks for your help with this. I believe I am on the right track now.


            It does appear that my blank data views were somehow a function of the filtering I applied when creating the extract (compounded by my User Filter), although it's not clear why. At least with an unfiltered extract, my published workbook is functioning as expected, including the scheduled update. I will get to the root of the filtered extract issue and get back to you.


            But your response did drive me to dig around for a better understanding of the Tableau Server 7 Data Server. I now understand more clearly the use and management of published data sources and extracts, and how I can improve overall efficiency as demand for server time continues to grow.


            Great community!

              Michael Borner

              This is quite the confusing topic.


              I am wondering how I should handle the following:

              On the data side, I'm using a Data Extract "DE_Extracted", which I've published to the server. Let's call the published version of the extract "DE_Published". At time of publishing this, I set the extract schedule to Full Refresh on a daily basis at 10am.


              On the workbook side, I was originally using the recently created extract "DE_Extracted" (as evidenced by the checkmark next to "Use Extract" when I go to Data > DE_Extracted).


              If I want to take advantage of the server's scheduled data extract, it is a MUST that, from the workbook in Tableau Desktop, I go to Data > Connect to Data, choose Tableau Data Extract, then Extract on Server, and select my "DE_Published" data source. Then I need to replace the original "DE_Extracted" with my server version, "DE_Published", by going to Data > Replace Data Source... > Replace "DE_Extracted" with "DE_Published". Finally, I highlight the "DE_Extracted" data source in the Data menu and choose "Close", because I no longer need the local extract.


              Now that I've replaced the extract with the Server version, I want to publish my workbook so it automatically contains the data refreshed on a daily basis in "DE_Published".

              So, I go to Server > Publish Workbook...> [Authentication...] and set the Data Source to "DE_Published" with an embedded password.


              Is it necessary for me to also set a schedule for my published workbook to refresh? Or is the fact that it points to an automatically-refreshing data source good enough?


              If it is necessary for me to schedule my published workbook to refresh after the "DE_Published" extract is refreshed, and I do not know how long the 10am "DE_Published" data source refresh takes, how should this be handled?


              Thank you!