I recall seeing this exact same question on the forum the other day.. I had a hunt and found the thread Tableau server data source showing up as Extract as well as Live on Server
It's not yet been marked as correct, but it is.
Thank you for sharing the link. I looked for someone having the same issue using the search function but I wasn't able to find this thread.
So please, correct me if I'm wrong:
From the WORKBOOK perspective, it has a LIVE connection to a published data source, which is in this case a TDE (so, in fact, an Extraction from the Tableau Server perspective).
In another scenario, where I make a visualization using a local extraction of a random database:
From the WORKBOOK perspective, it has an EXTRACT connection to a non-published data source, which would be the local extraction I embedded in the Workbook.
So am I safe to assume an EXTRACT connection is always on a non-published data source? In which case I can have an EXTRACT to a published data source? Never, right?
From the WORKBOOK perspective, the connection is labelled as LIVE if it that connection is just a 'pointer' to data stored elsewhere, whether that's a direct connection to a SQL server database, or an excel file stored somewhere, or a published data source on server (which in turn may itself be then labelled LIVE as it to is a pointer to the data or labelled EXTRACT if the data has been 'copied' onto the server).
From the WORKBOOK perspective, the connection is labelled EXTRACT if that connection has been further 'copied' into the workbook itself (so the data in the workbook won't update unless the workbook is refreshed on the server). IT IS possible the workbook has a connection labelled EXTRACT that is a reference to a published data source.
Here's some examples from the set up in our company
These are 3 data sources all published on our Tableau Server. The source of all these data sources is a SQL Server database.
Service Delivery - Service Call is labelled as EXTRACT as each day we pull the data from the database (via a scheduled extract refresh task) so 'a copy' is stored in Tableau's 'database' on the server. When users connect to this data source, they are never hitting the SQL Server with their queries.
The other data sources are labelled LIVE as the data only exists on the SQL Server. When users connect to this data source, they are querying the SQL Server.
This workbook is connected to 2 of the published data sources (it connects to one of them twice). They both are labelled as LIVE as they are both just pointing to the published data source to get its data. The data is not 'copied' into the workbook itself. The workbook does not need to be refreshed to get the latest data (although it will be 'out of date' if the extracted data source Service Delivery - Service Call failed to refresh for some reason). If I download this workbook from the server it will be saved as a .twb file (Tableau Workbook).
This workbook is also connected to Service Delivery - Service Call, but this connection is labelled EXTRACT.
This is because I have taken a further 'copy' of the published server data source and the data is now stored within the workbook. Even when the Service Delivery - Service Call data source extract is refreshed, the data within this workbook won't change until I also refresh the workbook. In this instance I haven't put the workbook on a refresh schedule as its just an example workbook that doesn't need current data. I could have left this workbook pointing to a published data source itself (ie LIVE connection), but performing the additional extract means that the example won't break if anything changes to the data source structure, and its also more performant (I hid lots of unused fields). If I download this workbook it will be saved as a .twbx file (Tableau Packaged Workbook) as the workbook contains the data.
This workbook has 2 LIVE connections - 1 to the extracted published data source (Service Delivery - Service Call) and 1 to an Excel spreadsheet.
The Excel spreadsheet is stored on a file server and IS NOT a published data source.
Hope that helps clarify things a bit more for you.
Thank you for your explanation, it really helped clear up some confusion I was having about Live vs. Extracted data sources on the Tableau Server.
I'm wondering however, how you initially published the "Service Delivery - Service Call" data source as a refresh-able extract? I have a similar data source that I would like to connect a .twb to and query the refresh-able extract published on Tableau Server rather than the database itself (due to performance issues when hitting the db repetitively with the different queries needed for my visuals).
When I attempted to publish this extract, I connected to the database that stores my table via Tableau Desktop, and dragged the my table into the Data Source window. I then selected the "Extract" radio button located at the top right of the Data Source window under Connection. I edited the Extract details selecting All Rows, Incremental Refresh, and supplied a field to base the incremental refresh off of. I then went to the top ribbon and clicked Server, Publish Data Source, and selected my table (it was the only one listed). I then receive a successful publish notification but when I go and view my newly published data source on the server under Home, Data Sources, it is listed as "Live."
Do I need to first download the .tde to my local machine, then publish? This would not be ideal as the source contains several billion rows and I'm pretty confident my computer doesn't have enough space for a .tde of that size!
Thank you in advance for your help!
1 of 1 people found this helpful
Interesting, I've just tried that myself and get the same behaviour. That's not the method I use nor have I ever tried before, so not sure what's happening there (and I can see from http://onlinehelp.tableau.com/v10.1/pro/desktop/en-us/help.htm#extracting_data.html that it's the suggested way to create an extract... I'm using v10.1.1).
Anyway, what I do is connect to my table in the data source window, then 'go to worksheet', so I'm in this view
I then right click on the data source and choose extract data which brings up the same dialog you mention above. This generates the extract on my machine (the icon next to the data source changes from a single to a double cylinder when complete), which I then publish to server and the 'publish server' dialog will have an option to choose a refresh time.
The downside of this, is that the initial extract will extract all the data which sounds like it'll be an issue with you with 4 billion rows. One way people get around this if connecting to a SQL table, is to create a SQL view instead which selects from the table, and then connect to this. The view can initially be written to have a SELECT TOP 1 statement, so only 1 row is returned and extracted when using Desktop. The data source then gets published to server. The SQL View is then changed to remove the SELECT TOP 1, and when the data source is then refreshed on server, all the rows are returned, so the server gets the load of pulling all the rows rather than your Desktop. I don't use incremental extracts so not sure how successful this method would be with that, although if you wrote the view with a SELECT ... WHERE [ID] = the lowest one in the set, then I'd imagine when this is then removed, the extract would just add all the additional IDs...
Hope that helps
You are a guru, Donna!
We are connecting to a Hadoop cluster via Drill views. I added a limit clause in my Drill view's query, downloaded the limited .tde to my machine, and published that to our Tableau Sever. Then I went back to my Drill view and deleted the limit clause, refreshed my extract on Tableau Server, and sure enough, all of my data was there with expected counts!
Am I correct in saying that the .tde has the connection details to my Drill view and the latest extracted output? So when I click refresh on Tableau Server it just pokes the Drill view and steals the output for us to then query from a Tableau dashboard?
Thank you for the idea to limit my view in order to initially publish my extract to the server. It worked perfectly!
Glad you got it working :-)
The tde contains the extracted data and the connection details of where that data is sourced from. When you refresh the extract on server either manually or via a schedule, it uses the connection details to pull the data from source. I don't work with incremental extracts, so my refresh will always get all the data as defined in the database SQL view. I assume an incremental extract will just pull the additional data on refresh based on the ID field defined within your extract.
Does that answer your question...?
Yes, that answers my question.
Thank you again, Donna!
I have a little different situation that is confusing the heck out of me.
I have been working with a data source for the last few months and I extract the data, it returns the rows, I check to make sure it is an extract in the radio button and then navigate to server > publish data source. The publish wizard pops up, fill that out, and then publish. Usually, it goes through another progress bar to show pushing it to the server and what not, but sometimes, it's instant and that's when I know something went wrong and I can't figure out what it is. After the instant publish happens, I go to the data source page on the server and see the "Live / Last extract" as "Live." I understand that the live/extract depends on the perspective; the workbook is live because it's a live connection to the data source, and the data source is an extract because you're grabbing a snap shot (extract) of the data. Why is it that some times the manual extract will upload/publish as a live data source?
Thanks in advance
Another tid bit I want to include is that this has happened in the past and the way I worked around it was to download the data source that was just published, extract it again (that process asks me to save it as a TEMP_XXXXXX file) and re-publish. I thought the TEMP_XXXXXX file would be useful because I had never seen that before.