If you are going to distribute the visualizations with Tableau Server, and you want the underlying data to be changeable and automatically refreshed, then I highly recommend using a database like MySQL or MS SQL Server instead of a text/Excel/Access file.
Does this mean csv files can't be auto refreshed?
Also if I put the data in SQL, how well does Tableau deal with refreshing 20GB of data?
You can be connected to a "live" copy of the CSV file for a view that is published to Tableau Server; I just do not recommend it.
And if your CSV file is 20GB, then you really really do not want to leave it as a CSV file. You want 20GB of data in a database because if you leave it as a CSV file, Tableau will use MS Jet to query the data, and MS Jet is the worst choice for querying large amounts of data, IMO.
I don't know all the specifics of your situation, but it sounds like you may want to set up a process for handling the data. Here are some questions that would help in deciding on a good route:
- How often does this 20GB CSV file change?
- Is the entire file changing each time, or are some rows simply being added?
- Where is the file coming from, what is generating it?
There are other follow-up questions after those.
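If you are not sure about the second question, a rough way to find out is to compare yesterday's file with today's. Here is a minimal Python sketch, assuming you keep the previous copy around; `is_append_only` is just a name I made up, and it only checks that the new file begins with the exact bytes of the old one:

```python
import os


def is_append_only(old_path, new_path, chunk_size=1 << 20):
    """Return True if new_path starts with the exact bytes of old_path."""
    # A file that shrank cannot be the old file plus extra rows.
    if os.path.getsize(new_path) < os.path.getsize(old_path):
        return False
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        while True:
            old_chunk = old.read(chunk_size)
            if not old_chunk:
                # Reached the end of the old file without a mismatch.
                return True
            if new.read(len(old_chunk)) != old_chunk:
                return False
```

It reads in chunks so a 20GB file does not have to fit in memory. If it returns False, rows were changed or removed and not only added.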
We generate huge amounts of log data (TBs per day) using Hadoop. All of our data sits in text files. I have to say one of the major reasons why I went with Tableau was the supposed ability to handle csv files....
The 20GB file changes daily; think of it as a list of crawled URLs with a bunch of other relevant data. By far the easiest way for me is to scp a csv file to a Windows machine and have Tableau point to it. Loading all of this into SQL would be a pain (and slow).
Where does MS Jet come in? My assumption was that Tableau creates an extract and uses its own data format to do the processing. I apologize if these questions are silly, but I'm not a Windows guy.
Okay, that makes more sense (I was under the impression that you wanted to "Connect live" to the CSV file). And yes, with Tableau Server you can schedule the extract, so here is the process you want to use:
1. Have the CSV file that changes daily placed on a network share that the Tableau Server has access to
2. In Tableau Desktop Pro, when connecting to the CSV file, navigate to the file location with a direct path like: "\\servername\share\pathfile.csv"
3. Create your extract, or when connecting select the "Import All" option
4. Build your visualization
5. Publish to Tableau Server
6. Schedule an extract to be performed daily for that data source from Tableau Server.
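For step 1, one thing to watch is the scheduled extract picking up the file while it is still being copied. A minimal Python sketch of one way to guard against that, assuming the copy job can run Python; `publish_csv` and the paths in the usage line are hypothetical names for illustration:

```python
import os
import shutil
import tempfile


def publish_csv(src_path, dest_path):
    """Copy src_path to dest_path via a temporary file in the same folder."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temporary file next to the destination so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(src_path, tmp_path)
        # Swap the finished copy into place, overwriting the old file.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

Usage would look like `publish_csv("crawl.csv", r"\\servername\share\crawl.csv")`. The rename is only as atomic as the underlying filesystem allows, and on Windows it can fail if another process has the destination file open, so treat this as a starting point and not a guarantee.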
When you "Connect live" to a CSV file you are not creating an extract; you are using MS Jet to query the CSV file in Tableau. When you create an extract, you are pulling the data out of the original data source, in this case a CSV file, and loading it into Tableau's custom-made columnar data store.
Thanks for the help. Key here is to use servername and not DriveLetter:\\.
My problem was that Desktop, Server, and the csv file are all on the same machine, and I didn't even think about addressing it using servername.
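For anyone else who trips over this, here is a small Python sketch for checking whether a path is in the \\servername\share form before pointing Tableau at it. `is_unc_path` is a hypothetical helper name; it relies only on the standard library's `PureWindowsPath`, so it also runs on a non-Windows box:

```python
from pathlib import PureWindowsPath


def is_unc_path(path):
    # For a UNC path the "drive" is the \\server\share prefix;
    # for a local path it is something like "C:" (or empty if relative).
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")
```

With this, `is_unc_path(r"\\servername\share\pathfile.csv")` is True and `is_unc_path(r"C:\data\pathfile.csv")` is False.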
Joe, I am not understanding: I have a small txt dataset that is refreshed every 15 minutes from an application that is not Tableau. I thought that if I am doing a "live connect", I get live data when I open up Tableau. In my situation, I select a txt file when I connect; however, when I close Tableau and reopen, the directory is changed to: C:\WINDOWS\TEMP\698.tmp\Data\DATA 2\Myfile.txt. Why did it change? How can I point it back to the directory I set when I selected the first time?
Did you save it as a packaged workbook (.twbx) or just a normal workbook (.twb)?
A .twbx is all the related files zip'ed into a single file, so in your case, this would be like a snapshot of the text file.
What if you edit the current connection so it points to the live file again, and then save as a .twb?
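If you want to see the snapshot for yourself, a .twbx can be opened like any other zip archive. A minimal Python sketch, assuming Python is available; `list_packaged_files` and the workbook name in the usage line are made up for illustration:

```python
import zipfile


def list_packaged_files(twbx_path):
    """Return the names of the files bundled inside a packaged workbook."""
    with zipfile.ZipFile(twbx_path) as archive:
        return archive.namelist()
```

Calling `list_packaged_files("MyWorkbook.twbx")` should show the .twb plus the bundled copy of the text file, which is the copy Tableau unpacks to the temp directory when the workbook is opened.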
That was it. I saved it as .twb and it now works correctly. Thanks much!
We don't have a network share that Tableau Server can access. Are there other ways for Server to access the latest text file (.csv)? We refresh the .csv hourly to daily.