There are a few possibilities:
1. Stick with the same filename month after month.
That is, literally replace the file with the new file of the exact same name (archiving the old ones is probably a good practice). The first month you have a file named data.txt. The next month you replace it with the new file also named data.txt.
Pro: all you have to do is open Tableau Prep and run the flow, and it should work fine.
Con: it's a bit harder to keep track of the files ("did I replace it already?")
2. Add the new file to the directory and remove the old file (or move it to an archive directory)
Tableau Prep will detect that the source file is no longer available and prompt you to locate it. Point Tableau Prep to the new file instead.
Pro: fairly easy
Con: none really, but a bit more manual
3. Add the new file to the directory and swap out the input node
Here, you'll add the new connection and replace the input node with the new file.
Pro: There's no question in your mind what you've done and the input node keeps the name of the file so the flow is showing you exactly which month/file you've used.
Con: Definitely a bit more involved than other options.
4. Just keep adding files to the directory, union them all together, and then filter the File Paths field to keep only the newest
Pro: You can compare new data to previous months if needed because it's all there in the flow
Con: Definitely manual; higher potential for human error; potential performance issues depending on size and number of files
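If the "did I replace it already?" worry in option 1 bothers you, the file swap itself can be scripted outside of Tableau Prep. Here is a minimal sketch in Python, assuming the flow reads a file called data.txt and that old copies go into an archive folder under a dated name. The function name, the paths, and the naming pattern are all made up for illustration; nothing here is part of Tableau Prep.

```python
import shutil
from datetime import date
from pathlib import Path


def replace_monthly_file(new_file, live_file, archive_dir, stamp=None):
    """Archive the current live file under a dated name, then copy the new file in.

    All names are hypothetical; adjust them to your own folder layout.
    Returns the path of the archived copy, or None if there was no old file.
    """
    new_file, live_file, archive_dir = Path(new_file), Path(live_file), Path(archive_dir)
    stamp = stamp or date.today().isoformat()  # e.g. "2024-01-31"
    archived = None
    if live_file.exists():
        archive_dir.mkdir(parents=True, exist_ok=True)
        # data.txt becomes archive/data_<stamp>.txt
        archived = archive_dir / f"{live_file.stem}_{stamp}{live_file.suffix}"
        shutil.move(str(live_file), str(archived))
    # The flow keeps pointing at the same name, so nothing changes in Prep.
    shutil.copy2(new_file, live_file)
    return archived
```

Calling something like replace_monthly_file("downloads/january.txt", "input/data.txt", "input/archive") would leave a dated copy of last month's file behind, which gives you a record of when the swap happened.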
With any of these approaches the assumption would be that all the fields (columns) in each new file stay the same. If they change, you could run into the following:
- Newly added column - no problem. It will just show up in the flow.
- Changed name - could break parts of the flow downstream. You'll have to fix it (potentially by just changing its name back to the original in the input node).
- Removed column - could break parts of the flow downstream. You'll have to decide how to handle this one. You might need to add a calculated field of the same name and type to fill in values for the flow.
The order of the columns shouldn't matter, unless you have specified that the first row does not contain column names and thus Prep is naming the columns automatically based on position. In that case, the file would need to maintain the same order of columns month after month.
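If you want to catch these column changes before you run the flow, you could compare the header row of the new file against last month's. This is only a rough sketch in Python, assuming both files are delimited text with column names in the first row; the function names are made up, and this runs outside of Tableau Prep.

```python
import csv


def read_header(path, delimiter=","):
    """Return the first row of a delimited text file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter), [])


def compare_headers(old_header, new_header):
    """Report added, removed, and reordered columns between two header rows.

    A renamed column shows up as one removed name plus one added name.
    """
    old, new = set(old_header), set(new_header)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        # Same set of columns but in a different position
        "reordered": old == new and list(old_header) != list(new_header),
    }
```

For example, compare_headers(read_header("archive/data_old.txt"), read_header("data.txt")) with an empty "removed" list suggests the flow should not break downstream. The "reordered" flag only matters in the case where Prep names the columns by position.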
Hope this helps!
A few final thoughts:
- In terms of which I would suggest - I would lean towards the first option (and have done it myself).
- I would probably not use option #3 because it's the most difficult and time consuming, and eventually you end up with a lot of files (both in the directory and the connection list) that you have to sort through.
- Option 4 could be extended a bit if the name of the file or something in the file (such as a date or timestamp) could reliably be used to identify the latest file. In that case you could automate the flow by using an aggregate to find the MAX filename or timestamp and then filtering to only the records for that file. In some circumstances, I would lean that way because it would give you the best of all worlds.
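To illustrate the "MAX filename" idea from option 4, here is the same logic sketched in Python rather than in a Prep flow. It assumes a made-up naming convention where every file name contains a YYYY-MM date (for example data_2024-01.txt); in Prep itself you would do the equivalent with an aggregate step and a filter on the file path field.

```python
import re
from pathlib import Path

# Assumed naming convention: a YYYY-MM date somewhere in the file name.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})")


def latest_file(paths):
    """Return the path whose file name carries the most recent YYYY-MM date."""
    dated = []
    for p in paths:
        m = DATE_IN_NAME.search(Path(p).name)
        if m:  # files without a date in the name are ignored
            dated.append(((int(m.group(1)), int(m.group(2))), str(p)))
    if not dated:
        raise ValueError("no file name contains a YYYY-MM date")
    # Tuples compare by (year, month) first, so max() picks the newest file.
    return max(dated)[1]
```

This only works if the date in the name is reliable. If the files are named inconsistently, a timestamp column inside the data is probably the safer thing to take the MAX of.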
Above and beyond the call of duty! Your responses are very helpful. I like options 1 and 2. Option 2 seems interesting because I can date the files before I repoint Prep, which allows me to verify that I have indeed updated the flow and am using the most recent file. The only issue I see with #1 is that if I use the same file name every week, my paranoia will kick in and I will second-guess whether I uploaded the newest file! However, I really like #1 as it is less manual. Closing thoughts?
Jonathan S. Tunner
I'm glad that helped! I agree with your thoughts on #1 - however, it is the one I actually use most often. It seems like sometimes with #1 I have to click the refresh button in Tableau Prep and then I wonder if it was just because I hadn't replaced the file or if it didn't pick up the changes. So there is a bit of second guessing and checking sometimes.
Best of Luck!