I am starting a new project that will require pulling data from a number of SQL Server tables, combining it with data from an Excel file, then publishing out to Tableau Public.
The data I need from SQL requires a union plus many joins, all from source tables with many fields. I know of a couple of different ways I could approach the data pull, but I'm not experienced enough with Tableau to know what the differences in performance, or ease of maintaining the workbooks and refreshing the data, might look like for the final product. (I will probably not be the one maintaining this project in the long term, so it needs to be simple enough to pass on to someone who knows less than me. And I don't know very much!)
Another related (but also very important!) question I have is this: will the data extract in the packaged workbook for all the below options look the same? Some of the source tables have data that absolutely cannot end up in the extract.
These are the options I'm considering:
1) Use initial SQL to create temp tables, which I then join in the GUI. This seems to work fine, except that refreshing the extract seems clunky. After I create the temp tables, I switch the database dropdown to tempdb to access them. Then, if I need to refresh the data, I have to switch back to the source database, then refresh, then switch back to tempdb again. It makes sense logically that it needs to work this way, but it makes me worry that the data refresh process might be clunky and annoying down the line, especially considering that I will need to bring in excel data as well, which I assume will create an additional step.
2) Use custom SQL to create a single table (do the union in the custom SQL). This works fine too, but I think it might be a little slower....but it does eliminate the need to toggle back and forth between databases if I need to refresh. This is the approach I'm favoring right now....
3) I guess I should add the option of just doing all the joins and unions right there in Tableau, but it seems to me that it would be a big mess since each of the source tables has so many fields I don't need to use and cleaning them all up sounds like a lot of busywork. But might it run faster?
4) Technically, I could also create a single source table in SQL Server, but then refreshing that table would add an extra maintenance step, which I'd like to avoid.
Can anyone give me some best-practices advice on this kind of project?