You can do the following:
- run a test sql query in directly in database directly by joining all the tables needed and see the volume of data and the time it takes to run in database before using in tableau.
- there can be a case where indexes are not correctly defined in database on right columns and tables are huge with number of columns and volumes of data so i suggest run the query in database and see the explain plan and see the indexes and other stuff defined at db level
- create views in database and use the view in tableau,materialized views and other views instead of querying tables directly.
- use multiple common fields between tables to join on them to get accurate results in tableau
check this link Join Your Data
- you can use sql query on top of tableau extracts
- use DB filters and extract filters,check the order in which the filters are loaded in tableau,extract filters are loaded first then db filters.
it all depends on how are designing and fetching the data.
hope it helps.
By default I tend to advise that if the data is coming directly from a database connection then do joins, for everything else start with blends. Naturally this is just general advice.
Another thing to consider in between your two options is to just create some reporting views inside your database and then connect directly to that instead of the fact tables - this way you avoid having to do any blends or joins as the data you need will already be materialized inside the database. I'd preference this option because you can limit or extend the data that you release through this pipeline to suit the experience level of your users
Thanks for the advice. I think my overarching question is less about performance and more about the user experience. I guess in your experience, whether you are doing joins in the database via a view or in Tableau, how do you present multiple facts to a user when they share many, but not all of the dimensions? My concern is if, say, I join both orders and returned items to customer, location, product, etc, I'm fine as I can just show all of the dimensions and all of the measures on the one data source and we're good. But in this example, the returned items has it's own dimension for return reason, where do you put that if it's all joined in one data source so that the user doesn't try to report the order measures across the return reason and get tons of double counting?
Obviously in this example, it is intuitive that you would never trend your initial orders by return reason, but in doing BI for a long time with other tools, I have found that on more complex star schemas it is very easy to lose track of which things conform to which.
It would seem like having each fact in a "silo" in it's own data source would create a bit more of a logical separation to the user, but it sounds like that might not be the standard, so I'm just curious what other people do out there in this regard...