In general, a join will be faster than a data blend so I'd suggest going with the join. In the connection editor you can choose to hide fields from any of the tables and thereby remove duplicates.
Thanks for your response. Will it not duplicate the data?
Fact 1:
Product  Customer  Sales
P1       C1        200 USD
P2       C2        100 USD

Fact 2:
Product  Customer  Discount
P1       C1        50 USD
P2       C2        10 USD
Now, can I join Fact 1 with Product and Customer Dimension and also Fact 2 with Product and Customer Dimension and report the right Sales and Discount figures?
Assuming that all you have in your fact tables are those fields, then yes, you won't be duplicating data. If you have other fields (such as purchase date, etc.) then those would also need to be join keys to prevent unwanted duplication. There are situations where adding too many join keys hurts performance; that's one of the exceptions to the "in general, a join will be faster than a blend" statement from my prior post.
I have multiple fields in the fact tables, such as surrogate keys that join to the respective dimension tables on their key columns. I have around 8 dimension tables, hence 8 such columns in each fact table. These key columns are the same in both fact tables.
Will it work in this case?
If the data has the granularity that you've said it does and you are able to join across all of the dimensions, then yes, it will work (subject to my caveat around performance). However, since you seem to have an incomplete understanding of joining across tables (the questions you ask are not unique to Tableau, and instead normal for any relational database), I suggest you do thorough testing to be sure that it is working as desired.
Thanks Jonathan Drummey,
I shall try this and do thorough testing. Also, could you please help me understand which approach works best?
a. Having one single fact table joined to all the dimensions in Tableau.
b. Having one single flat table containing both dimensions and facts.
Like many things in Tableau, the answer is, "it depends." What is "best" depends on a number of factors, including (but not limited to):
- volumes of data
- speed of sources
- relative sparseness of data
- freshness of data
- live connection vs. extract
- granularity of the data (particularly when facts/measures are at different granularities)
- granularity of the analysis/es
- interactivity & view load/refresh time requirements
The easiest thing for Tableau to work with is a data source that is flat, but that doesn't mean creating a single flat table. For example, when using a live multiple tables connection to a source with referential integrity Tableau can use join culling to only query what is necessary, so a single fact table joined to the dimension tables could work. A Tableau data extract essentially creates a flat table that is highly optimized for the kinds of queries Tableau makes.
However, there are dependencies on the structure of the data and the analyses you are performing, here are some examples:
- Doing a ton of joins in a live connection when there is high cardinality (lots of distinct values to join) might end up being too slow, so doing some pre-computation to flatten out the source might be better.
- If you have budget data at a coarser granularity than detailed expenditures or a similar situation, the work to make the two different granularities line up in Tableau might be more easily/quickly done in the data source.
- The major granularity of the analysis might be at some high-level roll-up, so creating a (drastically smaller) aggregated source might be worthwhile to keep that fast, then have a second (larger and slower) source for drill-down into the details.
- A daily executive dashboard might demand a load time of 5 seconds to display and therefore require more effort in the data source, while a monthly report might be ok taking several minutes.
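The budget-vs-detail point above can be sketched in SQL (via Python's sqlite3; the department, budget, and expenditure names are hypothetical). Rolling the daily detail up to the budget's month grain in the source means each budget row joins to exactly one aggregated row, so nothing duplicates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE expenditures (dept TEXT, day TEXT, amount INT);
INSERT INTO expenditures VALUES
  ('A','2015-01-05',100), ('A','2015-01-20',150), ('A','2015-02-03',90);
CREATE TABLE budget (dept TEXT, month TEXT, budget INT);
INSERT INTO budget VALUES ('A','2015-01',300), ('A','2015-02',100);
""")

# Aggregate the fine-grained expenditures up to month level first, then
# join to the coarser budget table at its own grain.
rows = con.execute("""
SELECT b.dept, b.month, b.budget, e.spent
FROM budget b
JOIN (SELECT dept, substr(day,1,7) AS month, SUM(amount) AS spent
      FROM expenditures GROUP BY dept, month) e
  ON b.dept = e.dept AND b.month = e.month
ORDER BY b.month
""").fetchall()
print(rows)  # [('A', '2015-01', 300, 250), ('A', '2015-02', 100, 90)]
```

Doing this alignment in the data source (as a view or staging query) is usually easier and faster than trying to reconcile the two grains inside Tableau.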
I'm sorry I can't give you a single "best" answer; the use cases to which we put Tableau are so varied that there's only a best answer for each distinct situation. Given a few sentences of description on the forums, all I can provide are some guidelines.
Thanks a lot for your inputs, this is really helpful.
I agree that the best answer varies by use case. I am curious to see what works best for you; I am currently using a single fact table connected to multiple dimensions to take advantage of join culling.
Also, I am thinking of exploring a single flat table to remove any joins at all and make it even faster.
Just to give you some background, we had initially developed the dashboards on a SQL Server Analysis Services cube, but we faced a lot of performance issues. I tried simplifying the dashboards and was able to bring the time down, but we still have one dashboard that takes around 50 seconds to load and around 80 seconds for an action filter. Therefore, we thought we'd try connecting to the data warehouse directly to see how it performs; we are just looking for the best way to get speedy dashboards. I know extracts would be the best option for the desired performance, but we also need to keep changes to our current architecture and maintenance strategy to a minimum.
I totally understand the need to balance between costs of maintenance, changes, load times, etc. At a higher level, I've come to frame this as a problem of design needs. Very often we are working with systems (software, data structures, processes & procedures, etc.) that were designed for operational needs, marketing needs, legal needs, billing needs, but not necessarily the requirements of analytics. In my case, I work in a healthcare system with software & processes that are designed for clinicians and for billing, but not for what I need for quality improvement. So I spend a majority of my time researching & wrangling data.
You asked what works best for me, see I Have Wee Data – Microsoft Access and Tableau | Drawing with Numbers for details. I generally use queries to create flat data sources and then use extracts, but that's due to the particular limitations of my environment.
I know this is a year old but maybe others still reference this post?
I am assuming that you have more than one fact table because of different grains. Unless Tableau has figured out how to issue a separate query per fact table and then join (coalesce) the results, which avoids duplication, the answer to the original question is no. Some try to address it with data blending, but that isn't what it was designed for (and it is typically slow). In my opinion, the best approach is to denormalize (flatten out) the data before you introduce it to Tableau (which is the conclusion you came to).
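The "separate query per fact, then join the results" pattern looks like this in SQL (sketched via Python's sqlite3; the extra P1/C1 discount row is invented to show a multi-row fact). Each fact is aggregated to the shared grain before the join, so neither fact can fan out the other's measures:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (product TEXT, customer TEXT, sales INT);
INSERT INTO sales VALUES ('P1','C1',200), ('P2','C2',100);
CREATE TABLE discounts (product TEXT, customer TEXT, discount INT);
INSERT INTO discounts VALUES ('P1','C1',50), ('P1','C1',25), ('P2','C2',10);
""")

# Aggregate each fact to (product, customer) first, then join the two
# aggregates: Sales and Discount both come out correct.
rows = con.execute("""
SELECT s.product, s.customer, s.sales, d.discount
FROM (SELECT product, customer, SUM(sales) AS sales
      FROM sales GROUP BY product, customer) s
JOIN (SELECT product, customer, SUM(discount) AS discount
      FROM discounts GROUP BY product, customer) d
  ON s.product = d.product AND s.customer = d.customer
ORDER BY s.product
""").fetchall()
print(rows)  # [('P1', 'C1', 200, 75), ('P2', 'C2', 100, 10)]
```

Joining the raw tables directly instead would have repeated the 200 of sales once per P1/C1 discount row; pre-aggregating per fact is what the flattening step buys you.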
Otherwise you can union all the data into a single table with a lot of columns and then load it into an extract (the same way you would in QlikView or Qlik Sense).
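The union approach can be sketched like this (again via Python's sqlite3, reusing the thread's example tables): each fact contributes rows that carry NULL for the measures it doesn't own, and aggregation then collapses the stack back to one row per key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (product TEXT, customer TEXT, sales INT);
INSERT INTO sales VALUES ('P1','C1',200), ('P2','C2',100);
CREATE TABLE discounts (product TEXT, customer TEXT, discount INT);
INSERT INTO discounts VALUES ('P1','C1',50), ('P2','C2',10);
""")

# Stack both facts into one wide table, padding the missing measure with
# NULL; SUM() ignores the NULL padding when re-aggregating.
rows = con.execute("""
SELECT product, customer, SUM(sales) AS sales, SUM(discount) AS discount
FROM (
  SELECT product, customer, sales, NULL AS discount FROM sales
  UNION ALL
  SELECT product, customer, NULL AS sales, discount FROM discounts
) u
GROUP BY product, customer
ORDER BY product
""").fetchall()
print(rows)  # [('P1', 'C1', 200, 50), ('P2', 'C2', 100, 10)]
```

The resulting single wide table is exactly the kind of flat source an extract optimizes well, since no joins remain at query time.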