2 of 2 people found this helpful
Data Blending isn't really a join, but a blend. The closest analogy is a LEFT Join, but the results are aggregated (on the join fields you define) and then Joined on those fields. by way of example if I had the following 2 tables
Product ID Sales 1 100 2 200 3
Product ID Depot Stock 1 North 100 1 South 100 2 North 100 2 South 100 2 East 100 3 North 100 3 South 100 3 West 100
If I Blend the Secondary Data Source on Product ID
The Stock Figures Brought in would be
Product ID Sales Stock 1 100 200 2 200 300 3 150 300
Hope this makes sense, but if you can post a use-case I can (hopefully!) explain how the Blend is working for that particular instance
1 of 1 people found this helpful
In traditional relational database analysis, joining tables from the same database is a common practice. However, when data needs to be combined from different databases, complications often arise. As most database vendors don’t provide easy methods of combining data from other vendors (if they provide it at all), this task is often left to the analysis tool itself. Tableau’s approach to this predicament is called data blending. Data blending provides a way to combine data from more than one data source on a single worksheet. In short, Tableau connects to multiple data sources, sends independent queries to those data sources, and then combines (or “blends”) the aggregated results of the independent queries on a single worksheet.
For each data source that is used on the sheet, a query is sent to the database and the results are processed. Then all the results are left joined on the common dimensions. The join is done on the member aliases of the common dimensions so if the underlying values aren’t an exact match, you can fix it up in Tableau.
Matt Lutton also described this terminology very beautifully in one of his post "Data Blending is a Variation on a left join, but it occurs after aggregation, rather than at a row level. There is no way to change the join type in a data blend, and its really not a join at all, but a variation on a join that occurs after aggregations."
Rajeev gives a great explanation of this and his reference to Matt's post is spot on.
It is my recommendation that if you need to perform more in depth joins from disparate databases, that maybe you enhance your Tableau implementation with something like Alteryx. I have found that Alteryx provides me all the flexibility I need for massaging my dataset(s) while allowing me to use the power of Tableau to visualize it.
That being said, Alteryx is no cheap product to add into the mix.
As others have stated above....
Data Blending is like a Left Join, but on aggregated results. With Blending, you can mesh data from various sources.
When possible, it is always better to shape your data before it ever gets to Tableau. Using ETL tools lets you bring in all of the this data from various sources, into one unified location. Always remember Tableau is meant to be your Data Visualization tool, not your ETL tool.
Blending is great. But it can really limit what you can do. Also (this is for another discussion) can create a lot of Domain Padding issues, which, at times, can be very complicating.
Ohh thank guys, it really helped me know i am clear withd Data Blending... thank you for the responce.
Your reply was very helpful and clear. And reply from other members also helped me understand the reason behind. However this leads me to another doubt regarding blending. Sorry to bother you with a rather long query. But your answer would benefit me a lot.
I understand that one of the area where blending is suggested is with disparate data sources where joins are not possible.
1) But if blend would only join on aggregated data and not on row level, should i understand that "Depot" data from secondary source (in your example) cannot be seen in the final blend ? i.e, if i want to see which "Depot" have contributed to the "Sales" volume 200 for the "Product ID" - 2 , will the data blend be able to help me ?
2) If it wouldn't be available then am i right in assuming that only the "Measures" in secondary data sources are aggregated and joined to the primary based on the common dimensions. And the dimension columns in the Secondary source that are not present in Primary are lost in the blend.
Request other board members as well to help me with their inputs.
1 of 1 people found this helpful
So a good question! Firstly with regards the use of Blending. When I wrote this we didn't have cross-database/source joins, so if 2 tables resided in 2 different places (say a SQL Database and an Excel Spreadsheet) blending was the only (using only Tableau) way to have information from both tables in the same model. With the addition of cross-datasource joins we can now join the tables properly into a single datasource. So the main use case, these days, is where the granularity of the data in the 2 tables (as per my example here) is different. In my example the Sales table is at Product level, and the Stock Table at Product/Depot level.
Let's start with question 2! Yes all measures brought into the secondary datasource are aggregated (in fact if you try and bring a dimension from the secondary, it gets wraps it in at ATTR, so is also an aggregate).
On the first question. If we think about the real world situation for this example (retail and supply chain is my stock and trade!)...
The Sales table gives us Sales by Product, but underneath this (but not in the data) is that this total sales for a product is built up from adding up all the sales for each store. So we could envisage a table like this (only we can't see or have access to this lower level table)
Product ID Store Sales 1 A 60 1 B 40 2 A 100 2 B 75 2 C 25 3 A 25 3 B 75 3 D 50
Now image Store A is supplied from the North Depot, Store B from the South Depot, Store C from the East Depot and Store D from the West Depot
If we had this we could attribute the Sales coming from Stores supplied by Depot X
So for example; for Product 1 and North Depot the Sales (from Stores Supplied by that Depot) are actually 60, not 100. However as we don't know this from our Product Level table (we only know the sales from all stores is 100), we have no way of getting back to the 60 number. So bringing Depot to the Product Sales data doesn't really make any sense. How would Tableau know how to attribute the sales to a Depot? Another way of putting this is that you can't increase the granularity of the primary data-source, by using dimensions in the secondary.
So with regards to how you could answer your first question...you could start with the Depot/Stock table as the primary, so you have the level of Product/Depot. You could then bring in the Sales (from the Sales, secondary datasource), which would put 100, against both Depots which supply Product 1, say. You could then (using an LoD) work out the proportion of stock in each depot (in my unrealistic example all depots carry the same amount of stock), which would be 50% and 50%...and then multiple the sales by that, so it would allocate 50 Sales per Depot, and you still have the correct sum of 100 units.
btw I wouldn't advise doing this in the real world of Depots and Stores as depots are likely to be on different order cycles, so while one depot may have 200 units, say and another 100...it might be that the second depot is at the end of it's order cycle and is about to receive another 300 units, whereas Depot 1 is just at the start so 200 units is the most it ever carries. This might not be your line of work, but just wanted to be sure I wasn't giving any poor advice!
Hope that helps?
Thank you for a detailed explanation. That did help me clear my doubts.
And thanks for correcting my idea that blending is still an use case mainly while using disparate data sources. It helped as well.