9 Replies Latest reply on Jul 4, 2012 7:45 PM by Joe Mako

    Data blending – problems with uneven data pairs

    . Matthew


      Let me start by saying that data blending in 6.0 is fantastic and I am finding it incredibly useful.  However, while working with it I have managed to produce some unwanted data inconsistencies.


      I have attached a simplified example (Data Blending - problems with pairs.twbx) showing that, unless you’re careful, you can get quite different results from the same set of data depending on which data source the view starts from.


      The example shows 8 different production centres dealing with 3 different suppliers.  One data set called “Plan” represents the forecast and the other called “Usage” represents the actual result.


      The first couple of worksheets show the original data for reference only.  The third worksheet starts with the actual data from “Usage” and blends in the forecast data from “Plan”.  The fourth worksheet starts with the forecast data from “Plan” and blends in the actual data from “Usage”.  In other words, worksheets 3 and 4 are viewing exactly the same data, but each from a different starting point.


      You would think that worksheets 3 and 4 should produce exactly the same results, but they don’t.  The reason is that the pairings of “Production Centre” and “Supplier” do not exist evenly in each data set.  This happens when a production centre uses units from a supplier that weren’t included in the forecast, or alternatively when it fails to use units from a supplier that were included in the forecast.
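
      To make the mismatch concrete, here is a minimal Python sketch. It assumes that blending behaves like a left join from whichever data source the worksheet starts with, so pairs missing from that source are dropped. The supplier names, the figures and the helper functions blend and total are all made up for illustration and are not taken from the attached workbook.

```python
# Hypothetical figures, NOT taken from the attached workbook.
# Keys are (production centre, supplier) pairs; values are units.
plan = {
    ("Russia", "Supplier A"): 100,
    ("Russia", "Supplier B"): 50,    # forecast but never used
    ("Thailand", "Supplier A"): 80,
}
usage = {
    ("Russia", "Supplier A"): 90,
    ("Thailand", "Supplier A"): 70,
    ("Thailand", "Supplier C"): 30,  # used but never forecast
}


def blend(primary, secondary):
    """Assumed blend behaviour: keep only the pairs present in the primary."""
    return {pair: (units, secondary.get(pair)) for pair, units in primary.items()}


def total(blended, index):
    """Sum one column of a blended result, treating missing values as 0."""
    return sum(row[index] or 0 for row in blended.values())


usage_first = blend(usage, plan)  # like worksheet 3: start from "Usage"
plan_first = blend(plan, usage)   # like worksheet 4: start from "Plan"

# The same "Plan" data gives different totals depending on the starting point.
plan_total_usage_first = total(usage_first, 1)
plan_total_plan_first = total(plan_first, 0)
print(plan_total_usage_first, plan_total_plan_first)
```

      Starting from “Usage” silently loses the forecast for the pair that was never used, and starting from “Plan” silently loses the usage that was never forecast.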


      Both of these cases are very possible in the real world, and looking at the original data it isn’t obvious that blending such well-matched sources could cause these sorts of problems.  In this small example it is pretty easy to spot that the problem lies with data from the Russia and Thailand production centres.  Normally, though, you’d only blend data in one direction, so it can be much tougher to realise there is a problem, and you could be serving out bad data.
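
      One way to check for the problem before blending in a single direction is to compare the pairs in each data set. The sketch below is only an illustration in Python rather than anything built into Tableau; the figures, supplier names and the helper unmatched_pairs are hypothetical.

```python
# Hypothetical figures, NOT taken from the attached workbook.
plan = {
    ("Russia", "Supplier A"): 100,
    ("Russia", "Supplier B"): 50,
    ("Thailand", "Supplier A"): 80,
}
usage = {
    ("Russia", "Supplier A"): 90,
    ("Thailand", "Supplier A"): 70,
    ("Thailand", "Supplier C"): 30,
}


def unmatched_pairs(plan, usage):
    """Return the pairs found only in plan and the pairs found only in usage."""
    plan_only = sorted(set(plan) - set(usage))
    usage_only = sorted(set(usage) - set(plan))
    return plan_only, usage_only


plan_only, usage_only = unmatched_pairs(plan, usage)
print("Forecast but not used:", plan_only)
print("Used but not forecast:", usage_only)
```

      If both lists come back empty, the two directions of the blend should agree; any entries point at the pairs that would be dropped.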


      The only way I have found to overcome this so far, and a horrendously inelegant one, is to ensure that the pairs exist in each data set by inserting blank data, as the 2nd example shows (Data Blending - pairs kludged), but this is not a realistically usable fix.
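
      The padding idea can be sketched in Python as follows. This is only an illustration of the kludge, with two assumptions: that blending acts like a left join from the starting data source, and that 0 is an acceptable filler for the blank rows. The figures, supplier names and the helpers pad_missing_pairs and blend are hypothetical.

```python
# Hypothetical figures, NOT taken from the attached workbook.
plan = {
    ("Russia", "Supplier A"): 100,
    ("Russia", "Supplier B"): 50,
    ("Thailand", "Supplier A"): 80,
}
usage = {
    ("Russia", "Supplier A"): 90,
    ("Thailand", "Supplier A"): 70,
    ("Thailand", "Supplier C"): 30,
}


def pad_missing_pairs(a, b, filler=0):
    """Give both data sets the same pairs, using filler where one is missing."""
    all_pairs = set(a) | set(b)
    padded_a = {pair: a.get(pair, filler) for pair in all_pairs}
    padded_b = {pair: b.get(pair, filler) for pair in all_pairs}
    return padded_a, padded_b


def blend(primary, secondary):
    """Assumed blend behaviour: keep only the pairs present in the primary."""
    return {pair: (units, secondary.get(pair)) for pair, units in primary.items()}


padded_plan, padded_usage = pad_missing_pairs(plan, usage)
usage_first = blend(padded_usage, padded_plan)  # rows are (usage, plan)
plan_first = blend(padded_plan, padded_usage)   # rows are (plan, usage)

# After padding, the "Plan" total no longer depends on the starting point.
plan_total_usage_first = sum(p for _, p in usage_first.values())
plan_total_plan_first = sum(p for p, _ in plan_first.values())
print(plan_total_usage_first, plan_total_plan_first)
```

      Both directions now see every pair, which is why the totals agree, but the filler rows have to be maintained by hand whenever either data set changes.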


      If anyone has other, better ways to get around this problem, I would be very interested to hear their ideas.