Skip navigation

Prep: In the Join step Group and Replace for all instances of a value in a multi-key join (not just a single tuple at a time)

score 4
You have not voted. Active

Problem: Tableau Prep's Join step let's me manually update join keys on the fly. This is great! When I have multiple join keys, though, when I update a join key that only updates the value for that tuple, not for all instances of that value, and this makes correcting data for joins more work than it needs to be.


Here's an example: I've got two data sets with facility and year. The data is sparse on both sides (not all facility-year combinations exist in each data set), and the names are different. In this case we've got "blue" in one data set and "azure" in the other that are actually the same facility:


Screen Shot 2019-09-30 at 10.51.42 AM.png


If I'm just doing a join on Facility then I can make the correction and it works for all rows:


Screen Shot 2019-09-30 at 11.03.52 AM.png


Now if I don't do that, but instead directly join on Facility & Year (and in fact with this data set Prep prompts me to join on the Year first, thus setting up this particular scenario) then I see a list of all unmatched tuples:


Screen Shot 2019-09-30 at 11.05.21 AM.png


This is what I want to see, so that's not the issue. The issue is that if I change one of the "azure" facilities to "blue" then my expectation is that the correction would change *all* of the values of "azure", and it only changes the one for that Facility/Year tuple:


Screen Shot 2019-09-30 at 10.52.24 AM.png


Now imagine instead of just a few rows of data there were 48 months' worth of data for each facility. That would require making 48 changes for every single facility that didn't match. Or doing some sort of workaround where the user would only join on the Facility, correct all the mismatched facilities, then remove that join clause and then apply the next join clause and correct all the mismatches there, then repeat for each additional join clause, and *then* after that add back in all the other join clauses to actually build the join. Or somehow the user would need to use one or more Clean steps in advance of the join and do a lot of clicking back & forth to correct values. None of these are really in the spirit of Prep's goal to enable users to do cleansing in place.


Therefore the feature request is to enable users to change a given value in the Join step and propagate that change to all instances of that value for the field, not just one tuple at a time.


Vote history