Sure! There are a couple of ways to do this, and which one you choose depends on exactly which record you want to keep. Let's walk through two cases.
Case 1: Exact Duplicates - All values for duplicate records are the same
Employee ID   Name        Date Hired
1             Pentecost   4/24/2012
2             Ryrie       3/2/2015
3             Walvoord    5/1/1997
3             Walvoord    5/1/1997
4             Chafer      2/27/1982
Here, you can just use an Aggregate step: group on Employee ID and aggregate all other fields as MIN or MAX.
The output is nicely deduped, because the aggregation returns only one record per Employee ID.
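If it helps to see the logic outside of Prep, here is a rough sketch of the same group-and-MIN idea in plain Python. It is only an illustration of what the Aggregate step does conceptually, not how Prep implements it; the function name dedupe_by_min and the tuple layout are my own assumptions.

```python
from datetime import date

# Sample rows from Case 1: (Employee ID, Name, Date Hired)
rows = [
    (1, "Pentecost", date(2012, 4, 24)),
    (2, "Ryrie", date(2015, 3, 2)),
    (3, "Walvoord", date(1997, 5, 1)),
    (3, "Walvoord", date(1997, 5, 1)),
    (4, "Chafer", date(1982, 2, 27)),
]


def dedupe_by_min(rows):
    """Group on the first column and take MIN of every other column.

    Hypothetical helper for illustration only.
    """
    groups = {}
    for emp_id, *rest in rows:
        groups.setdefault(emp_id, []).append(rest)
    # zip(*members) turns a group's rows into columns, so min() runs per field
    return [
        (emp_id, *(min(col) for col in zip(*members)))
        for emp_id, members in sorted(groups.items())
    ]


deduped = dedupe_by_min(rows)
for row in deduped:
    print(row)
```

Note that MIN (or MAX) is applied to each field independently. That is safe here only because the duplicates are exact; if the values differed, the result could mix values from different rows, which is why Case 2 needs a different approach.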
Case 2: Duplicate IDs with various values and you want to keep a specific row
Employee ID   Name        Date Hired   Status Updated   Status
1             Pentecost   4/24/2012    4/12/2018        Full Time
2             Ryrie       3/2/2015     4/12/2018        Full Time
3             Walvoord    5/1/1997     4/11/2018        Part Time
3             Walvoord    5/1/1997     4/12/2018        Full Time
4             Chafer      2/27/1982    4/12/2018        Full Time
Here, the duplicate occurs because the second row indicates an update of status. Likely you'll want to get the most recent record. At times there may be other logic, but it will likely follow the same pattern (see details here https://vizpainter.com/latest-snapshot-in-tableau-and-maestro)
Basically, you'll use an Aggregate step to group on Employee ID and take the MAX of Status Updated (or the MAX or MIN of a Row ID or whatever other field identifies the row you want to keep). Don't worry about the other fields yet.
Then, you can inner join that back to the previous step in the flow on both the Employee ID and the Status Updated fields.
It looks a bit strange, but the row dated 4/11/2018 is excluded by the join, because its Status Updated value doesn't match the MAX for that Employee ID. The net result is a nicely deduped data set.
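For anyone who wants to reason about the pattern outside of Prep, the aggregate-then-join-back steps can be sketched in plain Python like this. It is a minimal illustration under my own assumptions (tuples for rows, the ID and UPDATED column positions), not what Prep does internally.

```python
from datetime import date

# Sample rows from Case 2:
# (Employee ID, Name, Date Hired, Status Updated, Status)
rows = [
    (1, "Pentecost", date(2012, 4, 24), date(2018, 4, 12), "Full Time"),
    (2, "Ryrie", date(2015, 3, 2), date(2018, 4, 12), "Full Time"),
    (3, "Walvoord", date(1997, 5, 1), date(2018, 4, 11), "Part Time"),
    (3, "Walvoord", date(1997, 5, 1), date(2018, 4, 12), "Full Time"),
    (4, "Chafer", date(1982, 2, 27), date(2018, 4, 12), "Full Time"),
]

ID, UPDATED = 0, 3  # assumed column positions used as the join keys

# Step 1 (Aggregate): MAX Status Updated per Employee ID, ignoring other fields
latest = {}
for row in rows:
    emp_id, updated = row[ID], row[UPDATED]
    if emp_id not in latest or updated > latest[emp_id]:
        latest[emp_id] = updated

# Step 2 (inner join): keep only rows whose (Employee ID, Status Updated)
# pair matches the aggregated result
kept = [row for row in rows if latest[row[ID]] == row[UPDATED]]

for row in kept:
    print(row)
```

One thing to watch: if two rows for the same Employee ID share the same MAX Status Updated value, both survive the join, so you'd need a tie-breaking field (such as a Row ID) to get down to one row.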
I've attached a sample packaged flow with these examples.
Hope that helps!
Deduping.tflx.zip 2.8 KB
This is a great tip that will come in handy for a lot of people! A slight variant of case 1 would be when you have multiple records per ID, and some are exact duplicates. If your goal is to remove ONLY the exact duplicate rows, you can group by ALL fields. In the case of the sample file, either approach works.
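As a rough illustration of the group-by-ALL-fields variant, here is a small Python sketch. The three rows are made up for the example (they are not from the sample file): grouping on every field amounts to keeping each distinct row once.

```python
# Hypothetical rows: (Employee ID, Name, Status Updated, Status)
rows = [
    (3, "Walvoord", "4/11/2018", "Part Time"),
    (3, "Walvoord", "4/12/2018", "Full Time"),
    (3, "Walvoord", "4/12/2018", "Full Time"),  # exact duplicate of the row above
]

# Using the whole row as the grouping key: dict.fromkeys keeps the first
# occurrence of each distinct row and preserves the original order
distinct = list(dict.fromkeys(rows))

for row in distinct:
    print(row)
```

Only the exact duplicate is dropped; the two rows with different Status values for the same ID are both kept.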