That's strange - have not seen it
Please post your Prep file so we can see the data and the duplication
It's sensitive data; I can't post the file.
Understood, but it is pretty difficult to provide any real help without seeing the problem.
I suggest you look at each step in the process to see where the added records are coming from.
Also see the link below on how to anonymize your data -
I understand. I have just isolated it to the 'Modify Values' step. There are two 'group and replace' actions that are causing the problem. Here are the transformations that are causing the issue:
- Relationship Field
- Original values 'Child' and 'Disabled Dependent' replaced by 'Dependent'.
- Original value 'Main Member' replaced by 'Member'
- Department Field [Note: I've anonymized this data, but the patterns -- capitalization, number of letters/words, etc. -- are still valid]
- Original values "XXXXX" and "YYYYY" replaced by "AAA"
- Original value "XXXXX XX XXXXX" replaced by "Bbbbb"
- Original values "XXXXXXXX XXXXXXXXX" and "XXX XXXXXXXXX" replaced by "CC/DD"
- Original value "XXXXXXX" replaced by "Xxxxxxx" (same value, just initial caps)
- Original value "YYYYYYY" replaced by "Yyyyyyy" (same value, just initial caps)
If I remove those two actions, the duplicates go away.
Does that help? (Appreciate your offer to assist!)
The problem is definitely caused by the last two changes in #2 (the Department field) above. In each of those cases, I am replacing a value in all caps with the same value in initial caps. In the output file, the duplicated rows are exactly the same and, for the Department field, it shows the initial caps value for both records.
That was how I had replaced the value "XXXXXXX" with "Xxxxxxx" and "YYYYYYY" with "Yyyyyyy". The problem is that Prep is duplicating all of the records that are affected by that change. In the output file, it is showing the new value for both of the records. I don't see any reason why Prep would create duplicates based on those edits, especially since the duplicated records are 100% identical.
I don't see the same thing
You are aware that Prep will sample data on large files - not necessarily every record as shown in my simple example?
If this post assists in resolving the question, please mark it helpful or as the 'correct answer' if it resolves the question. This will help other users find the same answer/resolution. Thank you.
I believe the issue has to do with changing a value that is in all caps to the exact same value in initial caps (e.g., changing "GROUP" to "Group"). Sampling is not an issue. When I export the output to CSV and look at it in Excel, it's clear that every record affected by the change from all caps to initial caps is duplicated, and only those records are duplicated.
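For anyone who wants to run the same check without Excel, here is a minimal Python sketch of it. It counts rows that appear more than once in a CSV export and groups the extra copies by Department value. The function name `duplicated_rows_by_value` and the sample data (including the `ID` column) are made up for illustration; only the Department field comes from this thread. Note that it would also flag rows that are genuinely identical in the source data, so it is a quick sanity check rather than proof.

```python
import csv
import io
from collections import Counter


def duplicated_rows_by_value(csv_file, column):
    """Count extra copies of fully identical rows, grouped by the value in `column`."""
    reader = csv.DictReader(csv_file)
    # Each row becomes a hashable tuple of (header, value) pairs.
    row_counts = Counter(tuple(row.items()) for row in reader)

    extra_by_value = Counter()
    for row, count in row_counts.items():
        if count > 1:
            # count - 1 is the number of copies beyond the first occurrence.
            extra_by_value[dict(row)[column]] += count - 1
    return dict(extra_by_value)


# Tiny made-up sample standing in for the real (sensitive) export.
sample = io.StringIO(
    "ID,Department\n"
    "1,Group\n"
    "1,Group\n"
    "2,AAA\n"
    "3,Group\n"
    "3,Group\n"
)
print(duplicated_rows_by_value(sample, "Department"))
```

To use it on a real export, open the file with `open(path, newline="")` and pass the file object in place of `sample`. If only the values that went from all caps to initial caps show up in the result, that matches what I am seeing.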
Well, you can test that easily enough: just change them all to Group1 and see what happens.
Seems to be a bug. I've raised a support ticket.