2 Replies Latest reply on Aug 3, 2018 5:01 AM by Rory Wallis

    Tableau Prep Creating Duplicates When I Use The "Group and Replace" Function

    Rory Wallis

      Hi all,

       

      So I'm using Tableau prep to try to clean up a bunch of duplicates in my data. In short, we have a load of suppliers in our systems that are the same but are all named slightly differently (hooray for consistent data input!). I've been using "Group and Replace" to join together all relevant suppliers and then rename them accordingly.

       

      What I'm finding is that the "Group and Replace" function is creating duplicates of some of my rows. I've been experimenting with this for hours and can't pinpoint why it's happening. I have been using the function with other fields in the same flow without issue but when I try to sort out the supplier names this problem is popping up. I had a look through the forums/google and there seemed to be very little on this issue. All I found was someone suggesting that the problem was related to changing data from all caps to lowercase but I'm getting this problem even when the case is not being changed.

       

      Not a clue what to do about this so any help is appreciated!

       

      Rory

        • 1. Re: Tableau Prep Creating Duplicates When I Use The "Group and Replace" Function
          Anushka A

          Hi Rory,

          Sorry to hear you are running into this issue. Would you be able to send me the data for just the field you are grouping or screenshots so I can try to reproduce the issue either via email or through filing a support case?
          This sounds like a bug and we'd love to be able to investigate.Thanks,

          Anushka

          Senior Product Manager, Tableau Prep

          • 2. Re: Tableau Prep Creating Duplicates When I Use The "Group and Replace" Function
            Rory Wallis

            Hi Anushka,

             

            Thanks for getting back to me. I might have accidently stumbled on what the problem is whilst getting the data ready that you asked for. If I’m correct then it would definitely seem like a bug.

             

            It looks like the issue is related to the use of XLSX files. When I did a quick export of the “supplier” field as a CSV so I could try to recreate the problem for you, I found that the issue went away. I tested this further in my original flow and found that using CSVs as the source files instead of XLSX seems to be alright (although I haven’t finished working through it yet).

             

            I’ve attached a flow that directly compares cleaning the field using both a CSV and an XLSX as the source file. For the CSV I was able to use the group and replace function (manual) across the whole field (83 replacements) without affecting the number of rows. However, doing only a fraction of the grouping on the XLSX has increased the row count from 19,831 to  19,887 – that’s 56 rows added by replacing 21 values with 16 values in group and replace.

             

            I’ve attached the test flow and the field data as both a CSV and XLSX so you can test this further. Please note that my flow contains a number of joins that are purely there to check the exact number of rows of data in the field before and after the function was applied. Not sure if there is an easier way to check the row totals!


            Rory