7 Replies Latest reply on Nov 26, 2018 7:34 AM by Joe Oppelt

    Find Duplicate Multiple IDs

    Thiago Alpoin

      Hi Everyone,

       

      Here's my issue:

       

      I have two columns IDs both represent the same product. I would like to find all the 'duplicate' IDs (looking in both columns) and set them in the same bucket (maybe creating a new column).

       

      Attached a worksheet.

       

      Below a graphic representation of what currently is and what I would like to accomplish (manually modified in excel).

       

      Current:

       

      Archive ID 2752305 has two IDs Name (2752305, 309779339). However, the ID Name shows up again in another Archive ID.

      2017-10-23_12-24-38.png

      Goal:

       

      A new column C could be created where it would replicate the Archive ID from a replicated ID_Name

       

      2017-10-23_12-39-43.png

       

      Main Goal:

       

      Once in buckets, the amount was counted 2 or 3 times for the same product. If the product counted more than once, then keep only only - preferable by any Type ID except 15. If there is no NEW COLUMN duplicate, leave as is.

       

      Thank you so much in advance for your help on this.

        • 1. Re: Find Duplicate Multiple IDs
          Joe Oppelt

          Well, here's a start:

           

          You can tell if any [ID Name] is hooked to more than one [Archive ID] as shown in the attached.  (see [Duplicate ID Name?].)

           

          I'm just not sure what you want to do with this.  What if there are 50 [Archive ID]s that some [ID Name] is hooked to?

          • 2. Re: Find Duplicate Multiple IDs
            Thiago Alpoin

            Thank you Joe Oppelt

             

            That's indeed a great start. I didn't think of CountD in a LOD calculation. That's certainly identify the duplicates in the 2nd column.

             

            The highlighted rows represent the same product. Once identified they represented the same product (maybe creating an extra column as exemplified above as Column C), I would like to eliminate the duplicate values leaving only one unique 'Product - Column C'.

             

            If there are 50 Archive IDs that some [ID Name] is hooked to... goal is to eliminate 49 and leave only 1.

             

            2017-10-23_13-34-45.png

             

            Thoughts?

            • 3. Re: Find Duplicate Multiple IDs
              Joe Oppelt

              And in the attached I did three treatments of [NEW COLUMN].  All 3 grabs the lowest value of [Archive ID] among those that have [Archive Type] <> 15. The first just lets the value go NULL if there is no [Archive Type] <> 15.  The second forces a bogus number in there.  the third grabs the MIN [Archive ID] among those that are Type = 15.

               

              It just shows some stuff you can do with the [Duplicate...] calc I made.

              1 of 1 people found this helpful
              • 4. Re: Find Duplicate Multiple IDs
                Thiago Alpoin

                That's fantastic Joe Oppelt ! Thank you so much

                • 5. Re: Find Duplicate Multiple IDs
                  Thiago Alpoin

                  Hi Joe Oppelt,

                   

                  I have a 'continuation' question with the same data. I create a new discussion, so it's not overwhelmed in here. Because you had a chance to see my data, would you mind to see if you could help me out on my other question?

                   

                  Grand Total Does NOT Update With Parameter Top Filter

                   

                  Thank you in advance for any hint you could give me.

                  • 6. Re: Find Duplicate Multiple IDs
                    philip.watkinson

                    Greetings from San Diego Joe,

                     

                    I have a similar but more complex challenge. We have PATIENT_REGISTER table (which for obvious reasons I am not able to share), and we need to discover/ report likely patient account duplicates. Beyond the most obvious fields of Name, Account Id, and Date of Birth, I am having to bring in other unlikely dimensions and/or calculated fields to help with a positive identification such as: "Pharmacy Key,"  "Age as of Today," "Street Address" (1 and 2), and so on. The SSN is not required, and most patients do not submit it. To make things even more interesting, we have unruly name data capture with any number of clerical errors. Even so, sorting on name is probably the best way so far to reveal duplicate accounts. My question is, how can I bring several or all of these fields into a formula to help me see the duplicate accounts?

                    • 7. Re: Find Duplicate Multiple IDs
                      Joe Oppelt

                      Actually, Philip, you're probably going to have to share something.  So let me first direct you to the video in this link:

                       

                      Video demonstrates how to anonymize your workbook/data

                       

                      Get a data set with enough examples to show some of the trickier duplicates that you would need to find.  You don't need all the medical records of the accounts you include.  Just hack up some difficult/complicated examples of what should be considered duplicates.

                       

                      And you really should start a new thread with this question.  At this point I'm probably the only one who is seeing this old thread.  A new thread will attract multiple people with multiple ideas.