1 Reply Latest reply on Apr 20, 2017 11:57 PM by Galen Busch

    Similar text in different rows of a same column - duplicate records

    Victoria Fornaciari

      Hi,

      I want to know if it is possible to identify duplicate records in one column, and also to identify similar (but not identical) content in more than one row of that column.

      Thanks!!!

      I want to create a scatter plot like this one, where the color of the "X" depends on the duplicate type: identical invoice reference, similar invoice reference, or similar invoice description.

      What would be the best way to create the data needed?

        • 1. Re: Similar text in different rows of a same column - duplicate records
          Galen Busch

          Hi Victoria,

           

          To flag identical records, you can say:

           

          IF {FIXED [Invoice Number] : SUM([Number of Records])} > 1 THEN 'Identical' END

           

          When we get into 'similar', I'd need a bit more information. Tableau is not really built for detecting patterns or matching certain letters/words. We could do some hacky things, like looking at the first letter of words, the length of words, or comparing letters/words against other letters/words (one to one), but detecting 'similarities' in text is not Tableau's purpose. There are other tools designed to do this.
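
          As a rough illustration of what one of those other tools might look like, here is a minimal sketch in Python using only the standard library (difflib). It is not something Tableau does for you; the function name classify_duplicates, the 0.8 threshold, and the sample invoice references are all assumptions made up for the example.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_duplicates(values, threshold=0.8):
    """Label each value 'Identical', 'Similar', or 'Unique'.

    threshold is an assumed cutoff on difflib's 0..1 similarity ratio;
    tune it for your own data.
    """
    # Normalise case and surrounding whitespace before comparing.
    cleaned = [v.strip().lower() for v in values]
    counts = Counter(cleaned)

    labels = []
    for i, current in enumerate(cleaned):
        # Exact duplicates: the same cleaned value appears more than once.
        if counts[current] > 1:
            labels.append("Identical")
            continue
        # Otherwise compare against every other row (O(n^2), fine for small data).
        is_similar = any(
            SequenceMatcher(None, current, other).ratio() >= threshold
            for j, other in enumerate(cleaned)
            if j != i
        )
        labels.append("Similar" if is_similar else "Unique")
    return labels


# Hypothetical invoice references, for illustration only.
refs = ["INV-1001", "INV-1001", "INV-1O01", "ABC-77", "XYZ-9000"]
labels = classify_duplicates(refs)
for ref, label in zip(refs, labels):
    print(ref, label)
```

          The resulting labels could be saved as an extra column in the data source and then dropped on Color in Tableau to drive the scatter plot.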

           

          Galen