3 Replies Latest reply on Jul 16, 2018 4:46 PM by swaroop.gantela

    Calculation to count duplicate values within a string

    Philippa Lehar

      I am looking to create a calculated field that counts the number of unique comma delimited values within a string.

       

      For example, of string: "New York, New Hampshire, Minnesota, Alabama, Colorado, New Hampshire, Alabama, Alabama, Florida" I would like to retrieve the result "6," since 6 unique cities are listed within the string.

       

      This calculation works well to count the overall character delimited values (in the example above, it gets me to "9") but I haven't figured out how to remove the 3 duplicate values to reduce the overall sum to 6:

       

      (LEN([City])-LEN(REPLACE([City],",",""))+1)

       

      Any suggestions on this would be greatly appreciated!

        • 1. Re: Calculation to count duplicate values within a string
          Deepak Rai

          Hi Philippa,

          Actually this is too easy in 10.2 and above, You just need to use a Cartesian Join on 1:1  and  a Count. So Get a List of Cities from Net , then do a Cartesian Join with your data and then use  a count to get this.

           

          Thanks

          Deepak

          If it Helps, Pl mark it Helpful and CORRECT to Close Thraed

          • 2. Re: Calculation to count duplicate values within a string
            Alex Kerin

            Deepak's solution works excellently. I wanted to see if this could be done without the city list being created, and without modification of the original file.

             

            This can be done in custom SQL or in my case Tableau Prep.

             

            Basically you take the source (Excel), split the city into separate city columns, pivot those together to get a column of just individual city names (but with lots of repetition), remove all but this column, use the aggregate grouping to just get distinct rows resulting in a list of cities, add an ID for the join.

             

            Join this in the same way as Deepak does with another copy of the file, add the ID, left join, use a similar calculation as Deepak to check for the city name, then aggregate again to the original rows and the countd of cities.

             

            This would fail with more than 9 cities (though that could be changed), and if you had two rows with the same original city string. This could be fixed with a unique ID.repeat.PNG

            • 3. Re: Calculation to count duplicate values within a string
              swaroop.gantela

              Philippa,

               

              You could also try a RegExp replace like this:

               

              REGEXP_REPLACE([Cities],'\b(\w+)\b,\s*(?=.*\b\1\b)',"")

               

              and then count the commas in the result of that.

               

              Although the above regexp doesn't quite correctly remove all

              the duplicate words (there is a residual "New"), it should remove

              all the commas associated with a duplicate, so the count should work.


              Please see workbook v10.3 attached in the Forum Thread.