6 Replies Latest reply on May 10, 2018 11:19 AM by sai lakshmi

    Remove duplicates based on only 1 column

    Edward Ostrovsky

      My data is in the following format:

          rep_id  user_id  other non-duplicated data

          1          1            ...

          1          2            ...

          2          3            ...

          3          4            ...

          3          5            ...

       

      I am trying to achieve a column for `deduped_rep` with 0/1 such that only first rep id across the associated users has a 1 and rest have 0.

       

      Expected result:

          rep_id  user_id  deduped_rep

          1          1            1

          1          2            0

          2          3            1

          3          4            1

          3          5            0

       

      For reference, in Excel, I would use the following formula:

      IF(SUMPRODUCT(($A$2:$A2=A2)*($A$2:$A2=A2))>1,0,1)

       

      I know there is the FIXED() LoD calculation http://kb.tableau.com/articles/howto/removing-duplicate-data-with-lod-calculations, but I only see use cases of it deduplicating based on another column.

      However, mine are distinct.