2 Replies Latest reply on Oct 30, 2015 3:58 AM by Jason Strimpel

    Identify a sequence of duplicates

    Jason Strimpel



      My requirement is to identify a sequence of duplicate values that appear in sequence within a row of data. The sequence may appear anywhere within the rows (see screenshot). The number of values is a user-defined parameter. Finally, the count of duplicates found must be returned in a column.


      In the screenshot, I've identified two sequences of duplicates of length four. In this case, the user would have specified a sequence of four.


      If there are two sequences within a row, I need to return the length of the larger sequence.

        • 1. Re: Identify a sequence of duplicates
          Joe Oppelt

          I can see this turning into a long discussion that I don't have the bandwidth for...

          But the key to what you want to do is probably the LOOKUP( ..., -1) function to look back at the previous value.  If it's the same, make some kind of indication.


          It would look something like this, though you'll have to get the syntax right:


          if SUM([Sales]) = LOOKUP(SUM([Sales]),-1)  then PREVIOUS_VALUE(1)+1 else 1  end


          This will set the calc to 1 if not the same, or add 1 to whatever the previous value of this calc was.  So the first time you get a match, the calc will be set to 2.  If you get three in a row, the last one will be set to 3.  (And then if the next one after that is different, it gets set back to 1.)


          Now you can mess with rows where this number is greater than 1 and know you are playing with duplicates.  And you can do window_max(sum([running calc])) to find the largest one.  And other neat stuff.

          1 of 1 people found this helpful
          • 2. Re: Identify a sequence of duplicates
            Jason Strimpel

            Nailed it, thanks.