4 Replies Latest reply on May 2, 2018 8:45 AM by Jason Scarlett

    How to extract each number between ( ) in string using REGEXP_EXTRACT_NTH?

    rowe.palmer

      I would like to use REGEXP_EXTRACT_NTH (or better method) to obtain every number that appears within brackets in a string.

       

      E.g.

      "I require an expression which can find numbers (10) throughout a string (11) at any location (12)." >> 10, 11, 12

      "It must skip numbers 32 that are not within (4) brackets in a string 2 with many (99) possible other characters." >> 4, 99

      "(21)(33)" >> 21, 33

      "Plain string" >>

       

      I'm new to regular expressions but so far I have managed to find the first occurrence using:

      REGEXP_EXTRACT_NTH([String],"\((\d*?)\)",1)

      but I would like to find the nth occurrence and the above returns NULL for anything above 1

      REGEXP_EXTRACT_NTH([String], <expresion>, nth)

       

      I've tried variants of patterns for before and after but cannot seem to hit the mark.

      Any assistance would be greatly appreciated.

      Thanks.

      Rowe

        • 1. Re: How to extract each number between ( ) in string using REGEXP_EXTRACT_NTH?
          rowe.palmer

          Mission 2 is to obtain the content before each number up to nth occurrence to be associated with the number extracted above.

           

          E.g.

          "I require an expression which can find numbers (10) throughout a string (11) at any location (12)." >>

          "I require an expression which can find numbers", "throughout a string", "at any location", "."

           

          "It must skip numbers 32 that are not within (4) brackets in a string 2 with many (99) possible other characters." >>

          "It must skip numbers 32 that are not within", "brackets in a string 2 with many", "possible other characters."

          • 2. Re: How to extract each number between ( ) in string using REGEXP_EXTRACT_NTH?
            Jeff D

            Hi Rowe,

             

            Is it possible to pre-process the data before loading it into Tableau?

             

            It's not so easy to achieve this transformation with string calculations.  You could use a series of nested REGEXP_REPLACE calls to remove the unwanted characters (as compared to your original idea of extracting the wanted characters), but it gets messy, and covering all the edge cases can be tricky.

            1 of 1 people found this helpful
            • 3. Re: How to extract each number between ( ) in string using REGEXP_EXTRACT_NTH?
              rowe.palmer

              Hi Jeff,

               

              Thanks for your response. Yes it is possible to pre-process the data, especially if I'm using a static data snapshot. ETL layer changes would be required for an ongoing automated solution for the production database. I was hoping to at least simulate what the improvement at the database side would look like in terms of data resolution and analytics potential on live data, for the uninitiated. Conceptually simple perhaps not Tableau simple.

               

              The context if it assists is a log of equipment downtime events that are currently recorded as a series of unstructured descriptions with manually recorded downtimes (inserted into brackets) and stored as a single entry in the source database. Presumably a hangover from when the logs were hand written and were used for context not analytics.

               

              I'll go the pre-processing option as suggested.

              Thanks again.

              • 4. Re: How to extract each number between ( ) in string using REGEXP_EXTRACT_NTH?
                Jason Scarlett

                I've had a similar issue of finding the Nth result. In my case, I repeated the search string 4 times and it seemed to work.

                 

                Example:

                REGEXP_EXTRACT_NTH(

                [Sampletext]

                ,'(\b.{1,10})(\b.{1,10})(\b.{1,10})(\b.{1,10})'

                ,4)

                 

                Note how I repeated the capture pattern  (\b.{1,10})  four times.