3 Replies Latest reply on Jun 11, 2018 9:20 AM by Andrew Bickert

    regex help

    stephen.cavallaro.0

      I am new to regex and deeply confused I need to parse 4 elements the following string: {"timestamp":{"value":1528602525201},"uuid":{"value":"6cb66f73-1519-4a2b-b5f1-de37ce02cca3"},"hostname":null,"money":null}

       

      1) 1528602525201, 2) 6cb66f73-1519-4a2b-b5f1-de37ce02cca3, 3) null, 4) null

       

      I thought REGEXP_EXTRACT([Header], '\d {5,}' ) would accomplish 1) but it does not

        • 1. Re: regex help
          Andrew Bickert

          Hi Stephen,

          Wrap your '\d in brackets and also put a parenthesis around your whole thing, like this:

          REGEXP_EXTRACT([HEADER],'([\d]{5,})')

           

          Is your string always going to be in that format?

           

          Andrew

          • 2. Re: regex help
            stephen.cavallaro.0

            yes the string will always match this format

            • 3. Re: regex help
              Andrew Bickert

              If it is always matching this format you could also consider using replace:

               

              REPLACE(REPLACE(REPLACE(REPLACE(REPLACE([Header],'{"timestamp":{"value":',""),'},"uuid":{"value":"',"|"),'"},"hostname":',"|"),',"money":',"|"),'}',"")

               

              Which would result in your values having a "|" between them, easily parsed out.

               

              If you want to use RegExp, you can use these.

               

              1st:

              REGEXP_EXTRACT([Header],'([\d]{5,})')

              2nd:

              REGEXP_EXTRACT([Header],'(([a-z0-9]){8}[-]{1}([a-z0-9]){4}[-]{1}([a-z0-9]){4}[-]{1}([a-z0-9]){4}[-]{1}([a-z0-9]){12})')

               

              3rd and 4th are a bit different since the value you are trying to extract out doesn't actually exist (unless they will always be null). In this case we can look for the values before them such as "hostname" and "money" and find the deliminator after null so we gather the values in between.

               

              3rd:

              LEFT(REPLACE(REGEXP_EXTRACT([Header],'("hostname":[a-z]*[,])'),'"hostname":',""),LEN(REPLACE(REGEXP_EXTRACT([Header],'("hostname":[a-z]*[,])'),'"hostname":',""))-1)

               

              4th:

              LEFT(REPLACE(REGEXP_EXTRACT([Header],'("money":[a-z]*[}])'),'"money":',""),LEN(REPLACE(REGEXP_EXTRACT([Header],'("money":[a-z]*[}])'),'"money":',""))-1)

               

              I have also attached a workbook with these examples for reference. If you need it in a different version let me know.


              Andrew