4 Replies Latest reply on Jan 16, 2018 10:56 AM by Hari Ankem

    Parsing specific Text from a Complicated String

    Eugen Zglimbea

      I have a field in my data that looks like the following.

       

      @<TAG1>?<TAG1_TEXT>@<TAG2>?<TAG2_TEXT>@<TAG3>?<TAG3_TEXT>@<TAG4>?<TAG4_TEXT>@<TAG5>?<TAG5_TEXT>

       

      I am trying to extract the text after each tag in the string into their own dimension in my workbook.  I know that I have to use some sort of combination of a CASE or IF statement with some sort of REGEXP function when creating the calculated field, but I am not able to find the correct combination.

       

      Each tag in the string begins with a '@' symbol and ends with a '?' symbol.

       

      Another thing to note here is that the tags may not always appear in the same order for each row in my dataset.  For example, in some rows in my dataset, TAG4 appears at the beginning of the string.

       

      Ex: @<TAG4>?<TAG4_TEXT>@<TAG1>?<TAG1_TEXT>@<TAG2>?<TAG2_TEXT>@<TAG3>?<TAG3_TEXT>@<TAG5>?<TAG5_TEXT>

       

      In addition, TAG5 could be either "User Name" or "Username" for each row in my dataset.

       

      Any help here would be greatly appreciated!

       

      Message was edited by: Eugen Zglimbea

        • 1. Re: Parsing specific Text from a Complicated String
          Hari Ankem

          See if this helps:

          Capture.PNG

           

          I am assuming that the tag names do not change. The values can change, but should not include a ">" as part of the value.

          Capture.PNG

          Capture.PNG

          Capture.PNG

           

          You can create additional calculated fields as required.

          1 of 1 people found this helpful
          • 2. Re: Parsing specific Text from a Complicated String
            Eugen Zglimbea

            Hari,

            First of all, thank you for your help here.  I tried your suggestion, but it only seems to be working for maybe two of my tags.  The nature of the data in the dataset makes it so that some of the other tags come out as garbage, no matter what I try.  I've included a sample of data from my dataset to show you the trouble.  Any further help would be greatly appreciated.

             

            As you can see, the order of the tags in each row is not predictable.  In addition, the '@User Name?' and '@Username?' tags are meant to be the same thing.

            • 3. Re: Parsing specific Text from a Complicated String
              Peter Fakan

              Hi Eugen,

               

              I think your initial assumption is correct, you are either going to have to regex your data, or find another method of parsing the data before you load it into Tableau.

               

              This link will help you find the correct combination - RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and the ABCs 

               

              HTH

               

              Peter

              • 4. Re: Parsing specific Text from a Complicated String
                Hari Ankem

                OK. I have modified the formulas based on the sample data, and here is the output:

                Capture.PNG

                 

                For any further changes and/or additional tags, you can replicate the below formulas by changing the values in red as shown below.

                 

                Tag 1 Value:

                IFNULL(

                MID([Tags],

                    IF FIND([Tags],"@Tag1?")>0 THEN FIND([Tags],"@Tag1?")+6 END,

                    FIND([Tags],"@",FIND([Tags],"@Tag1?")+6) - (FIND([Tags],"@Tag1?")+6)

                    ),

                "")

                 

                Tag 2 Value:

                IFNULL(

                MID([Tags],

                    IF FIND([Tags],"@Tag Two?")>0 THEN FIND([Tags],"@Tag Two?")+9 END,

                    FIND([Tags],"@",FIND([Tags],"@Tag Two?")+9) - (FIND([Tags],"@Tag Two?")+9)

                    ),

                "")

                 

                Tag 3 Value:

                IFNULL(

                MID([Tags],

                    IF FIND([Tags],"@Tag3?")>0 THEN FIND([Tags],"@Tag3?")+6 END,

                    FIND([Tags],"@",FIND([Tags],"@Tag3?")+6) - (FIND([Tags],"@Tag3?")+6)

                    ),

                "")

                 

                Tag 4 Value:

                IFNULL(

                MID([Tags],

                    IF FIND([Tags],"@Tag Four?")>0 THEN FIND([Tags],"@Tag Four?")+10 END,

                    FIND([Tags],"@",FIND([Tags],"@Tag Four?")+10) - (FIND([Tags],"@Tag Four?")+10)

                    ),

                "")

                 

                Hope this helps. The workbook is attached.