6 Replies Latest reply on Oct 3, 2018 9:42 AM by swaroop.gantela

    Regular Expressions with Live Data Source

    Rohan Thakral

      Hi All,

       

      I have a very specific requirement which requires the use of Regular Expressions. I have to apply some Regular Expressions and filter my results based on that and publish the workbook on Tableau Server. But since Regular Expressions are not supported with Live Data Source, Is anyone aware of an alternative or workaround for the same? To provide more details below are my requirements :

      • From a field which contains both characters and digits, my first step is to remove all characters and keep only digits. Since the characters can be present at any position, I don't have any other way to get this done except from Regexp_Replace. Example of Data in the field : a71l2bd72nf82nnnd519dn190.
      • After I remove all characters and keep only the digits from this field, my next step is to keep only those rows where the number start from particular 2 digits and is followed by 13 additional digits which is again only possible with Regular Expressions.

       

      Please share your thoughts if anyone knows a workaround which we can use to make Regular Expressions work with Live Data Source. Also, if we create the workbook with an extract and replace extract with Live Data Source before we publish the workbook, will it work?

       

      Thanks in advance!!

        • 1. Re: Regular Expressions with Live Data Source
          swaroop.gantela

          Rohan,

           

          These are inelegant, brute force methods, but maybe it can give ideas.

           

          One way is to check every character:

          IF (LEFT([Test Field],1))>'9' THEN "" ELSE LEFT([Test Field],1) END

          +

          IF MID([Test Field],2,1)>'9' THEN "" ELSE MID([Test Field],2,1) END

          ... etc.

           

          Likewise you can tailor the format check:

          (LEFT([Test Field],1))<='9'

          AND

          MID([Test Field],2,1)<='9'

          AND ... etc.

           

          This will likely be highly non-performant for you true dataset.

           

          A few questions:

          -what is your datasource type?

          -how many rows of data are there?

          -is your text field always the same length?

          -are the alphabetic characters limited to small subset:

            if so can try:

          REPLACE(

          REPLACE(

          REPLACE(

          REPLACE(

          REPLACE(

          REPLACE([Test Field],"a",""),

          "l",""),

          "b",""),

          "n",""),

          "f",""),

          "d","")

           

          Please see workbook v10.3 attached in the Forum Thread

           

          283174regex.png

          2 of 2 people found this helpful
          • 2. Re: Regular Expressions with Live Data Source
            Gerardo Varela

            Nice job Swaroop! I by no means know regular expression, but a quick quick search led me to this formula which seems to work in  your attachment.

             

             

            REGEXP_REPLACE([Test Field], "[^0-9]","")

             

            That should simplify your calculations just a bit.

             

            Regards,

            Gerardo

            • 3. Re: Regular Expressions with Live Data Source
              Ken Flerlage

              What's the source? Is it a database such as SQL Server or Oracle? If so, then both of those platforms have REGEXP functions built into their SQL languages. So, you could use a custom SQL to perform your REGEXP.

              • 4. Re: Regular Expressions with Live Data Source
                Rohan Thakral

                Hi Swaroop,

                 

                These are very good out of the box suggestions and are quite helpful. I will be able to extract only the digits using these suggestions after making some tweaks because my field can also contain punctuation. Since the field i'm working on is coming from a free form Text Box field where people can type in their comments and it can be anything. But your suggestion of using replace function could help out here if I use around 30-35 Replace Functions (26 for all characters and some punctuation). As you already said these are all Brute Force method but I don't see an efficient way around this.

                 

                Below is the information as requested :

                -what is your datasource type?

                     My datasource is published Live to Tableau Server using Jethro that's why I'm unable to use Regular Expressions.

                -how many rows of data are there?

                     Around 4 million of rows of data which I know will be quite inefficient when I start using above methods.

                -is your text field always the same length?

                     As mentioned it is coming from a free form text field and can vary with each row.

                -are the alphabetic characters limited to small subset?

                     No. It even contains punctuation.

                 

                Follow-up Question :

                I might be able to extract the digits from this field but I'm still concerned about extracting the results where I find a particular combination. Let's say for example, I'm looking for rows where the combination 34, 37 or 38 is found followed by 13 or more digits. Now, the problem is this can even come in the middle or start of the field. See example below, I would like to report 1st and 3rd row from below because they fulfill my criteria (3rd row contains 37 which is followed by 13 more digits and 4th row does not contain 34, 37 or 38) :

                108237102738917263819

                167237210281

                341028364758901274

                912361029843210

                 

                Any thoughts?

                 

                Thank you again for your expert insights and out of box thinking!!

                • 5. Re: Regular Expressions with Live Data Source
                  swaroop.gantela

                  Rohan,

                   

                  Thank you for answers to the questions.

                  I see more fully the dilemma.

                  Looked further, but couldn't find other methods to simplify.

                  I ran on a million rows. It took time, but wasn't too too bad.

                   

                  With regards to the search, I'm not sure that this covers all cases,

                  but first calc is the position of "34", etc:

                   

                  IF CONTAINS([Testfield2],"34") THEN FIND([Testfield2],"34")

                  ELSEIF CONTAINS([Testfield2],"37") THEN FIND([Testfield2],"37")

                  ELSEIF CONTAINS([Testfield2],"38") THEN FIND([Testfield2],"38")

                  END

                   

                  Then the test if more than 13 characters;

                  LEN([Testfield2])-[PatternLocation]>13

                  • 6. Re: Regular Expressions with Live Data Source
                    swaroop.gantela

                    Rohan,

                     

                    I've revised the check in the case that a string has both 34 and 38, for example.

                     

                    For 34:

                    IF CONTAINS([Testfield2],"34") THEN FIND([Testfield2],"34") ELSE 99 END  //99 just some large number

                     

                    same for 37 and 38.

                     

                    Then checking for length:

                    (LEN([Testfield2])-

                    MIN(MIN([PatternLocation34],[PatternLocation37]),[PatternLocation38]))

                    >=13

                    1 of 1 people found this helpful