8 Replies Latest reply on Feb 9, 2019 5:39 AM by Nick Parsons

    RegExp_Extract

    Jesse Martinez

      Hello,

       

      I am using RegExp_Extract([Text],'[Hh]oles? ?\#?') in order to try and extract several matches from the [Text] field and put them into a new field called "[Location]". The problem is that I am only getting a single results back. When I double check it in a regex tester all matches are highlighted which tells me this is some kind of functionality limitation within Tableau. Is there a way to concat all matches with a delimiter into the [Location] field so that I can split them out? There are between 0 and 6 or so matches in each row for [Text].

        • 1. Re: RegExp_Extract
          Zach Leber

          How about using REGEXP_EXTRACT_NTH and concatenating the results together?

          Additional Functions - Tableau

          • 2. Re: RegExp_Extract
            Jesse Martinez

            Will RegExp_Extract_Nth allow me to pull all of the matches from the one text field at once? I finally got a response from Tableau and they are saying its not possible. I tried using Nth and I am not sure if I am setting it up right or now but I couldn’t get it to work. Would you happen to have an example using Nth?

             

             

            Example:

            Row                                    Text                                                                                Calc

            • 3. Re: RegExp_Extract
              Zach Leber

              I've been working on an example but for now the only one I can get to work is the example listed in the article I quoted above. One difference between Tableau and the REGEXP checkers is that Tableau requires the use of capture groups to get EXTRACT to work. The NTH function says it works up to 9 capture groups.

              • 4. Re: RegExp_Extract
                Jesse Martinez

                Ok, I will check into it.  Thanks again!

                 

                Jesse

                • 5. Re: RegExp_Extract
                  Jesse Martinez

                  @Zach Leber

                  Sorry the text I put in my email didn't transfer correctly to the post. This is what I am trying to do.

                   

                  Row                                             Text                                                                                                         Calc Result________

                  1                  mfkdskfsd Match1fkdsfkmds Match2 fdkfskdfksdMatch3                                                 Match1,Match2,Match3

                  • 6. Re: RegExp_Extract
                    Zach Leber

                    You need to create a REGEXP that matches your entire string, putting the smaller capture groups in (), then grab each capture group with NTH, and concatenate it all back together. You'll need to add some IFNULL functions given the variable number of matches you'll find. It may be slow, but the following works, and can be extended and generalized. My pattern looks for capitalized words ending in a digit separated by any number of characters.

                     

                    REGEXP_EXTRACT_NTH("mfkdskfsd Match1fkdsfkmds Match2 fdkfskdfksdMatch3",

                    '([A-Z][a-z]+\d).*?([A-Z][a-z]+\d).*?([A-Z][a-z]+\d)',1)

                    +","+

                    REGEXP_EXTRACT_NTH("mfkdskfsd Match1fkdsfkmds Match2 fdkfskdfksdMatch3",

                    '([A-Z][a-z]+\d).*?([A-Z][a-z]+\d).*?([A-Z][a-z]+\d)',2)

                    +","+

                    REGEXP_EXTRACT_NTH("mfkdskfsd Match1fkdsfkmds Match2 fdkfskdfksdMatch3",

                    '([A-Z][a-z]+\d).*?([A-Z][a-z]+\d).*?([A-Z][a-z]+\d)',3)

                    • 7. Re: RegExp_Extract
                      Jesse Martinez

                      Thanks Zach! I was able to get it to work for the most part. Some of the string fields have some sort of line break or new line type that it does not catch. It is however catching the duplicates in the main string. I tried some other characters and whatnot to try to get it. With as long as its getting, I might just need to try doing it in SQL where my data is prior to bringing it into Tableau or possibly try to do it in python on an extract of the data.

                       

                      Thanks for the help I appreciate it, even the Tableau support couldn’t give me an answer anywhere near as close as you were able to!

                       

                      Jesse

                      • 8. Re: RegExp_Extract
                        Nick Parsons

                        You could use a RegexReplace(). Figure out the negative regex to match and replace everything that IS NOT your "Holes" regex, then replace with some delimiter. This regex below doesn't work, just for example purposes but it would look something like...

                         

                        RegExp_Replace([Text],'?!([Hh]oles? ?\#?)', ', ')