1 Reply Latest reply on Sep 12, 2019 4:57 AM by Daniel Stanish

    How to validate E-mail Address, Website URL, Phone Number and Postal Code

    Abhilash Siddaramareddy

      Hi All,

       

      I am trying to validate the customer data. What I need is to categorize E-mail Address, Website URL, Phone Number and Postal Code into Valid and Invalid ones. I tried Tableau Prep where we can change the format of the text to email/url. Yet, I see few of the valid records are categorized as Invalid ones. The entries for above dimensions are in multiple formats as per the below example

       

      abc_@abc.com

      -@gmail.com

      abc@abc.e.abc.net

      -abc@abc.com.br

      abc_@abc.gr

       

      To identify the valid ones, I used the REGEXP formula but still the same issue.

       

      Email

       

      REGEXP_MATCH([email],"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$")

       

      Website URL

       

      IF REGEXP_MATCH([website],"^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$") = FALSE

      THEN REGEXP_MATCH([website],"^(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$")

      ELSE

      REGEXP_MATCH([website],"^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$")

      END

       

      Phone

       

      IF [Country]='Australia' THEN

      REGEXP_MATCH([phone],"^(\+\d{2}[ \-]{0,1}){0,1}(((\({0,1}[ \-]{0,1})0{0,1}\){0,1}[2|3|7|8]{1}\){0,1}[ \-]*(\d{4}[ \-]{0,1}\d{4}))|(1[ \-]{0,1}(300|800|900|902)[ \-]{0,1}((\d{6})|(\d{3}[ \-]{0,1}\d{3})))|(13[ \-]{0,1}([\d \-]{5})|((\({0,1}[ \-]{0,1})0{0,1}\){0,1}4{1}[\d \-]{8,10})))$")

      ELSEIF [Country]= 'United States' OR [Country] = 'Singapore' THEN

      REGEXP_MATCH([phone],"^(\+?\d{1,2}[ -]?)?(\(\+?\d{2,3}\)|\+?\d{2,3})?[ -]?\d{3,4}[ -]?\d{3,4}$")

      END

       

      Is there a way in Tableau Prep to validate the above dimensions to the accurate level . Please let me know if anyone has attempted this and solved it using the Tableau Prep

       

      Regards,

      Abhilash

       

      Message was edited by: Abhilash Siddaramareddy

        • 1. Re: How to validate E-mail Address, Website URL, Phone Number and Postal Code
          Daniel Stanish

          Good day Abhilash.

           

          I really like regular expressions anywhere, including in Tableau. One thing I'd like to mention is emailregex.com's RFC 5322 regex for emails which is different than yours:

           

          (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

           

          The Regular Expression Cookbook, 2nd Edition (Steven Levithan, Jan Goyvaerts) gives the following for matching URLs per RFC 3986:

          \A

          (# Scheme

          [a-z][a-z0-9+\-.]*:

          (# Authority & path

            //

            ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User

            ([a-z0-9\-._~%]+                            # Named host

            |\[[a-f0-9:.]+\]                            # IPv6 host

            |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host

            (:[0-9]+)?                                  # Port

            (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path

          |# Path without authority

            (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?

          )

          |# Relative URL (no scheme or authority)

          (# Relative path

            [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?

          |# Absolute path

            (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?

          )

          )

          # Query

          (\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?

          # Fragment

          (\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?

          \Z

           

          One thing I'll mention, regular expressions will only validate that the web address, email address, or phone number meets the required syntax for a web address, email address or phone number. It doesn't mean necessarily that the email address exists and will be read by someone, that there is a web server listening at that web address and you can reach it, or the phone number is still connected. It only identifies those address not conforming to the established standards, which will definitely not be valid. Despite this, using regular expression filtering does reduce errors and identifies problem records.

           

          Does this help? Best regards,

           

          -Dan

          1 of 1 people found this helpful