6 Replies Latest reply on Feb 23, 2014 8:51 AM by Bruce Segal

# How to extract top level domain from email list

Hi.

I am trying to figure out how to extract the top level domain (.com, .gov, etc...) from a list of subscriber email addresses (excel & oracle datasource) using calculated field.

I found an old community post where someone posted this forumla that works, but contains all the domain information after the '@' sign:

RIGHT([EMAIL],LEN([EMAIL])-FIND([EMAIL],"@"))

The example above gets me from name@gmail.com to gmail.com, but what I really after is .com. Also, something that should be considered is that lower/upper case characters should be ignore and counted as the same thing.

Thanks

• ###### 1. Re: How to extract top level domain from email list

Hey Vang,

Try this as a calculated metric. It will locate domains that are also 2 characters long. Modifying the calculation you provided won't work nicely with "." as an email can be John.Smith@here.com.

---o

// Look at last 3 characters of an email for domain. Return only the domain.

IF

FIND(RIGHT(TRIM([EMAIL]),3),".")=1 THEN RIGHT(TRIM([EMAIL]),2)

ELSE RIGHT(TRIM([EMAIL]),3)

END

---o

Basically says to provide the last 3 characters of the email; in this case the most popular domain lengths. Then check to see if a "." exists telling me it is a 2 character domain, in which case give me the last 2 characters. Else give me the last 3 characters.

TRIM is used to make sure there are no leading or trailing spaces prior to making the calculation. Spaces are considered a character and could cause problems.

EDIT:

If you want to keep the "." along with the domain use this

---o

IF

FIND(RIGHT(TRIM([email]),3),".")=1 THEN RIGHT(TRIM([email]),3)

ELSE RIGHT(TRIM([email]),4)

END

---o

• ###### 2. Re: How to extract top level domain from email list

TJ:

Thanks this was very helpful. I added a little tweak to account for data sets where domains have mixed capitalization; e.g. gmail.com, GMAIL.com, etc. If you don't convert the output to Lower case then Tableau will see .com, .COM and .Com as unique strings.

See highlight bold text below.

IF

Find (Right (Trim ([Email]),3),".")=1 THEN Right (Trim (Lower ([Email])  ,3)

ELSE Right (Trim (Lower ([Email])  )  ,4)

END

Note: You only need to add the Lower function to the right site of the THEN, and on the ELSE part of the statement. The reason is that those are the output sides of the calc and you only need to put the output into lower case.

Now the challenge is to make this work for 4 character TLD's like .info b/c instead of 2 choices (2 or 3-char domains) now there are 3 choices 4 char, 3 char and 2-char domains.

I'm thinking we'll adjust the calc to return the last 4 chars and look for a "."  In this version that means the domain is a 3-char domain, in which case give me the last 3 chars. Then run as a subcalc the calc you have to id 2-char domains. Anything else is a 4-char domain.

If I dig into it I'll post it here.

1 of 1 people found this helpful
• ###### 3. Re: How to extract top level domain from email list

Good call on the case sensitivity and the 4 character TLD's. Your solution is definitely a bit more robust in addressing those issues.

Thanks for the comment!

TJ

• ###### 4. Re: How to extract top level domain from email list

TJ thanks. And thanks to you for your solution. I had tried a number of calcs that tried to fine the last "."  using the mid() function, but it'd get hung up on email addresses like first.last@x.com

• ###### 5. Re: How to extract top level domain from email list

Vang Xiong

Well, I think you guys might have found the solution by now.

What you can do is, you already have a calculated field that gives you something like

gmail.com

If we create another calculated field on this field it is possible to get the desired result. Something like this in the attached workbook!

It should work for all scenarios!

• ###### 6. Re: How to extract top level domain from email list

Aman: Yep this covers them all and is an elegant solution.

Vang: If you get these updates, consider marking Aman's solution as answering the question.