8 Replies Latest reply on Jul 26, 2016 10:56 AM by swaroop.gantela

# Using a calculated field to consolidate bad data

Hello,

I have slightly bad data where associated with a case number there are multiple names (i.e. Customer #: 12345 = Joe Smith; JOE SMITH; joe smith). This is a simplified example, but I have thousands of numbers with several names associated to each and I've been struggling to find a formula that will consolidate these names. Are there any Tableau functions that will allow me to select the first name that appears?

Thanks for the help!

• ###### 1. Re: Using a calculated field to consolidate bad data

Andrew,

If the issue is mainly the case of the names, you can try consolidating them

by making another field that has all the characters in lower case:

LOWER([Customer])

If you wanted to assign a preferred form of the name to an ID

you can try these steps:

Split the Name in parts by right-clicking on the [Customer] dimension

and selecting Transform > Custom Split

space as the separator and "Split off" All.

Then check if just the first letter of each are capitalized:

IF LEFT([CustomerFirstName],1)=UPPER(LEFT([CustomerFirstName],1))

AND LEFT([CustomerLastName],1)=UPPER(LEFT([CustomerLastName],1))

AND [Customer]<>UPPER([Customer])

THEN [Customer]

END

Then use a Level of Detail calculation to fix that for all names under that ID:

{ FIXED [ID]:MAX([PreferredName])}

2 of 2 people found this helpful
• ###### 2. Re: Using a calculated field to consolidate bad data

Thank you Swaroop! This helped with a majority of my fields, but there are still several that have an issue where the two names are different (Andrew vs. Drew).

Do you know of any ways to select the first name that appears?

• ###### 3. Re: Using a calculated field to consolidate bad data

Andrew,

Please see if the attached is closer.

I used some nested Level of Detail calculations:

If by first name you mean the name associated with the earliest date,

you can try:

{ FIXED [ID]:MIN(

IF [Date]={ FIXED [ID]:MIN([Date])}

THEN [Customer]

END)}

Which finds the minimum date per ID, finds the name associated with that,

then fixes that name for all entries of the same ID.

If by first name you mean the name associated with the first row with that ID,

you can try:

{ FIXED [ID]:MIN(

IF [Row]={ FIXED [ID]:MIN([Row])}

THEN [Customer]

END)}

The Level of Detail expressions do not allow Table Calculations within,

and so I was unable to use INDEX or RANK.

2 of 2 people found this helpful
• ###### 4. Re: Using a calculated field to consolidate bad data

So I believe this would work, but the double level of detail expression crashed our database (used up too much CPU). Any idea how to do this with less work on the server side?

Thanks again for all your help!

• ###### 5. Re: Using a calculated field to consolidate bad data

Andrew,

Hmm. Maybe splitting it up into two parts would help?

[Part1]:

IF [Date]={ FIXED [ID]:MIN([Date])}

THEN [Customer]

END

[Part2]:

{ FIXED [ID]:MIN([Part1])}

Not sure if any optimization can be done regarding this, but just wanted to clarify,

what is your criterion for calling an entry the first one? Is it by date or row number?

• ###### 6. Re: Using a calculated field to consolidate bad data

Andrew,

Please see if the attached will help.

The Customer name needs to be sorted by Date.

Then can get the first name using:

IF FIRST()=0

THEN ATTR([Customer])

END

Then to apply this name to all of that ID:

WINDOW_MAX([FirstName])

• ###### 7. Re: Using a calculated field to consolidate bad data

Hi Swaroop,

You are correct in assuming our data is organized by the date. That said, I am having trouble understanding what your last post is saying to do. I created a calculated field using ATTR([Customer Name]) - (the bad data), but when using the WINDOW_MAX, I get a decent amount of NULLs showing up where double names used to be. Currently the table calculation is going across the table, but I think I want it to go down the table.

Again, thank you so much for your help!

• ###### 8. Re: Using a calculated field to consolidate bad data

Andrew,

My apologies, I didn't describe the Partitioning and Addressing.

This is to be done for all three table calculations in the previously posted 210762name3.twbx:

[Index], [FirstName], and [WindowName]:

-Right-click on the pill in question

-Select "Edit Table Calculation"

-In the "Table Calculation" window that pops up, go to the "Compute using" pulldown and select "Advanced"

-In the "Advanced" window that pops up, move the fields to the Addressing column as shown below

-Back in the "Table Calculation" window, set "At the level" to be "Deepest"

and set "Restarting every" to be [ID].

If you temporarily put [Index] in your view, you will see how it is numbering the entries.

Once the [CustomerName] is sorted by date, you'll see the Index of 1 for the earliest name,

and then you'll see it restart at 1 for the next [ID].