8 Replies Latest reply on Jul 26, 2016 10:56 AM by swaroop.gantela

    Using a calculated field to consolidate bad data

    Andrew Boxley

      Hello,

       

      I have slightly bad data where associated with a case number there are multiple names (i.e. Customer #: 12345 = Joe Smith; JOE SMITH; joe smith). This is a simplified example, but I have thousands of numbers with several names associated to each and I've been struggling to find a formula that will consolidate these names. Are there any Tableau functions that will allow me to select the first name that appears?

       

      Thanks for the help!

        • 1. Re: Using a calculated field to consolidate bad data
          swaroop.gantela

          Andrew,

           

          If the issue is mainly the case of the names, you can try consolidating them

          by making another field that has all the characters in lower case:

          LOWER([Customer])

           

          If you wanted to assign a preferred form of the name to an ID

          you can try these steps:

           

          Split the Name in parts by right-clicking on the [Customer] dimension

          and selecting Transform > Custom Split

          space as the separator and "Split off" All.

           

          Then check if just the first letter of each are capitalized:

          IF LEFT([CustomerFirstName],1)=UPPER(LEFT([CustomerFirstName],1))

          AND LEFT([CustomerLastName],1)=UPPER(LEFT([CustomerLastName],1))

          AND [Customer]<>UPPER([Customer])

          THEN [Customer]

          END

           

          Then use a Level of Detail calculation to fix that for all names under that ID:

          { FIXED [ID]:MAX([PreferredName])}

          210762name.png

          2 of 2 people found this helpful
          • 2. Re: Using a calculated field to consolidate bad data
            Andrew Boxley

            Thank you Swaroop! This helped with a majority of my fields, but there are still several that have an issue where the two names are different (Andrew vs. Drew).

             

            Do you know of any ways to select the first name that appears?

            • 3. Re: Using a calculated field to consolidate bad data
              swaroop.gantela

              Andrew,

               

              Please see if the attached is closer.

              I used some nested Level of Detail calculations:

               

              If by first name you mean the name associated with the earliest date,

              you can try:

              { FIXED [ID]:MIN(

              IF [Date]={ FIXED [ID]:MIN([Date])}

              THEN [Customer]

              END)}

               

              Which finds the minimum date per ID, finds the name associated with that,

              then fixes that name for all entries of the same ID.

               

              If by first name you mean the name associated with the first row with that ID,

              you can try:

              { FIXED [ID]:MIN(

              IF [Row]={ FIXED [ID]:MIN([Row])}

              THEN [Customer]

              END)}

               

              The Level of Detail expressions do not allow Table Calculations within,

              and so I was unable to use INDEX or RANK.

               

              210762name2.png

              2 of 2 people found this helpful
              • 4. Re: Using a calculated field to consolidate bad data
                Andrew Boxley

                So I believe this would work, but the double level of detail expression crashed our database (used up too much CPU). Any idea how to do this with less work on the server side?

                 

                Thanks again for all your help!

                • 5. Re: Using a calculated field to consolidate bad data
                  swaroop.gantela

                  Andrew,

                   

                  Hmm. Maybe splitting it up into two parts would help?

                  [Part1]:

                  IF [Date]={ FIXED [ID]:MIN([Date])}

                  THEN [Customer]

                  END

                   

                  [Part2]:

                  { FIXED [ID]:MIN([Part1])}

                   

                  Not sure if any optimization can be done regarding this, but just wanted to clarify,

                  what is your criterion for calling an entry the first one? Is it by date or row number?

                  • 6. Re: Using a calculated field to consolidate bad data
                    swaroop.gantela

                    Andrew,

                     

                    Please see if the attached will help.

                    It uses window calculations instead.

                     

                    The Customer name needs to be sorted by Date.

                    Then can get the first name using:

                    IF FIRST()=0

                    THEN ATTR([Customer])

                    END

                     

                    Then to apply this name to all of that ID:

                    WINDOW_MAX([FirstName])

                    • 7. Re: Using a calculated field to consolidate bad data
                      Andrew Boxley

                      Hi Swaroop,

                       

                      You are correct in assuming our data is organized by the date. That said, I am having trouble understanding what your last post is saying to do. I created a calculated field using ATTR([Customer Name]) - (the bad data), but when using the WINDOW_MAX, I get a decent amount of NULLs showing up where double names used to be. Currently the table calculation is going across the table, but I think I want it to go down the table.

                       

                      Again, thank you so much for your help!

                      • 8. Re: Using a calculated field to consolidate bad data
                        swaroop.gantela

                        Andrew,

                         

                        My apologies, I didn't describe the Partitioning and Addressing.

                        This is to be done for all three table calculations in the previously posted 210762name3.twbx:

                        [Index], [FirstName], and [WindowName]:

                         

                        -Right-click on the pill in question

                        -Select "Edit Table Calculation"

                        -In the "Table Calculation" window that pops up, go to the "Compute using" pulldown and select "Advanced"

                        -In the "Advanced" window that pops up, move the fields to the Addressing column as shown below

                        -Back in the "Table Calculation" window, set "At the level" to be "Deepest"

                        and set "Restarting every" to be [ID].

                         

                        If you temporarily put [Index] in your view, you will see how it is numbering the entries.

                        Once the [CustomerName] is sorted by date, you'll see the Index of 1 for the earliest name,

                        and then you'll see it restart at 1 for the next [ID].

                         

                        210762name4.png