5 Replies Latest reply on Apr 21, 2015 9:46 AM by Joshua Milligan

    How to identify duplicates

    Melissa Joseph

      Hello,

       

      I am fairly new to Tableau and would appreciate any help on the below.

       

      In the attached twbx file below, I am trying to identify any reports  that is a duplicate.

      Typically if meetings have been held with the same Client Rep in a client organization, by two Account reps and both have submitted their reports, only one will be considered as the correct report. The other will be considered a duplicate. The report has to identify duplicates based on the Client Concatenate column. In the case below, if for Honda MaryDay, Jane Smith and John Tam have submitted reports (Y in the Report Submit column) then one is supposed to be correct and the other is a duplicate. So probably the first one that shows up in the report will be correct and the second one will be duplicate.

       

      Secondly, if two Account reps meet with the same client and one submits the report but the other one does not, then it should not be flagged as late.

      In the case of BMW AdamThomas, Tom Thomas submitted the report and Tom Jacob did not submit the report. However, Tom Jacob should not be flagged as Late. So the condition is if for Client Concatenate, if there is a Y for Report  Submit, Late should not show  in the “report late” column.

       

      Pl remember all the fields below should be there in the view.

       

      Thank you!

        • 1. Re: How to identify duplicates
          Amir Shareghi

          Hi Mellisa,

           

          For your first question you can count the number of "Y" in each pane. If it is more than one then you have a duplication. If you have Tableau 9 your work is much easier by using "fixed" function.

           

          For your second question you can use lookup function.

          • 2. Re: How to identify duplicates
            Melissa Joseph
            I cannot use Tableau 9 as per our policy. Can you explain how to use Lookup
            • 3. Re: How to identify duplicates
              Joshua Milligan

              Hi Melissa!

               

              Welcome to the forums!

               

              Here is how I solved it using Table Calculations.  (And Tableau 9.0 LOD calcs proably would have been easier)

               

              1.png

               

              You'll notice in the screenshot above that I've created 4 calculations:

              • Number of Yes
              • Window_SUM Yes
              • Actually Late?
              • Hide 2nd Yes

               

              The first one, Number of Yes, is simply:

              SUM(IF [Report Submit] == "Y" THEN 1 ELSE 0 END)

               

              Which allows me to get the number of "Y"s

              Using Amir's suggestion, I use the WINDOW_SUM function to calculate the total number of Ys per window (meaning I want to partition using Client Concatenate -- notice the markings on the screenshot above indicating how that will be the window).  I can set that in the Advanced settings of the table calcualtion (right click the field in the view and select Edit Table Calculation)

               

              2.png

              Notice how I kept only Client Concatenate in Partitioning and moved everything else over to the right.

               

               

              Now, I can use that to count the number of "Y" and everything else falls into place.

               

              The Hide 2nd Yes function has this code:

              [Window_SUM Yes] > 1 AND FIRST() != 0

               

              Basically, it's just checking to see if there's more than one "Y" and hiding returning true for all but the first one.  You could move this field from Rows to Filters and keep only the False values to hide the duplicates.  Now, I'm making the assumption that all are "Y" -- if for example, there was 1 "N" and 2 "Y" then this might not work.

               

              Actually Late?

              IF [Window_SUM Yes] > 0 THEN "" ELSE LOOKUP(ATTR([Report Late]), 0) END

               

              So, if the sum of "Y" is at least 1, then it's not late, otherwise, we'll take the value of [Report Late].  Now, I may have mis-understood that a report could be marked late even with a "Y".  If that's true then this would need to be adjusted.

               

               

              Hope that helps a little!

               

              regards,

              Joshua

              1 of 1 people found this helpful
              • 4. Re: How to identify duplicates
                Melissa Joseph

                Thank you very much, everyone. Great help on this!

                • 5. Re: How to identify duplicates
                  Joshua Milligan

                  You're welcome!  Glad to help!

                  -Joshua