3 Replies Latest reply on Sep 27, 2016 8:18 AM by David Li

    Best Way to Handle Duplicate Records after Join

    Sean Norse

      i all,

       

      This may be more of a data source problem, but I wanted to pick some Tableau expert's brains first. For this example, I am going to use the "HR Table" paradigm to represent the problem.  

       

      I have two tables:

       

      Employee_Table

      IDNameDepartment
      ID-1Margaret DoeInformatics
      ID-2John DodoEngineering
      ID-3Belle DisneyInformatics
      ID-3Belle DisneyDatabase Systems
      ID-4Rachel JohnsonEngineering
      ID-5Steve FoxInformatics
      ID-5Steve FoxEngineering
      ID-6Carlos SmithDatabase Systems

       

      Productivity_Metrics

      IDProjects CompletedRevenue Generated
      ID-134322
      ID-2116433
      ID-36773
      ID-4156997
      ID-51108
      ID-6124322

       

       

      Since some of these employees belong to two departments, they have duplicate record.  Therefore, when I join the two tables, the numerical values in "Productivity_Metrics" table is skewed as values are doubled for those who are apart of two departments.  This seems to be a common issue and I can only think of one solution:

       

      • Limit the data source so that each person only is associated with one department. 

       

      The issue with this solution is I was hoping to be able to color code/filter based on department, however, I don't think there is a perfection solution here.

       

       

      Thank you (and I know there is likely no good answer to this question but any ideas or insight in general would be appreciated for leaning purposes)Sean

       

       

        • 1. Re: Best Way to Handle Duplicate Records after Join
          David Li

          Hi Sean! One option (in Tableau 9+) is to take advantage of the implicit dimensionality in a LOD calculation to force a measure to use only the minimum value for each employee ID.

           

          That is, you can use a couple LOD calc formulas, like this:

          { FIXED [ID] : MIN([Projects Completed]) }

           

          { FIXED [ID] : MIN([Revenue Generated]) }

          If you put these measures in and sum them, they should sum up the minimum per ID, because Tableau knows that these LOD calcs have already processed at the level of ID.

           

          To give you an easier example of what's happening, consider if you have a data source of 10 rows and you do a LOD calc summing up all the revenue.

          { SUM([Revenue Generated]) }

          This value actually gets stored in each record in your data source, so it appears 10 times. Tableau is smart enough to know not to add these 10 values together, because that would give you a grand total 10 times too large!

          • 2. Re: Best Way to Handle Duplicate Records after Join
            Sean Norse

            Sorry for the delay in responding David.  This was extremely helpful thank you.

            • 3. Re: Best Way to Handle Duplicate Records after Join
              David Li

              You're welcome! If that worked for you, could you please mark it as correct so this doesn't show up as unanswered anymore? Thanks!