2 Replies Latest reply on Sep 4, 2017 11:59 PM by Chris McClellan

    Removing Duplicate Records

    Elie Rahi

      See below an example data. I need to do a couple of steps here. The first step is I need to count the number of names, equivalent to a COUNTIFS in Excel. Then I need to remove the duplicates based on Names. What sort of formulas would I need to call here?


      My data source is Salesforce so I can be tampering with an Excel spreadsheet.



        • 1. Re: Removing Duplicate Records
          Alexander Roussos

          Hi Elie,



          I don't quite understand what you are trying to accomplish. Are you as a first step trying to count ALL names no matter if they are a duplicate? In that case I would to a count calculated field otherwise I would do a count distict calculated field. And for the second questions I would do a LOD expression.


          1) To count all the names no matter how many times they appear: COUNT([Name])
          1) To count distinct names: COUNTD([Name])



          2) Use a FIXED expression to remove the duplicate data.


          It is necessary to know which dimension in the data source is unique for each row value, or the combination of dimensions. For example, if the measures on Table A have a unique row identifier based on Date/Time, use that dimension to remove duplicate values.


          Once the necessary dimension or dimensions are found, use an LOD calculation to remove the duplicate values.


          For example:

          {Fixed [Unique Dimension]: MIN([Measure])}


          If the table has a combination of multiple dimensions, use a calculation similar to the following:

          {Fixed [Dimension 1],[Dimension 2], [Dimension 3]: MIN([Measure])}


          As always it is easier if you include a workbook!



          1 of 1 people found this helpful
          • 2. Re: Removing Duplicate Records
            Chris McClellan

            Depending on what you want to do, I wouldn't even use an LOD.


            If you create a sheet and add Name and Count (as per your data), then change it from SUM([Count]) to MIN([Count]) you will get the data the way you want - with duplicates removed.

            1 of 2 people found this helpful