9 Replies Latest reply on Mar 8, 2017 10:34 PM by lei.chen.0

    De-dup data source with a filter

    Nathan Brower

      I connect to an external data source, over which I have no control, that has some duplicate observations.  Unfortunately there is no unique identifier.

       

      I calculated a variable that is a concatenation of all other variables, "Concatenate", and this is what I want to use to dedup.  For the observations that are duplicates, I would like to keep *one* record and drop the rest.  All of the answers I have seen posted either remove all instances of the duplicates or rely on selecting a MIN/MAX Unique_ID within a group of duplicates.  Neither of those apply here.

       

      Ideally, I could filter my data with a Condition like this:

       

      COUNT([Concatenate])=1 OR (COUNT([Concatenate])>1 AND First Observation in the Group]

       

      However, I do not know the logic for that bolded portion.

       

      Thank you for anyadvice.

        • 1. Re: De-dup data source with a filter
          lei.chen.0

          Hello Nathan,

           

          Based on my understanding, I simply your external data source as,

           

          dim 1, dim 2, dim 3, ..., dim n, value

           

          Then,

           

          concatenate = dim 1 + dim 2 + ... + dim n

           

          And you want to get the first (first occurrence) value for each concatenate.

           

           

          No, Tableau doesn't know the order of occurrence of data line without a unique key.

           

           

          Does your "First Observation" of the value "in the Group" have any logic?

           

          For example, if the value keeps increasing, and you may use MIN(),

          or if the value keeps declining, and you may use MAX().

           

           

          Regards

           

          Lei

          • 2. Re: De-dup data source with a filter
            Gourav Sharma

            Hi Nathan,

             

            Could you please share some sample data to check.

             

            Gourav

            • 3. Re: De-dup data source with a filter
              Nathan Brower

              Hi Lei,

               

              Thanks for your response.  Yes, your understanding of the dataset is correct.

               

              To answer your question, there is no logic to needing the first observation.  I just need one copy, it could be any one of the observations in a duplicate group, because they are all the same.  That is to say, MIN(concatenate)=MAX(concatenate)=First observation=2nd observation=nth observation in a duplicate group.

               

              You can think about the data like this.  The first 3 rows here are the duplicates and when I bring the data in to Tableau I only want one copy from that group.  They are all the same, so it does not matter which.

                

              Account_MgrCustomerDateSaleConcatenate
              NBABC1/1/2017$100 NBABC42736100
              NBABC1/1/2017$100 NBABC42736100
              NBABC1/1/2017$100 NBABC42736100
              JSXYZ1/1/2017$25 JSXYZ4273625
              JAJFK1/2/2017$50 JAJFK4273750

               

              Does this clarify?  I appreciate the advice.

              • 4. Re: De-dup data source with a filter
                Nathan Brower

                Hi Gourav,

                 

                Thank you for the response.  I cannot upload the actual workbook because the data is sensitive but the below table captures the idea.

                 

                 

                Account_MgrCustomerDateSaleConcatenate
                NBABC1/1/2017$100 NBABC42736100
                NBABC1/1/2017$100 NBABC42736100
                NBABC1/1/2017$100 NBABC42736100
                JSXYZ1/1/2017$25 JSXYZ4273625
                JAJFK1/2/2017$50 JAJFK4273750

                 

                The first 3 rows here are duplicates.  I want to keep one observation and filter out the rest.  Any ideas?

                 

                Thank you.

                • 5. Re: De-dup data source with a filter
                  Gourav Sharma

                  Hi Nathan,

                   

                  Hope you're looking for this.

                   

                  Gourav

                  • 6. Re: De-dup data source with a filter
                    Nathan Brower

                    Thanks Gourav.   That is the end result I am looking for.  How did you do it?  I cannot tell from the workbook.

                    • 7. Re: De-dup data source with a filter
                      Gourav Sharma

                      Hi Nathan,

                       

                      Check the count of the concatenate filed.

                       

                      check:

                      count([Concatenate])

                      Then the checked sum would be:

                      sum(sale)/[check]

                       

                      Also, if this is the correct solution, please mark the answer to be correct and close the thread.

                       

                      Thanks,

                      Gourav

                      • 8. Re: De-dup data source with a filter
                        Nathan Brower

                        Hi Gourav,

                         

                        Thank you.  As I understand it, this works well for a given worksheet/crosstab.  However, I need the actual data extract filtered so that all of the worksheets I make automatically have these extra rows excluded.  While the illustrative data I posted has five variables, the actual data has dozens and my workbook has many tabs, so making custom filters for for each worksheet and each measure isn't practical.  Any ideas on that?

                         

                        Thank you again for the advice.

                        • 9. Re: De-dup data source with a filter
                          lei.chen.0

                          Hello NATHAN,

                           

                          Nathan Brower wrote:

                           

                          However, I need the actual data extract filtered so that all of the worksheets I make automatically have these extra rows excluded.

                           

                          You mentioned data extract, great point!

                           

                          Why not try aggreation on day?

                          It's impossible to change aggregation type, it's fixed to sum, but...

                           

                          But the data extracted is as follows, and you may notice that the "Number of Records" is useful.

                           

                          Finally I created a calculated field to adjust the sales.

                           

                           

                           

                          Regards

                           

                          Lei