    Tableau Prep - Remove Duplicates

    Stephen Groff

      I am not a seasoned Tableau user, but I am consistently confounded by the fact that Tableau offers no easy way to rid your data of duplicates.  I'm not going to pretend that I understand coding or scripting languages, but it seems like such an easy feature to add... I just don't get it.


      I know Tableau Prep is new, so many of you may not be familiar with it at all, but a solution here could work in both Tableau Prep and Tableau Desktop... as I see no way of removing duplicates other than through a calculation.


      Trouble is, I don't know how to write that calculation.  Does anyone have any idea how to easily remove duplicates from your data?


      Such as telling Tableau to look at this column... look for duplicates, if found... delete the row.


      Please help!

        • 1. Re: Tableau Prep - Remove Duplicates
          Branden Kornell

          This is usually a situation that is handled on the data source side, rather than in Tableau itself.


          What's your data source from? (Excel, database, etc.)


          Can you identify duplicate rows using a single field, like duplicate IDs? Or do you need to verify that all fields in a row are the same?

          • 2. Re: Tableau Prep - Remove Duplicates
            Stephen Groff

            Hi Branden,


            I'm connecting to the data through Oracle.  I can't do anything on the database side as I don't have access to change anything... so everything I do has to be done in a third-party app such as Excel or, in this case, Tableau.


            I'm working with Serial numbers.  They are my unique identifier.  So I'm trying to come up with a way to input a Calculated field in Tableau Prep to remove the repeaters. 


            It's also worth noting that several calcs are unavailable in Tableau Prep (e.g. LOOKUP)... although FIND is available... Hope this information helps.

            • 3. Re: Tableau Prep - Remove Duplicates
              Branden Kornell

              One common way to remove duplicates in a query is to change the SQL statement from SELECT to SELECT DISTINCT. This doesn't require changes to the database; it's just changing the query being sent to it.
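
              A minimal sketch of the SELECT vs. SELECT DISTINCT difference. The table name `units` and its columns are hypothetical, and an in-memory SQLite database stands in for Oracle, so treat this as an illustration rather than the exact query for your source:

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (serial_number TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO units VALUES (?, ?)",
    [("SN001", "shipped"), ("SN001", "shipped"), ("SN002", "returned")],
)

# Plain SELECT returns every stored row, including the repeated one.
all_rows = conn.execute("SELECT serial_number, status FROM units").fetchall()

# SELECT DISTINCT collapses rows that match on every selected column.
distinct_rows = conn.execute(
    "SELECT DISTINCT serial_number, status FROM units ORDER BY serial_number"
).fetchall()

print(len(all_rows), distinct_rows)
```

              Note that DISTINCT only removes rows that are identical across all selected columns; two rows with the same serial number but different values elsewhere would both be kept.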

              • 4. Re: Tableau Prep - Remove Duplicates
                Santiago Sanchez

                Hi Stephen,


                As Branden described, you could take care of this on the database side, but you can do it in Tableau too. Two common alternatives:


                1. Using an LOD expression, a calculation, as described here: Removing Duplicate Data with LOD Calculations | Tableau Software

                2. Use Tableau Prep's aggregate function, as shown here: Clean and Shape Data in Tableau Prep


                Either approach requires you to identify what makes a row unique, but it sounds like you already know that! I've often seen columns other than the unique identifier create the duplicates. For instance, each serial number may have multiple communications associated with it, which produces multiple rows per serial number. You can aggregate those communications to a single row using either of the approaches above.
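
                To make the aggregation idea concrete, here is a small Python sketch of collapsing several rows per serial number into one. The field names (`serial`, `comm_date`) and the sample rows are made up for illustration; this is not Tableau Prep syntax, only the logic an Aggregate step performs:

```python
from collections import defaultdict

# Made-up sample: SN001 has two communications, so it appears twice.
rows = [
    {"serial": "SN001", "comm_date": "2018-05-01"},
    {"serial": "SN001", "comm_date": "2018-05-09"},
    {"serial": "SN002", "comm_date": "2018-05-03"},
]


def aggregate_by_serial(rows):
    """Group by serial and aggregate the other column to one value per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["serial"]].append(row["comm_date"])
    # ISO-formatted dates sort correctly as strings, so max() gives the latest.
    return [
        {"serial": serial, "comm_count": len(dates), "latest_comm": max(dates)}
        for serial, dates in sorted(grouped.items())
    ]


summary = aggregate_by_serial(rows)
print(summary)
```

                The key design choice is that every non-grouped column needs an aggregation (count, max, min, and so on), which is why the output no longer looks like the original rows.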


                Hope that helps!


                • 5. Re: Tableau Prep - Remove Duplicates
                  Stephen Groff

                  Thanks for the information, gentlemen.  Unfortunately, you cannot use LODs in Tableau Prep.  I have investigated the idea of removing duplicates using the Aggregate function in Tableau Prep, and this seems to work to a degree...  However, some additional investigation will be needed on my end.  I can get the results I am looking for, but not without sacrificing the overall look of the data.  I'm not a Tableau expert, and by no means a Tableau Prep expert, so whether going the 'Aggregate' function route will solve my problem remains to be determined.


                  I appreciate your time.  Thanks again!



                  • 6. Re: Tableau Prep - Remove Duplicates

                    The correct way to do it in Prep is to Aggregate all the columns. However, your statement "I can get the results I am looking for, but not without sacrificing the overall look of the data" is kind of an oxymoron: if there are duplicates and you want to remove them, that will obviously change the number of records once corrected.

                    • 7. Re: Tableau Prep - Remove Duplicates
                      Stephen Groff

                      I should've been more specific...


                      If I aggregate a single column to a distinct count, I get the results I need, which is wonderful.  However, when I add the date, for example, in the 'GROUP' section so I can still attach a date to my results... the distinct count that was once giving me an accurate number no longer does.  Like I said, I'm new to the Tableau world and the world of... well, anything outside Excel, and more training is needed... so give me a break.  Did you reply to give me a hard time, or did you have something useful for me to try?
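
                      The effect described above can be reproduced outside Tableau. The following Python sketch uses made-up serial numbers and dates to show why a distinct count changes once a date is added as a grouping field; it illustrates the logic only and is not Prep syntax:

```python
# Made-up sample: SN001 appears on two different dates.
rows = [
    ("SN001", "2018-05-01"),
    ("SN001", "2018-05-02"),
    ("SN002", "2018-05-02"),
]

# Distinct count with no grouping: each serial is counted once overall.
overall_distinct = len({serial for serial, _ in rows})

# Distinct count grouped by date: each serial is counted once per date.
serials_by_date = {}
for serial, date in rows:
    serials_by_date.setdefault(date, set()).add(serial)
distinct_per_date = {date: len(s) for date, s in serials_by_date.items()}

print(overall_distinct, distinct_per_date, sum(distinct_per_date.values()))
```

                      A serial that shows up on more than one date is counted in each date's group, so the per-date counts add up to more than the overall distinct count.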

                      • 8. Re: Tableau Prep - Remove Duplicates
                        Santiago Sanchez

                        Hey Stephen,


                        Yeah, LODs can't be used in Prep, but an aggregation would achieve the same result. If you have a sample data set that represents the problem and a description of how you'd want to see the data displayed in Tableau (screenshots usually help), then the community is usually extremely fast to tackle the problem and make suggestions. If that's a route you'd like to explore, I'd suggest opening a new question on the forum with those extra details. This article can help too: Packaged workbooks: when, why, how


                        Welcome to the Tableau Community!



                        • 9. Re: Tableau Prep - Remove Duplicates

                          I am sorry if I came off as condescending; that was not my intention. I simply do not understand your request. Like Santiago said, if you can share a sample and your desired outcome, most community members will help build you a solution, myself included.

                          • 10. Re: Tableau Prep - Remove Duplicates
                            Eric Viglotti

                            Hello all,


                            I'm actually very interested in this as well. I'm quite familiar with larger-scale ETL tools than Prep, as well as with Excel. Using Excel as an example: you select all of your data, go to the "Remove Duplicates" command, and tell it which column(s) uniquely identify the record. It tells you how many duplicates there are, you click the button, and away they go. What is happening, as I understand it, is that it looks at each instance of the unique identifier, calls the first one the "master" and keeps it, and then marks any subsequent records with the same identifier as "duplicate" and removes them. That's exactly how the other ETL tool works and, as with Excel, you sort the data coming in before this step so you can prioritize which one to "pick"; the remainder are thrown away.
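
                            The sort-then-keep-first behavior described above can be sketched in a few lines of Python. The function name `remove_duplicates`, the field names, and the sample rows are all hypothetical; this reflects the behavior as described in this post, not Excel's or any ETL tool's actual implementation:

```python
def remove_duplicates(rows, key, sort_key=None, reverse=False):
    """Keep the first row per key (the "master"); drop later repeats.

    Sorting first controls which row counts as "first" for each key.
    Returns the kept rows and the number of rows removed.
    """
    if sort_key is not None:
        rows = sorted(rows, key=sort_key, reverse=reverse)
    seen = set()
    kept = []
    for row in rows:
        identifier = key(row)
        if identifier in seen:
            continue  # a later record with the same identifier: duplicate
        seen.add(identifier)
        kept.append(row)
    return kept, len(rows) - len(kept)


# Made-up sample: keep the most recently updated record per serial.
rows = [
    {"serial": "SN001", "updated": "2018-05-01"},
    {"serial": "SN002", "updated": "2018-05-03"},
    {"serial": "SN001", "updated": "2018-05-09"},
]
kept, removed = remove_duplicates(
    rows,
    key=lambda r: r["serial"],
    sort_key=lambda r: r["updated"],
    reverse=True,  # newest first, so the newest row becomes the "master"
)
print(kept, removed)
```

                            Unlike an aggregation, this keeps whole original rows intact, which is the difference from the solutions suggested earlier in the thread.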


                            I have used this tons of times, and unfortunately I don't think any of the solutions here would quite work for that.


                            Any thoughts on whether this is common enough to warrant submitting an idea for future consideration?