3 Replies Latest reply on Jan 13, 2012 6:33 AM by audunmathiasygard

    Filtering out "duplicate" rows

      Hi,

       

      I need some help to figure out how to filter out some duplicate data from a database.

       

      In the attached example, I'd like to filter out any rows with an ID which has *more* than one row in any given week. In the example, this means we'd like to filter out rows with the ID 1234 or 3198 in any calculations we do, since the first has several rows in all weeks, and the second has two rows in the last week.

       

      It is of course possible to remove these "duplicate" rows with a custom SQL query, but the real database is unfortunately very large, and I keep thinking there must be a simple way to do it inside Tableau (except for manual filtering, as I've done in the example)...

       

      Anyone able to help?

        • 1. Re: Filtering out "duplicate" rows

          Since this was posted right before Christmas, I'll take the liberty to ask again. Anyone able to help?

          • 2. Re: Filtering out "duplicate" rows
            Richard Leeke

            Let me just check that I understand the logic you want first.  You want to be able to get aggregates (weekly averages in the case of the example) across the database, but excluding any IDs which ever occur more than once in any week from all of the aggregates - i.e. you don't even want to include one instance of the duplicated rows and you never want to include an ID that has ever been duplicated within the aggregates for any week, even weeks where it is not duplicated.  Is that right?

             

            I think It would be possible to filter out the rows using table calculations in Tableau - though as you say your real database is very large this may not be practicable.  It certainly wouldn't count as "simple".  The only way I can see to do it with table calculations would be to bring back weekly values for each ID, including a count of values for that ID in that week.  You could then have a calculated field which checked for the WINDOW_MAX() of the count of rows for each ID and do all of your aggregate calculations locally with WINDOW_XXX() functions.  But it would get horribly messy.

             

            I think some custom SQL is likely to scale much further and make your workbook much simpler.

            • 3. Re: Filtering out "duplicate" rows

              You understood me correctly.

               

              As you say, the way to do it is to count weekly numbers of each ID. I can calculate these values in a worksheet in Tableau using a table calculation, but I can't then use these results as a filter for another worksheet, which is the problem..

               

              Looks like I might have to go with custom SQL.