3 Replies Latest reply on Jun 25, 2018 4:52 AM by Jerry Flatto

    Delete semi-duplicate rows

    Jerry Flatto

      Attached is a sample CSV file showing how each customer paid for their items, i.e., the tender method (cash, credit, check).  As you look at the data, you will notice that the Transaction ID is duplicated in some cases. For example, the first transaction is duplicated a total of 9 times.  In the vast majority of cases, customers use a single payment method, so the duplicate rows are errors from the sales system. I simply want to keep the first row for each unique Transaction ID and ignore the rest.  In some cases, the method of payment (cash, credit, check) stored in the Description field varies even within the same Transaction ID.  That doesn't matter; I simply want the first row. For example, in the second screenshot below, I want to keep only the first row for 16241981, which has "cash" associated with it, and ignore / delete the remaining rows.
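
      To make the rule concrete (keep the first row per Transaction ID, ignore later ones regardless of Description), here is a minimal sketch in plain Python. It only illustrates the logic I am after and is not a Tableau Prep feature. The function name keep_first_per_transaction and the sample rows are made up for illustration; only the column names Transaction ID and Description come from my data.

```python
import csv
import io


def keep_first_per_transaction(rows, key="Transaction ID"):
    """Return rows in their original order, keeping only the first row per key value."""
    seen = set()
    kept = []
    for row in rows:
        tid = row[key]
        if tid in seen:
            # A later row for a Transaction ID we already kept: skip it,
            # even if its Description differs from the first one.
            continue
        seen.add(tid)
        kept.append(row)
    return kept


# Hypothetical sample data (not the attached file).
SAMPLE_CSV = """Transaction ID,Description
16241981,cash
16241981,credit
16241981,cash
16241990,check
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
deduped = keep_first_per_transaction(rows)
for row in deduped:
    print(row["Transaction ID"], row["Description"])
```

      Note that "first" here means first in file order, so the result depends on the rows not being re-sorted before this step.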

       

      I am trying to join two files together: this file, which contains the method of payment, and a second file with the rest of the sales information.  The join is giving me extraneous rows because of the duplicate Tender rows.
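
      As a rough illustration of why the duplicates inflate the join, the sketch below joins one sales row against three tender rows for the same Transaction ID, then repeats the join after keeping only the first tender row. It assumes an inner join on Transaction ID; the inner_join helper, the Amount column, and all values are hypothetical and not taken from my files.

```python
def inner_join(left, right, key):
    """Naive inner join: one output row per matching (left, right) pair."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[key], []):
            joined.append({**l, **r})
    return joined


# Hypothetical data: one sale, three tender rows for the same transaction.
sales = [{"Transaction ID": "16241981", "Amount": "12.50"}]
tender = [
    {"Transaction ID": "16241981", "Description": "cash"},
    {"Transaction ID": "16241981", "Description": "credit"},
    {"Transaction ID": "16241981", "Description": "cash"},
]

# Joining against the raw tender rows repeats the sale once per tender row.
joined_raw = inner_join(sales, tender, "Transaction ID")

# Keep only the first tender row per Transaction ID, then join again.
first_tender = {}
for r in tender:
    first_tender.setdefault(r["Transaction ID"], r)  # setdefault keeps the first seen
joined_clean = inner_join(sales, list(first_tender.values()), "Transaction ID")

print(len(joined_raw), len(joined_clean))
```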

       

      I would like to fix this in Tableau Prep before I perform the join.  Is there a method to eliminate the duplicated rows?   One thought I had was a columns-to-rows pivot on just the Description field, then keeping only the first new column created, but it does not appear that I can pivot on a single column. I am also not sure whether Tableau Prep supports a "rows to columns" pivot, or only "columns to rows".