About this article
In this article, I will walk through a technique I have used to select only the most recent record from a bunch of records using Tableau Prep. For this specific example, I’ll use our Superstore sample data. The challenge is that I want to create a flow that keeps only the most recent order for each customer. Or, a little more specifically, all that customer’s orders for their most recent order date, because some folks place multiple orders daily. This would be easy to do in Desktop, using a filter based on a Level of Detail calculation, but in my case, I want the records omitted from my Hyper extract entirely, so I’ll use Prep to do so.
Case in point: customer AA-10480 has orders in 2015 and 2018; I want to keep just the single row for the order on 4/15/18, Order ID CA-2018-114412.
This use case crossed my desk when working with some Higher Education data. I had a Student dimension table, but there were a bunch of "dirty" duplicate student records therein, with slightly different attributes for the student. I needed to join my student dimension to a fact table without any duplicates, so I didn't risk double-counting. And the only thing on which I could logically establish which record to keep was an UPDATE_DATE column. So I scratched my head a bit and came up with this technique to use Prep to clean things up. In my case, this operation was part of a bigger flow that later joined to the fact table, but I believe in some cases this might constitute its own standalone flow.
1. I’m using Tableau Prep and Tableau Desktop, both version 2018.1, on Windows 10.
2. In Prep, open a new flow and “Connect to Data”.
3. Click “Microsoft Excel,” and browse to select “C:\Users\[name]\Documents\My Tableau Repository\Datasources\2018.1\en_US-EU\Sample – Superstore.xls”
4. Drag the “Orders” worksheet onto the canvas.
5. Click the “+” sign on the right edge of the “Orders” node, and “Add Aggregate”
6. In the “Aggregate 1” node, drag “Customer ID” to “Grouped Fields”, and “Order Date” to “Aggregated Fields”. Change “Order Date” from “Count” to “Maximum”.
7. Now, we’ll get crafty: on the canvas, drag the “Orders” node on top of the “Aggregate 1” node, dropping it onto the “New Join” drop zone, as shown:
8. You should now see this:
9. Now we need to edit the “Join 1” node. Prep defaults a join on “Customer ID = Customer ID”, and that’s great, but we also need to add a join on “Order Date”, so click this “+” sign:
10. Then add a second join clause where “Order Date = Order Date”:
11. Click the “+” sign on the right edge of the “Join 1” node, and “Add Output”.
12. On the output node, click “Browse” and select the location and filename of your output. Keep the Hyper output type.
13. On the output node, click “Run Flow”.
14. Let’s check our work. Navigate to your new Hyper file, and double-click it. It should open in Tableau Desktop.
15. In Tableau Desktop, drag “Customer ID” from the Dimensions area to the Filter shelf. Select only “AA-10480”.
16. Drag “Order Date” from the Dimensions area to the Columns shelf. Click “+” three times to drill down to Quarter, Month, and Day.
17. Note there’s only one Order Date: 4/15/18! You did it!