Hi Tableau Community,
I've been working on this analysis where I want to find relationships based on WHEN certain products were purchased by customers. My goals is to find what percent of the time was product A purchased before product B by customers, or what percent of the time was product B purchased before product A, or what percent of the time were they purchased on the same date by customers? In my data set, I have roughly 21,000 rows with about 50 different products a customer could buy and they can buy a single product, all 50 or somewhere in between. They can also buy the products on various dates. Table below is a dummy version of what it looks like.
|Account ID||Product||Order Date|
The problem I'm having is structuring the data (and not quite sure the best way to tackle this analysis). I started by making sets for the customers who bought both products I wanted to analyze (btw, interested in about 25 different product combos).
So if I wanted to look at the timing relationship of product AA and product BB, I made sets of the customers who bought both products (again, not sure if that's the best way). Then I'd try and find the order date differences by Account IDs (so I tried window functions and also simple IF statements) but couldn't get it to work the way I wanted.
Using the above table: if I was looking at customers (Account IDs) who bought product AA AND product BB, then my output would be something like: 5 accounts bought both; 40% bought A first, 20% bought B first, and 40% bought them at the same time. As far as a Viz, I'm thinking a bar graph that shows the total # of combos (5) and that's color-coded by those 3 buckets ("A first", "B first", "same time"), but am definitely open for other ideas/suggestions.
The second thing I'd be interested in is of the customers who bought product AA first, how many months it took on average for those customers to end buying product BB and vice versa.
In my example workbook, I'll use the Superstore Sample Data. Before I attached the data, I cleaned it a bit to remove any dupes where a customer might have ordered the sub-category on multiple occasions. e.g. Aaron Hawkins buying Paper only appears once. This reduced the # of rows from 8400 to about 5400. I did this since my data structured similarly and it might just add an extra layer of complexity which I wanted to avoid.
Superstore example in screenshot of a set of customers who bought BOTH sub-categories of Paper & Copiers/Fax, similar to my data set and problem. In excel, an IF statement saying if the copiers and fax column is > Paper column, 'copiers first', (iif(copiers and fax column is < Paper column, 'Paper first', 'Same time')). But I'm having a hard time doing that in this example. Also, not sure if it's possible, but since I'm ultimately going to be looking at ~25 combos, a dynamic formula would be ideal.
Anyways, the fields in this example I want to analyze are:
Workbook is Attached
Thank you for any help!!
Superstore Order Dates.twbx 1.1 MB