I put together a small data set to give an example.
Based on this Knowledgbase Article I have created the attached Workbook.
In Summary we need to
1. Select an item
2. Retrieve all transactions that the item is on
3. Show all other items that are on the returned transactions.
The article shows how to create a copy of the table and join (in this case I used the transaction id) and exclude all products (item number) that are the same.
I used two methods one where you select the item in a dashboard graph and use that graph to filter a second graph showing the related (cross sell) items.
The second was using a quick filter to select a parameter to only show that item (the advantage is that this is then held in one worksheet).
Hopefully the KB article and attached workbook will help you to get to where you want.
All the best
Attached is one way to go about this, using the Superstore Sales data. The workbook combines a few different techniques:
- The Product Selector worksheet has Order ID on the Level of Detail, while the Product Name is on the Rows Shelf, sorted by the sum of sales. The # of Orders and Sum of Sales are both table calculations in order to deal with having Order ID on the Level of Detail. The advantage to this is that the dashboard can use the Order ID's of the selected Product Name to for an Action Filter to the Associated Products worksheet. A Highlight Filter is also used to highlight the location of the chosen Product Name in the list.
Note that usually when making a list like this, I'll add a filter to get rid of the Null values (there 1043 of them, you can see that in the information bar where it shows 2102 marks in 959 rows by 1 column), however I'm not in this case because if those are filtered out then their Order ID's are also removed from the Action Filter.
- The Associated Products worksheet also has Order ID on the Level of Detail, and also uses table calculations to generate the # of units (Order Quantity in the Superstore Sales data) and the # of Orders. The latter is useful to identify one-offs. This list has the Product Name sorted by the sum of Order Quantity.
@Ian - Nice work! I'd started working on my post earlier today and just got back to it, I should have checked first.
@Neal - Here's a summary of the three different methods used:
- Custom SQL: This is Ian's first example, it has the advantage of letting you do the analysis within simple aggregates (SUM, etc.), the disadvantages are duplicating data and (if you don't know it) mucking about in SQL.
- Parameter: This is the second example, this gives you an easy to use drop-down. This method can also be used without needing to reshape the data, I documented it here along with some other solutions: http://community.tableau.com/message/186524#186524 The disadvantage here is that any time the list of items changes, the parameter will need to be updated.
- Worksheets and table calcs: This is the option I posted, this method doesn't require reshaping the data, but does require some familiarity with table calculations.
@Jonathan - thanks for your kind comment and I totally agree with your pro's and con's. The main reason I am re-visiting the Forum (after quite a while) is to try and learn Table Calculations. So in the spirit of learning I've rebuilt the workbook using your table calcs and my simple dataset. I would still struggle to create from scratch but it's good to get a feel for them - so thanks. One thing I can't get my head around is why you use PREVIOUS_VALUE in the '# of Orders for Product Selector' - any Clues?? Many Thanks Ian
You're welcome! If you use just the # of Orders calc that I used in the Associated Products worksheet as a discrete header, it returns Null values and generates extra headers:
We could use WINDOW_SUM(COUNTD([transaction id])) for the number of orders, and that would only return one result per transaction_id, however it would also be executing the WINDOW_SUM calc for every row in each partition and with more than tens of thousands of records performance would start taking a nosedive.
So, the solution is to use Tableau's iterating table function, PREVIOUS_VALUE(). That way the calculation inside the PREVIOUS_VALUE() is executed once for the first row in the partition, and then the result is returned to every other row.
I'm using the techniques from Richard Leeke's awesome work on performance optimization of table calcs in this post:
An alternative to PREVIOUS_VALUE that would get the same results would be to use WINDOW_MAX(), I don't know which one would be faster, I'm guessing PREVIOUS_VALUE. LOOKUP([# of Orders], FIRST()) also will return the same result for every row, and I know that will be slower.
I've attached a workbook demonstrating these.
I've read through the article and it has helped to clear up some of the questions I was puzzling over. I still don't understand why you get the null values - but I can accept the solution(s) and move on.
Thanks again and all the best
I don't like to leave stones un-turned or puzzles un-solved...this has to do with how marks are displayed and there being a finer level of detail in the view than is what is on the Rows and Columns shelves.
The key here is that discrete (blue) pills generate headers - in other words, they create a set of rows/columns for every value that is being returned.
I've set up an example using two calculated fields, WINDOW_SUM([Sales]) and IF FIRST()==0 THEN WINDOW_SUM([Sales]) END, with Category and Sub-Category being the dimensions, and Compute using set to Sub-Category so the partitioning is on Category.
In worksheet 1, we're doing a standard measure names/values text table, with all the dimensions in the view, and can see that the one that does the FIRST() filter is only returning one displayed value, while the other calc returns a value for every row. However, the FIRST() calc is actually returning a whole lot of values, we can see that by looking at the marks info below, that shows 34 marks in 17 rows by 2 columns. It's returning the desired value, and Null for all the other rows.
In worksheet 2, Sub-Category is moved down onto the Level of Detail, duplicating what was happening with the Transaction ID in your workbook above. Because we've explicitly set the Compute Using's, the calculation results don't change, however the plain WS Sales calc is returning overlapping text while the one with the FIRST() filter is also returning overlapping text, except all the values but one in each Category are Null.
In worksheet 3, we've switched back to having Sub-Category on the Rows Shelf, and switched from using Measure Names/Values to showing the calcs as discrete pills on the Rows Shelf, and again we can see they are returning similar results to worksheet 1.
In worksheet 4, Sub-Category is down on the Level of Detail Shelf again. Since WS Sales has the same value for every row, it doesn't generate any extra headers. Since the calc with the FIRST() filter is actually returning two values, the desired value and Null for everything else, it generates two headers, one for the Null and one for the value.
Does that make sense?
thanks for taking the time to put together the explanation (much appreciated) and yes your examples are clear and make sense - it's starting to become clearer but I must confess the lightbulb moment with Table Calculations hasn't happened for me yet. I'll keep tinkering with them and hopefully enlightenment will come ("Zen and the Art of Table Calculations" )
Thanks Again and all the best
PS: Using your lightbulb metaphor, for me learning table calculations has been a whole series of light bulbs, with occasionally some burning out and having to learn things all over again.
Thanks for the links I thought I knew what the Green / Blue pills meant - I didn't. 20 mins into Joe Mako's Think Data Thursday presentation and "PING" First Lightbulb moment. Excellent Feeling. Thanks again.