2 Replies Latest reply on Feb 22, 2012 10:47 PM by ivanklimovich

    Match field against itself using 2nd variable

      Hi there,


      I've got a transactional data set that goes back to 2010. It contains ~32k unique product_ids. I want to compare the top-selling 20% of products against all 32k to identify the best-selling pairs of goods. I have 2 questions:


      As I see it:


      I'm trying to fill a matrix with 6.5k rows and 32k columns, where each cell holds the number of order_ids that contain both the horizontal-axis product_id and the vertical-axis product_id.


      After that I would like to return, for each of the 6.5k products, the product_ids it is paired with, in descending order of that count.


      I don't know where to start. I've only just started playing with the tool.


      P.S. It looks like I can't apply a filter only to the vertical axis if I have product_ids on both the vertical and horizontal axes?

        • 1. Re: Match field against itself using 2nd variable
          Jonathan Drummey



          What you're looking to do is have Tableau plot a table with 6.5k * 32k = 208 million cells. I don't think that will be usable or particularly feasible. I recently read something about Andy Cotgreave demonstrating a viz with somewhere between 1 and 9 million marks that took 3 minutes to display. I have no idea how long 208 million would take (nor how much memory Tableau would try to use), never mind how long it would take to visually scan the table. I think no matter which way you go, you're likely to have to do some further segmentation of the 32k products to make this work.


          There are two options for working with this data in Tableau. One is table calculations; the other is a cross product (aka Cartesian join) to get the combinations. Given your volume of data, I'd suggest the latter, with most of the prep work done in your database. The reason is that your requirements as stated are mostly about how the data is selected and organized, and not so much about needing to do visual analysis on it. Since DBs are designed to sort, filter, and query huge volumes of data, it makes sense to do the heavy lifting there. For example, making the list of the top 6.5k products would be one query, which could feed a table used by the cross product query. The cross product table could then be sorted and filtered by a third query, and it's that data that gets output to Tableau.
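          As a rough sketch of that three-query flow, here is one way it could look using Python's built-in sqlite3 module as a stand-in for your real database. The table and column names (order_lines, top_products, pair_counts), the function name top_pairs, and its parameters are made up for illustration. It also assumes "top selling" means "appears in the most orders", and it uses a self-join on order_id rather than a full Cartesian product, so only pairs that actually occur together get materialized.

```python
import sqlite3


def top_pairs(rows, top_fraction=0.2, min_orders=1):
    """rows: iterable of (order_id, product_id) tuples (hypothetical layout).

    Returns a list of (top_product, other_product, n_orders) tuples,
    sorted by n_orders descending.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE order_lines (order_id INTEGER, product_id INTEGER)")
    con.executemany("INSERT INTO order_lines VALUES (?, ?)", rows)

    # Query 1: the top-selling products, ranked here by the number of
    # distinct orders they appear in (an assumption about "top selling").
    n_products = con.execute(
        "SELECT COUNT(DISTINCT product_id) FROM order_lines"
    ).fetchone()[0]
    n_top = max(1, int(n_products * top_fraction))
    con.execute("CREATE TABLE top_products (product_id INTEGER PRIMARY KEY)")
    con.execute(
        """
        INSERT INTO top_products
        SELECT product_id
        FROM order_lines
        GROUP BY product_id
        ORDER BY COUNT(DISTINCT order_id) DESC, product_id
        LIMIT ?
        """,
        (n_top,),
    )

    # Query 2: self-join on order_id to count, for each top product,
    # how many orders it shares with every other product.
    con.execute(
        """
        CREATE TABLE pair_counts AS
        SELECT a.product_id AS top_product,
               b.product_id AS other_product,
               COUNT(DISTINCT a.order_id) AS n_orders
        FROM order_lines AS a
        JOIN top_products AS t ON t.product_id = a.product_id
        JOIN order_lines AS b
          ON b.order_id = a.order_id AND b.product_id <> a.product_id
        GROUP BY a.product_id, b.product_id
        """
    )

    # Query 3: sort and filter; this result is what would go to Tableau.
    result = con.execute(
        """
        SELECT top_product, other_product, n_orders
        FROM pair_counts
        WHERE n_orders >= ?
        ORDER BY n_orders DESC, top_product, other_product
        """,
        (min_orders,),
    ).fetchall()
    con.close()
    return result
```

          Because only co-occurring pairs are stored, the output is usually far smaller than the full 6.5k x 32k grid, and raising min_orders trims it further before anything reaches Tableau.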


          Here are links to a few discussions to get you started:


          Working with large data sets and self-joins:




          Displaying self-join results in a crosstab, allowing for filtering:


          And here's a place to start with table calculations, if you go that route:


          The Tableau videos are also great.



          • 2. Re: Match field against itself using 2nd variable

            Thanks, Jonathan. I think you are right. I've already switched to the DB option.