4 Replies Latest reply on Aug 11, 2015 10:26 AM by Jonathan Drummey

# Running total across columns

Calling all tableau geniuses!

I have a question regarding running totals which I was hoping you might be able to help me with.

Please find attached a package workbook containing some anonymized data.

I have three manufacturers, each you produce a range of standardised products, some available in different formats.

Over a period of time, I have a large number of customers who have purchased various products from the manufacturers.

I would like to analyse the cross purchase relationship between the different products.

For example, do the customers who purchase product 1, type 1 from manufacturer 1 subsequently purchase a disproportionate amount of product 5, type 11 from manufacturer 3?

One way I thought I could do this is to:

sort product 1, type 1 largest to smallest vertically

calculate the running total, and the percentage of total running down table

where product 1, type 1 hits 100%, compare percentage of revenue for other manufacturers/products

this would tell you that the customers that purchased 100% of product 1, type 1, also bought 15.3% of product 5, type 15 from manufacturer 1 or 10.79% of product 1, type 1 from manufacturer 2.

I would like to set this up in a table to compare easily, but I find I have to rework the above procedure for every product type. There must be a way to use the parameter controls to compare one product against another without having to start over each time. Any ideas?

I would like to present this data graphically, possibly with some running total Pareto type charts where one can easily identify the products that seem to have a relationship between manufacturers.

See sheet 3: my trouble is that I would like to be able to cycle through which product is "taken to 100%" to compare against the other products. I would also like to "cut it off" at 100%, to prevent the long tails.

Any help the community can offer on this little problem?

Very much appreciated!

Best wishes,

Jonathan

• ###### 1. Re: Running total across columns

Hi Jonathan,

I don't understand why you have Customer as part of the view because it seems like you're really trying to compare across product/type/manufacturer.

Going to your question of "For example, do the customers who purchase product 1, type 1 from manufacturer 1 subsequently purchase a disproportionate amount of product 5, type 11 from manufacturer 3?" it doesn't seem like you really need Customer in the views that you build, because the real question is about the relative amounts of product/type/mfr for all customers who purchased a given product/type/mfr.

Therefore, I came up with the following view. Here we can see that customers who purchased Product 2/Type2/Mfr 3 really like to buy more from Mfr 3 (as a % of all purchases):

And customers who buy Product 1/Type 1/Mfr 1 don't buy much of anything:

This avoids the problem with a running sum where we have to mentally compare the heights vertical or diagonal lines at different axis positions to find what other products have a large contribution, instead we can use color as a preattentive attribute.

A quick overview of how I built this:

1. Created a calculated field that combines Product/Type/Mfr.

2. Built a parameter list based on that field. (Unfortunately, parameters are not dynamically updated at this time so you'd have to rebuild this parameter every time the list of products changes).***

3. Created a "Cust for Selected Product" conditional Set with the following formula: SUM(IF [Product/Type/Mfr] == [Product/Type/Mfr Parameter] THEN 1 ELSE 0 END) > 0. This makes the IN of the set only those customers that have chosen the given product.

Here's how I validated the Set, I created a view that had all the dimensions in it (Product, Type, Mfr, and Customer) then set up the IN/OUT of the Set on Color and Columns. There aren't any crosses of grey marks (OUT) where there should be blue (IN) marks and vice versa:

4. Next I built a view with the Mfr, Product & Type dimensions to lay out cells in a useful way, and put the Cust for Selected Products Set on the Level of Detail Shelf. This makes a mark for each combo of the dimensions, SUM(Revenue) on Color colors each mark by the amount of revenue, and it seemed like you were more interested in the relative revenue so I changed that to a % of total with a Compute Using on the Cust for Selected Products Set. This makes each mark (the square) have the % of total for the customers IN the set vs. OUT of the set:

5. The next step is to get rid of the Out marks without changing the % of total. If we just filter out the Out marks then we'll break the % of total, there are at least three ways around this:

a) Add the Set to the Rows Shelf, click on Out and choose Hide. This is the fastest but it ends up with a view where there's stuff going on that we can't see in the Tableau Desktop interface (the only way to find the hidden marks is to click on each pill to see which one(s) have "Show Hidden Data" as a context menu option). You can see this in the "set view via hide" worksheet.

b) Create a table calculation filter which is then applied after the % of total is computed. Since we're using a Set and can't aggregate Sets, the Set has to be embedded inside a record-level calculation which can then be aggregated and surrounded by a LOOKUP() calc to turn it into a table calculation filter, I did this in the Set view via TC filter worksheet.

c) Skip using Sets entirely and use FIXED LOD expressions to effectively do the IN/OUT of the Set and compute the % of total as well. I didn't prepare a version of this.

I set the color ramp on the % of total to have 5 even breaks, there are a variety of ways that could be configured.

*** These same views could be built using a Tableau data blend to create a "dynamic" parameter but they would be a lot more work and not as performant, see Creating a Dynamic “Parameter” with a Tableau Data Blend | Drawing with Numbers for more info.

One other factor here is that in your initial description you had the word "subsequently" yet there is no time dimension in the data. In order to restrict only following purchases in the data I'd suggest leveraging LOD expressions in the following way. Given a purchase date, this calculation only returns True for purchases on or after the first purchase date by a customer for the selected product:

[Purchase Date] >= {FIXED [Customer] : MIN(IF [Product/Type/Mfr] == [Product/Type/Mfr Parameter] THEN [Purchase Date] END)}

If you're looking for something different, let me know!

Jonathan

• ###### 2. Re: Running total across columns

Haha... Too late for me to accept this challenge, but wow what an answer!

Cool!

• ###### 3. Re: Running total across columns

Jonathan,

This is quite the most amazing answer! I'm eternally grateful!

The way you created the dynamic sets-genius!

Rocket scientist!

Thank you, thank you, thank you!

Pint of Guinness on me any time!

Best wishes,

Jonathan

• ###### 4. Re: Running total across columns

You're welcome! And yes, I would very much enjoy a pint of Guinness should we get the chance!

Jonathan