Okay... this is a tough one. I consider myself to be pretty savvy with calcs, but I'm stumped. Hopefully someone can show me where I'm missing something obvious...
Here's the challenge:
We deal with retail sales data. Products selling in stores. Each store belongs to a sales region, and each region has a certain number of stores in the region. One of the facts our clients want to know is "What percent of the stores carry this (product, product line, size, etc).
"Why is this difficult?" you may ask.
The basic calculation is quite simple... (Count of stores in that cell) / (total possible stores)
The logic remains simple in these two situations:
In what % of stores am I selling out of the whole universe of stores?... this is pretty easy, as I can pre-build a field that contains the "Total Possible Stores".
In what % of stores am I selling in a single region of stores... this is also pretty easy, as I can populate a field with the "Total Stores in the Region".
Where this gets exceedingly tricky:
What if there are 12 regions, but I have used a quick filter to narrow the displayed data to just three regions... and each region may contain a different number of stores:
RM (Rocky Mountain): 27 stores
SP (South Pacific): 44 stores
PN (Pac Northwest): 17 stores
Now I have a real challenge... I need to answer the question: In what % of stores am I selling out of the total stores in RM, SP, and PN?
I need to get the sum of possible stores (27+44+17 = 88). Ideally I want to get the distinct region names, and then do a lookup on them... but I haven't found an elegant way to do that.
My super slow solution:
I've found a hacky solution that works... but it's so slow that we really can't use it with clients:
( 6 * COUNTD(IF [Stores In Region]=6 THEN [Region Name] END))
+ (7 * COUNTD(IF [Stores In Region]=7 THEN [Region Name] END))
+ (8 * COUNTD(IF [Stores In Region]=8 THEN [Region Name] END))
+ (9 * COUNTD(IF [Stores In Region]=9 THEN [Region Name] END))
Repeat until you reach the number 50, which is the most stores that a region is likely to have.
I've attached a packaged workbook to illustrate this challenge. The workbook contains a fairly small data set... but it runs slowly enough that you'll be able to see the performance challenge. Worksheet 1 has only one level of row headers and runs reasonably quickly. Worksheet 2 has two levels of row headers and is quite slow (I left it in to illustrate the performance issue with this smallish data set). The row headers are a hierarchy (Level 1 through Level 4) and they are populated based on the Level 1, Level 2, Level 3, Level 4 parameter controls.
example_for_tableau.twbx 126.5 KB