1 Reply Latest reply on Dec 7, 2011 5:25 PM by Richard Leeke

    Need some form of Lookup... (a challenge for the gurus)

      Okay... this is a tough one. I consider myself to be pretty savvy with calcs, but I'm stumped. Hopefully someone can show me where I'm missing something obvious...

       

      Here's the challenge:

      We deal with retail sales data. Products selling in stores. Each store belongs to a sales region, and each region has a certain number of stores in the region. One of the facts our clients want to know is "What percent of the stores carry this (product, product line, size, etc).

       

      "Why is this difficult?" you may ask.

      The basic calculation is quite simple... (Count of stores in that cell) / (total possible stores)

       

      The logic remains simple in these two situations:

      In what % of stores am I selling out of the whole universe of stores?... this is pretty easy, as I can pre-build a field that contains the "Total Possible Stores".

      In what % of stores am I selling in a single region of stores... this is also pretty easy, as I can populate a field with the "Total Stores in the Region".

       

      Where this gets exceedingly tricky:

      What if there are 12 regions, but I have used a quick filter to narrow the displayed data to just three regions... and each region may contain a different number of stores:

          RM (Rocky Mountain): 27 stores

          SP (South Pacific): 44 stores

          PN (Pac Northwest): 17 stores

       

      Now I have a real challenge... I need to answer the question: In what % of stores am I selling out of the total stores in RM, SP, and PN?

      I need to get the sum of possible stores (27+44+17 = 88). Ideally I want to get the distinct region names, and then do a lookup on them... but I haven't found an elegant way to do that.

       

      My super slow solution:

      I've found a hacky solution that works... but it's so slow that we really can't use it with clients:

      ( 6 * COUNTD(IF [Stores In Region]=6 THEN [Region Name] END))

      + (7 * COUNTD(IF [Stores In Region]=7 THEN [Region Name] END))

      + (8 * COUNTD(IF [Stores In Region]=8 THEN [Region Name] END))

      + (9 * COUNTD(IF [Stores In Region]=9 THEN [Region Name] END))

          Repeat until you reach the number 50, which is the most stores that a region is likely to have.

       

      I've attached a packaged workbook to illustrate this challenge. The workbook contains a fairly small data set... but it runs slowly enough that you'll be able to see the performance challenge. Worksheet 1 has only one level of row headers and runs reasonably quickly. Worksheet 2 has two levels of row headers and is quite slow (I left it in to illustrate the performance issue with this smallish data set). The row headers are a hierarchy (Level 1 through Level 4) and they are populated based on the Level 1, Level 2, Level 3, Level 4 parameter controls.

        • 1. Re: Need some form of Lookup... (a challenge for the gurus)
          Richard Leeke

          I don't quite understand the data - I'm guessing you're using some sort of magic to include derived values in your resultset in the rows with fractional store IDs and the empty row flag set.

           

          So in the attached example I just filtered out all the NULL and fractional store IDs and did the calculations on the whole number store IDs.  Even if that's not what you want, it demonstrates how to use table calculations to do what I think you're after.  You'l see that the view refreshes essentially instantly as you expand and collapse the hierarchy.

           

          The key to this is the TOTAL() table calculation.  This function essentially evaluates the contained expression with a different level of aggregation to that defined by the layout of the view.  So that allows you to calculate the overall distinct count of stores in the region.

           

          All calculations are being done in the data engine, which is why it's so quick (other table calculations happen in Tableau on returned results but TOTAL() is done in the database).