7 Replies Latest reply on May 23, 2018 8:04 PM by swaroop.gantela

    Binary Aggregation and Joins

    Chris Latimer

      I've been trying to figure out how can one join multiple tables together with differing levels of granularity while still maintaining the correct result from binary aggregation. By binary aggregation I mean columns that hold 0's and 1's whose sums and other functions are in some way meaningful.

       

      I have made a sample workbook demonstrating the problem. The use case:

      • There are four staff members numbered 1 to 4
      • There are three customers numbered 1 to 3
      • Each interaction between a customer and a staffer results in some arbitrary 0 or 1 in 'customer denominator' and 'customer numerator'
      • The division of 'customer numerator' and 'customer denominator' results in a staffer's 'engagement rate'
      • The correct engagement rates are as follows
        • 1: 75%, 2: 50%, 3: 100%, and 4: 0%
        • These numbers can be seen when there is no join between the two tables

       

      When one join's the customer events table into the data schema, the differences in granularity overstate certain rows and understate other rows resulting in the wrong answers to be displayed for staffer's engagement rates.

       

      So my question is: How can I have tables of differing levels of granularity and still be able to calculate metrics that require binary aggregation? Please note that grouping to the same level of granularity isn't an option as both (actually 3+) levels are needed within some visuals.

        • 1. Re: Binary Aggregation and Joins
          swaroop.gantela

          Chris,

           

          Here's a first pass at it.

           

          For the numerator for a particular combination of Staff and Customer:

          sum the numerators, divide by the number of events, and fix that to

          the combination of Staff and Customer :

          { FIXED [Staff Id],[Customer Id]:SUM([Customer numerator])/COUNTD([Event id])}

           

          The same holds for the denominator.

          Then your percentage will be

          SUM([Numerator LOD] )/SUM([Denominator LOD])

           

          Please see workbook attached in the Forum thread.

          1 of 1 people found this helpful
          • 2. Re: Binary Aggregation and Joins
            Chris Latimer

            Thank you very much for this solution, it works perfectly for the use case described!

             

            My followup is how can this scale?

            • This data source only dealt with two levels of granularity, but how can you solve this problem for three levels of granularity?
            • This type of function is very memory/time intensive and doesn't appear to be be compatible with 10m+ row datasets, is there anyway to improve performance of this?

             

            Thanks for the answer!

            • 3. Re: Binary Aggregation and Joins
              swaroop.gantela

              Chris,

               

              In some settings, the Level of Detail calculations may be less efficient:

              Create Efficient Calculations

               

              Here is a reworking of the calculations as Table Calculations.
              Let's first see if that improves the processing.

              Please note, to get this to work, there are quite particular settings for the "Compute Using"

              of each Tableau Calculation and its nested parts.

               

              I haven't yet looked into the levels of granularity for this task.

              The problem may arise with the need to statically define the "Compute Using"s.

               

              Please describe the granularity levels. Are there levels above Staff like Team A/B, then further Company 1/2?

              Please adjust the attached xlsx to more closely match your hierarchy levels.

               

              Also, how do you foresee the user changing the level of granularity:

              -by clicking on the +/- marks in a hierarchy header, or

              -by selecting a parameter

              • 4. Re: Binary Aggregation and Joins
                Chris Latimer

                Thank you again for this response, I'll try and provide as much detail as relevant to help get to the right answer.

                 

                Our base level of granularity is a customer with some levels being aggregations of customers and some being deeper than customers such that the former have multiple customers per level and the latter have multiple events per customer.

                1. Higher levels of granularity (Many customers to one of each): Company hierarchy (3 levels)> Staff> Customer
                  1. Ex. Company 1, Region A, Office A > Staff > Customer
                2. Levels of granularity below customers (One customer to many of each): events, surveys, engagements, logins, etc such that there are multiple of each of these types of events per customer

                Our problem resides with having multiple tables with a granularity below that of customers that results in duplication of customer data and disrupts boolean and binary calculations. In my attached use case in the question, the problem is shown when combining customer engagement with customer events. The problem is further exaggerated when adding in some arbitrary list of surveys per customer thus resulting in even greater duplication.

                 

                The questions is, how can we have multiple, unrelated levels of granularity below customer and still use boolean and binary aggregation?

                 

                The use case of our users is not to utilize +/- handles within headers, but to be able to have a calculation of the average engagement (numerator/denominator) in the same view as one for the sum of events, sum of surveys, etc.

                • 5. Re: Binary Aggregation and Joins
                  swaroop.gantela

                  Chris,

                   

                  Sorry for the iterations.

                  I hope this one is closer.


                  I think this will simplify the matter quite a bit and

                  should be less computationally intense.

                   

                  Using the ideas from this article:

                  https://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#joining_tables.html#troubleshoot

                   

                  I think it's just a matter of taking the Min of the CustNum & CustDenom no matter how many

                  levels below customer get added. (In the attached xlsx, I added a survey level).

                   

                  Then I made separate calculations for each level of the hierarchy,

                  all of which were the same WINDOW_SUM, just a setting of different levels of Compute Using.

                   

                  OF NOTE: because the Staff-Cust table had rows without a unique identifier

                  (for example StaffID 1 + Cust 3 there are three different rows). I think I needed to add a Row identifier to

                  help with the calculations.

                   

                  270657join.png

                  • 6. Re: Binary Aggregation and Joins
                    Chris Latimer

                    This answer also works for the use case described, thank you very much! I'm trying not to look a gift horse in the mouth, but these table calculations are most likely too difficult for some of our internal users to navigate/replicate. Are there any methods that we can pursue that could achieve the same ends besides pre-aggregation?

                    • 7. Re: Binary Aggregation and Joins
                      swaroop.gantela

                      Chris,

                       

                      I think some method of aggregation is going to be necessary on account of the duplicates created

                      by the process of joining to sheets at granularities below Customer.

                      Because the above link recommends using something like MIN([Customer Numerator]) to avoid the duplicates,

                      there will need to be some method to SUM those MINs, which will be either Level of Detail calculations or Table Calculations.

                       

                      The LODs can be computationally expensive, while the Table Calculations can get complicated, especially when there are nested calculations.

                      For this particular scenario, the Setup for the all the Table Calcs is the same; the only part that changes

                      is the "Restarting Every", which should reflect directly the granularity of calculations (Staff/Team/Region/etc.)

                      (please see screenshot below).

                       

                      270657joinB.png