7 Replies Latest reply on Apr 21, 2008 5:39 PM by Richard Leeke

    How to combine expressions with different levels of aggregation

    Richard Leeke

      I am trying to analyse staff utilisation rates from timesheet data.


      My raw data is a table where each row represents the number of hours spent by one staff member on one activity on a given date.


      I am trying to visualise application rate over time, broken down in various ways.  I want time summarised on the columns shelf - (week, month, quarter or year), some grouping of staff members on the rows shelf (individual staff members, teams, ad-hoc groups, etc) and various classifying attributes on the colours shelf (project, customer, type of work, etc).


      I can easily show total hours in this structure, but what I really want to show is "application rate" - which is defined as the total "hours worked" divided by the total "available hours" in the period.


      "Available hours" is defined as 7.5 (nominal hours per day) times the number of workable days in the period (i.e. number of days each staff member has been employed in the period, excluding weekends, public holidays and time on leave without pay).  The "available hours" calculation is a bit tricky, but I have managed to get that to work on it's own as long as I don't have any classification on the colours shelf (details below).  What I can't figure out how to do is an expression to give "application rate" as a measure - because the available hours aggregate needs to use the rows and columns shelves (i.e. to aggregate over the right time range and grouping of staff) but ignore the colours shelf (since it needs access to all entries per staff member per day - not just those for the particular classification).


      I have been trying to work out a way to derive this either using custom SQL in the datasource or raw sql in an expression - but I think I need to be able to pass sheet context information (i.e. the rows and columns shelf contents) for this to work.


      The other way I have been experimenting with is to calculate the numerator and the denominator independently and then combine them - but the only way I can see of doing that involves exporting the results of both at some particular level of granularity and then creating a new data source to join them (or even do that in Excel!) - but that approach completely loses the ability to drill-down dynamically.


      Seeing Erin's posting the other day about using RAW_SQL to get at the underlying data using "Sheet1$" to reference the underlying data source, I was just wondering if there are any other hidden tricks for getting at worksheet context information.  Alternatively, any other suggestions on how to go about the problem would be good.



      It might help to understand how I'm calculating available hours:


      [Available Hours = (COUNTD([person_weekday]) - COUNTD([person_nonworkday])) * 7.5




      [person_weekday] =

      IF (([Day of Week] >= 2) AND ([Day of Week] <= 6)) THEN

        STR([Timesheet Item - Date]) + ", " + [Timesheet - Employee Number]







      [person_nonworkday] =

      IF ((([Day of Week] >= 2) AND ([Day of Week] <= 6))

      AND (([Timesheet Item - Activity Type] = 'Leave Without Pay') OR ([Timesheet Item - Activity Type] = 'Public Holiday'))) THEN

        STR([Timesheet Item - Date]) + ", " + [Timesheet - Employee Number]





      So the available hours calculation needs all rows for relevant employees in the period, whereas the hours worked needs to be broken down by whatever classification is being explored (customer, type of work, etc).



      Anybody got any hints on possible ways to attack this?

        • 1. Re: How to combine expressions with different levels of aggregation
          Dirk Karis

          If I understand your schema correctly, your timesheet table looks something like this


          EmpNum Date Hours ActivityType Project Customer TypeOfWork


          (I took the liberty of abbreviating column names).

          It sounds like the two things that make analysis difficult are:

          (1) Hours for a given day may add up to less than 7.5,

          (2) Non-billable hours have NULL Project/Customer/Type fields.


          You're trying to

          - create the hours that are missing from (1)

          - "charge" non-billable hours to a specific Project/Customer/Type for the purpose of computing application rate.


          If an employee works on more than project at a time, this is very difficult to do -- which project should be charged for the missing hours?


          If there was an AssignedProject table with fields like this

          EmpNo Project BeginDate EndDate


          you could add it to the connection by using the custom join

          Timesheet.EmpNo = AssignedProject.EmpNo AND

          Timesheet.Date >= AssignedProject.BeginDate AND

          Timesheet.Date <= AssignedProject.EndDate


          (if there was an entry in AssignedProject for each day, the join gets even simpler).


          then you could create this calculated field

          [AssumedProject] =

          IF ISNULL([Timesheet - Project]) THEN

              [AssignedProject  - Project]


              [Timesheet - Project]



          Filtering on [AssumedProject] rather than [Timesheet - Project] would then work as you expect.


          In the absence of an AssignedProject table, I think you're going to have to edit the connection to use completely custom SQL.  Exactly how to edit it depends on what rules you want to apply to guess at which project to charge uncredited time against.

          • 2. Re: How to combine expressions with different levels of aggregation
            Richard Leeke

            Thanks for the response - though I don't think I managed to explain my issue very clearly, so this doesn't really solve it.  I should have used a simplified schema like you did to make the problem more concrete. I'll try to explain it using an even more simplified schema.


            I think there is one key thing which makes the analysis difficult, with several extra complications to add to the fun.  The key thing is:


            1) There is no available table of nominal working days (the data is exported from an application package which does not record this), so this needs to be derived from actual time booked.


            Added difficulty because:


            2) Hours booked by an employee may be more or less than the nominal 7.5 per day.


            3) Employees may book time on days which are not nominal working days (weekends, public holidays etc).


            4) Nominal working days may differ between staff members (different regions have different public holidays, staff may be on leave without pay, etc).


            Non-billable hours do in fact always have project/customer/type entries, by the way.


            I am trying to report the proportion of "available hours" which are actually spent on billable work, which we refer to as application rate.  This is a key performance indicator, since costs depend on available hours and revenue depends on billable hours.


            I want to find a way of defining a calculated field for application rate which is a well-behaved measure as I drill-down by the various classifications I want to explore.


            Assume we have data for two employees for two days, classifed as Billable (B) or Non-Billable (N).  Employee 1 does billable work for 2 customers on both days.  Employee 2 does billable work on the first day and non billable work the second day.  And let's have 8 hour days to make the sums simpler.


            Emp Date Hours Type Cust

            1 1/4/08 4.0 B C1

            1 1/4/08 4.0 B C2

            2 1/4/08 8.0 B C3

            1 2/4/08 4.0 B C1

            1 2/4/08 4.0 B C2

            2 2/4/08 8.0 U C3


            Ignoring the complications of weekends, public holidays etc, we can say that the available hours in this period is simply 8 times the number of distinct "employee days" - in exactly the same way as you did in your response to the "Aggregations by Day" post from someone else a couple of days ago.


            So we end up with:


            [Application Rate] = SUM([Hours]) / (8 * COUNTD( STR([Date]) + ',' + STR([Emp])))


            Evaluating that expression over the entire data set we get:


            32 / (8 * 4) = 32 / 32 = 100%


            i.e. employees have been 100% applied to some type of work in the period.  Correct.


            Now let's start dragging classifying fields onto the colour shelf.  First the Type field.  This gives us the application rate for Billed and Unbilled, as follows:


            B = 24 / (8 * 3) = 24 / 24 = 100%

            U = 8 / (8 * 1) = 8 / 8 = 100%




            Clearly all staff were not 100% applied to billed work and 100% applied to unbilled work.  What we wanted was:


            B = 24 / (8 * 4) = 24 / 32 = 75%

            U = 8 / (8 * 4) = 8 / 32 = 25%


            Hopefully that long-winded example makes my problem clear. 


            The reason for the wrong answer is obvious.  The denominator is trying to calculate available hours based on the number of employees who booked time on each day - but the expression is only operating over a subset of the records.  Only one employee booked any billable time on 2/4/08 - so the total available person days for billable work appears to be 3 not 4.  Likewise there only appears to be 1 person day available for unbilled work.


            However I can't just define the available hours calculation by a RAWSQL expression over the whole table, because it does need to be filtered by date and employee, to allow analysis by period and employee groupings.  This is what I meant by needing to combine expressions with different levels of aggregation.



            I've attached a packaged workbook with the sample data I used above, showing the behaviour of the simple (and wrong) calculated field.



            Thinking about this some more since I originally posted, I had come to the same conclusion as you - namely that I am going to need to use custom SQL for the connection.  I think I will need to include some sort of indication of available days within each row of the result set - probably by joining to a derived table based on some aggregate by employee and date.  I'm hoping Firebird supports a mechanism for doing that on the fly - but failing that I suppose I may need to import my raw data, derive the "employee available days", export that as a text file and then define a connection joining the two text files.


            [As an aside - is there a trick for exporting results in CSV format? I've only managed to save to a tab delimited text file which isn't supported as an input type, so needs to be edited first.]


            So I shall pursue that route - but if you have any helpful tips to speed me on my voyage of discovery I'd be most grateful.


            Apologies for the length of the post - but I think this must be quite a common class of problem, so it would be good to understand how to attack it.

            • 3. Re: How to combine expressions with different levels of aggregation
              Austin Dahl

              I wonder if you could use a table calculation.  It lets you do sums over a row, column or pane.  It's also pretty easy to do percentage of row, column or pane too.

              • 4. Re: How to combine expressions with different levels of aggregation
                Austin Dahl



                I'm attaching your example workbook with two new sheets that use a table calculation showing the percent total over type.  It may only work in this simplified version of  your data, but I hope there's something useful for you.  Right click on the hours field on the text shelf and select "Edit Table Calculation..." to take a look at the settings for the table calc that I used.



                • 5. Re: How to combine expressions with different levels of aggregation
                  Dirk Karis

                  I think there's two notions of available hours and of application rate that you need to distinguish.


                  One is worked hours/paid hours and the other is billed hours/worked hours.


                  [Paid Hours] is a per employee calculation.  It will be correct in any case where an employee is a member of exactly one group -- either by themself, or a summary by department or country or whatever.


                  It will be incorrect if you put customer or project on a filter shelf.  It will be very difficult to fix this without some notion of employee assignment.  If an employee works for two customers every day in a week, is paid for 40 hours in a week, but submits timesheets totalling only 35 hours (billed and unbilled), which customer gets "charged" for his slacking off?


                  [Paid Hours] =

                  8 * COUNTD( STR([DATE] + ';' + STR([Emp]))


                  In complicated form it looks like


                  [Workday] =

                  [Day of Week] >= 2 AND [Day of Week] <= 6 AND [Activity Type] <> 'Unpaid' AND [Activity Type] <> 'Holiday'


                  [Paid Hours] =

                  8 * COUNTD(IIF([Workday], STR([DATE] + ';' + STR([Emp]), NULL)


                  [Worked hours] is

                  Sum(IIF([Activity] <> 'Unpaid' AND [Activity] <> 'Holiday', [Hours], 0))


                  [Worked Hours pct of Paid] =

                  [Worked Hours]/[Paid Hours]


                  This is a measure of how much your employees are doing for free (or slacking off).  I would think there would be a constant target for this.  This going to 200% isn't a sign of a well-run department, it's a sign of employee burnout.



                  To count billed hours, without putting Type on the filter shelf, just use IIF


                  [Billed Hours] =

                  Sum([IIF([Type] = 'B', [Hours], 0))


                  Now you can compute

                  [Billed Hours Pct Of Worked]

                  [Billed Hours]/[Worked Hours]


                  This computation IS valid even if an employee falls into more than one group.  It should work fine for breakdowns by customer or project.


                  You can compute

                  [Billed Hours]/[Paid Hours] for an an employee or a department, but you can't compute it for a customer or a project without a notion of assignment.

                  • 6. Re: How to combine expressions with different levels of aggregation
                    Richard Leeke

                    Thanks for the suggestion, Austin.


                    I had experimented with table calculations - but I can't see any way to achieve what I'm really after - which is to express various classifications of actual hours as a percentage of the "nominal available" hours, rather than just the total actually recorded.


                    Our time recording package does exactly the same thing when reporting application rates - and it is definitely misleading.  If I do 8 hours billed work and another 4 hours unbilled (writing a proposal, say) - that would be reported as 67% applied to billed work and 100% applied overall.  How I would like to see that reported is 100% billed and 150% overall - which highlights 2 things: profitability is fine - but I'm working too much.

                    • 7. Re: How to combine expressions with different levels of aggregation
                      Richard Leeke

                      Thanks for the thoughts, Dirk.


                      It's not a question of "slacking off" - there can be all sorts of perfectly valid (and planned) reasons for staff to be booking time to non billed work - attending training courses, writing proposals, etc, etc. 


                      You're quite right that the worked versus paid ratio is worth monitoring, and the target there is simple - 100% for the reasons you mention - though with the expectatation of "swings and roundabouts" due to trying to meet fluctuations in customer demand.  That's pretty easy to track, as you say.


                      The reason that I'm focussed on the application rate (or [Billed Hours]/[Paid Hours] as you have it) is that this is a really important KPI - it's an indicator of revenue/costs - and it would be really valuable to be able to do Tableau-style drill-downs on that.


                      Anyway - I've spent way too long trying to get this going.  I'll just resort to doing the calculation for some pre-determined breakdowns.


                      Thanks for your help.