
    Utilization rates over the course of a day - a variation of the queue length problem.

      Hi everybody!

       

      What I Want To Do

      I want to graph average utilization rates over the course of a day using data with start and end time stamps.

       

      Technical Difficulty

       

      My data is in the following format:

       

      Location - startTime - endTime - total
      table1 - 11:30 - 13:30 - 120 minutes
      table1 - 11:45 - 13:15 - 90 minutes
      table1 - 13:05 - 13:20 - 15 minutes
      table2 - ...

       

      If I take total minutes as the measure and the start hour as the dimension, the most active hour appears to be 11:00:

       

      11:00 - 210 minutes

      12:00 - 0 minutes

      13:00 - 15 minutes

       

      Which is incorrect, since 12:00 is actually the most active hour (distributing the minutes correctly gives 45 for the 11:00 hour, 120 for 12:00 and 60 for 13:00). This approach is taken from this thread: Hours in a Day - Displaying 'time worked' and 'dead time'

       

      I attached a screenshot called "perfect" that shows the type of graph I'm interested in, except that it follows the pattern from above (attaching all the time worked to the start hour instead of distributing it over the time period).

       

      Or this one: the Daily Census workbook (also attached) from this thread: Daily Census (on sheet 2).


      I tried using the solution proposed in this thread: How can you calculate queue length using Tableau? The "union all" SQL approach outlined there leaves me with the running delta of started (but not yet ended) occupancies, which is fine for queues but not terribly helpful for utilization questions.

       

      My Question

      How do I get to a graph showing the average number of tables used over the course of a day (so... 3.3 out of 4 locations were used on average between 8:00 and 9:00, two out of four between 16:00 and 17:00)? In German it's called a Ganglinie; in English the closest term is a hydrograph, as far as I know. Basically the attached picture called "perfect" or "Ganglinie".

        • 1. Re: Utilization rates over the course of a day - a variation of the queue length problem.
          Alex Kerin

          I'm not sure I have exactly the view/chart you need, but I believe I have the data in the format you need at least.

           

          This is a less than straightforward technique. It combines the queue "union all" with delta approach and a padding of the data so that we have a row for every hour/date for every location.

           

          Along with your original data there are two more sheets I had to add to the file - one with times in 10 minute increments for a 24 hour day, and a list of dates. These are used to pad the data out.

           

          First we round your dates to the nearest 10 minutes. We could do this on a minute-by-minute basis, but that would make the padded file even larger. 10 minutes should still give you utilization rates per hour that are pretty much correct.
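          (A sketch of one way the 10-minute rounding might be written as a Tableau calculated field - not necessarily how the attached workbook does it; [startTime] is the field name from the original post and the #1900-01-01# anchor date is arbitrary:)

          DATEADD('minute', 10 * ROUND(DATEDIFF('minute', #1900-01-01#, [startTime]) / 10), #1900-01-01#)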

           

          After getting the running sum by location, we need to copy the summary data out (view data, select all, copy) and add it back in as a new data source. This is now a fully padded file and the running sum is now hardcoded. You could avoid this step, but it becomes tricky (for me at least) to get the partitioning right so that the running sum stays correct.

          Change the time back to a time format in the new source.

           

          The last step is to create an average utilization - this is the sum of the running delta for the hour, divided by 6 (six 10 minute blocks - so someone there for the full hour will have a running sum of 6 for the hour), divided by the number of days where there is data.
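          For example (made-up numbers, just to illustrate the arithmetic): if the running sums for the 12:00 hour add up to 90 across 5 days of data (i.e. 90 occupied 10-minute blocks), that is 90 / 6 = 15 location-hours of occupancy, and 15 / 5 = 3 locations in use on average during the 12:00 hour.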

           

          You will want to check that my data is correct. One issue I believe is that the division by days will be imprecise if there are days where there are no patients - we will still have padded data for those.

          • 2. Re: Utilization rates over the course of a day - a variation of the queue length problem.
            Alex Kerin

            I think the attached fixes this - we don't need any zero rows for the running sum in the new data source, so the filter corrects the date count.

            • 3. Re: Utilization rates over the course of a day - a variation of the queue length problem.
              Richard Leeke

              You can do this without resorting to copying data with an alternative custom SQL approach.

               

              What you need to do is create a table at the level of granularity that you want for reporting, with one row for each datetime bucket over the period your data spans. I've added two sheets to your sample data, one with datetimes at the hour level and one at the minute level, to demonstrate what I mean.

               

              You can then write a custom SQL statement which apportions the duration of each of your data rows over the time buckets.

               

              I've included a few ways to illustrate the occupancy.

              • 5. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                Jonathan Drummey

                Richard,

                 

                Now that I think I have my head wrapped around this solution, do you have any general suggestions as to when this solution (which I'm thinking of as "generate a row for every event/time bucket") would be preferable - or not - to the union/running_sum solution? The only thing I can think of is if the data volumes from generating a row for every event/time bucket got really huge, but even then all the calcs are aggregate calcs, which are faster than the table calcs needed in the other solution.

                 

                Jonathan

                • 6. Re: Utilization rates over the course of a day - a variation of the queue length problem.

                  Wow, there is an extraordinary amount of data-ninja-skill on display in this forum.

                   

                  I'll think through your approaches over the weekend and get back to you when I get it to work (...or don't).

                   

                  Thanks a lot

                   

                  Matthias

                  • 7. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                    Alex Kerin

                    Richard - I suck at SQL - can you walk me through how the apportioning occurs? I much prefer your method. To your point Jonathan, I don't think Richard's method results in any more rows of data, except in the new table in the data source. In fact it would be a third of the size within Tableau, as there are no separate beginning/end/pad rows from the union alls.

                    • 8. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                      Alex Kerin

                      How would you create a utilization by hour of the day (date removed), making sure you also divide by the days when there was no utilization?

                      • 9. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                        Richard Leeke

                        @Jonathan: Good question - I'll have a think about that. I haven't used this technique before, I only thought of it when I saw the question.

                         

                        @Alex: I'll have a go at guiding you through the SQL and have a play with your other question when I get to a PC. Too hard to explain with one finger on a phone!

                        • 10. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                          Richard Leeke

                          I'll start with Alex's question about how the SQL apportions each row by time bucket.

                           

                          Take the hourly bucket example. I have created a second table with one row per hour, which I am treating as the start time of a one hour bucket. The SQL is essentially overlaying these hourly buckets over the duration of each occupancy row, to provide hourly occupancy, as this view shows:

                           

                          Hourly Portions.png

                           

                          Here's the SQL:

                           

                          SELECT
                            [Tabelle1$].[Id] AS [Id],
                            [Tabelle1$].[Beginning] AS [Beginning],
                            [Tabelle1$].[Date] AS [Date],
                            [Tabelle1$].[End] AS [End],
                            [Tabelle1$].[Location] AS [Location],
                            [Tabelle2$].[Hour] AS [Hour]
                          FROM [Tabelle1$], [Tabelle2$]
                          WHERE [Beginning] < [Hour] + 1/24
                          AND   [End] >= [Hour]

                           

                          Tabelle1$ is the original occupancy data, Tabelle2$ has just a single column ([Hour]) which is the datetime of the start of the bucket.

                           

                          The crux of the SQL is the WHERE conditions, which basically mean: "join each row of occupancy data to each hour bucket where the occupancy overlaps the hour bucket."

                           

                          Thinking about one row of occupancy data and one hour bucket, there are two conditions which need to be satisfied for them to overlap:

                           

                          1) The beginning of the occupancy must be before the start of the following hour. As [Hour] is a datetime, which is expressed in fractional days, the start of the next hour is just the start of the current hour + 1/24 of a day, so this condition becomes:

                           

                          [Beginning] < [Hour] + 1/24

                           

                          2) The End of the occupancy must be on or after the start of the hour:

                           

                          [End] >= [Hour]

                           

                          (Note that I tweaked the SQL to change the first condition from <= to a < which is really more correct than how I posted the original.)

                           

                          So that gets us one row for each hourly portion of each occupancy period.
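                          To make that concrete with the sample data from the original question: the 11:45 - 13:15 occupancy joins to the 11:00 bucket (11:45 < 12:00 and 13:15 >= 11:00), the 12:00 bucket and the 13:00 bucket, so it generates three portion rows, while the 13:05 - 13:20 occupancy joins only to the 13:00 bucket.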

                           

                          Then we need to calculate the start time and duration of each portion. I've chosen to do that in Tableau calculated fields, rather than embed the calculations in the custom SQL - it's easier and more flexible. I usually prefer to put as little as possible in the custom SQL.

                           

                          I've restructured those calculations a bit and broken them down into steps which I believe should be fairly self-explanatory. Just look at the calculations for:

                           

                          [Hourly Portion Start Time]

                          [Hourly Portion End Time]

                          [Hourly Utilisation (Days)]

                          [Hourly Utilisation (Minutes)]

                           

                          The key thing to remember with time calculations like these is that datetimes are represented by fractional days - so you need to multiply by 24 to get a duration in hours and by 24 * 60 to get a duration in minutes.

                           

                          Also note that I'm using the form of the MAX() and MIN() functions which compare two values, rather than the more familiar forms which are aggregates over a set of rows. So [Hourly Portion Start Time] is defined as:

                           

                          MAX([Hour], [Beginning])

                           

                          which just means "take whichever is later: [Hour] or [Beginning]" (i.e. if the occupancy started before the start of this bucket, we start from the bucket start time, otherwise we start from the occupancy start time.)

                           

                           

                          I'll ponder your second question @Alex and also yours @Jonathan and post some more thoughts later (if I have any!).

                          • 11. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                            Richard Leeke

                            @Alex: Just want to make sure I understand your second question properly.

                             

                            Are you wanting to calculate the average hourly utilisation, by hour of the day, over a period of many days?

                             

                            Clearly you can calculate the total utilisation by hour of day very easily using this technique, but, as you say, it's calculating the denominator correctly when there may be missing days that can be tricky.

                             

                            I can think of three possible approaches to working out the number of days (rough calculated-field sketches after the list):

                             

                            1) Count the distinct days in the range. That won't work if there are any missing days anywhere in the range.

                            2) Find the difference between the first and last dates in the range (MAX([date]) - MIN([date])). That won't work if there are missing days at the start or end of the range.

                            3) Specify your date range via a start date and end date parameter and calculate the difference between those. That sounds as if it should work.
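                            Rough sketches of those three as calculated fields (assuming a [Date] field and standard Tableau date functions; the parameter names in option 3 are placeholders, and whether you need a +1 depends on whether the range should be inclusive):

                            1) COUNTD([Date])
                            2) DATEDIFF('day', MIN([Date]), MAX([Date])) + 1
                            3) DATEDIFF('day', [Start Date], [End Date]) + 1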

                             

                            If that doesn't do what you're after, post a sample workbook illustrating ... blaa, blaa, blaa..

                            • 12. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                              Richard Leeke

                              @Alex: Thinking about your follow-up question again, I realised that the average hourly utilisation is probably precisely what Matthias wanted in the first place, so thanks for pointing that out.

                               

                              I've added a sheet which does that using parameters, as I described before.

                               

                              Average Hourly Utilisation.png

                              • 13. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                                Richard Leeke

                                And finally(!) for your question, @Jonathan.

                                 

                                Which approach is best when? As usual, I think the answer is "it depends".

                                 

                                Before I even start to answer that, I'll just throw one more technique which I often use for this sort of thing into the mix. This one is a variant of the "UNION ALL" approach for calculating queue depth, but addresses a problem with that approach. You can think of problems like this in terms of a series of events representing the times when the state changes. These often represent the start and end times of some activity. In between those events, the state is constant. So to visualise the state accurately we need a stepped chart - with horizontal lines between the events and vertical lines at the times of the events.

                                 

                                But using the basic UNION ALL approach we just get the new value at the time of each state change. Joining those dots we get sloping lines, as in the area chart in the first version I posted.

                                 

                                The way around that is to add a couple more clauses to the custom SQL statement, adding dummy "events" immediately before each real event. In the attached version I've included an example of this technique - with rows with a delta of zero, 1 second before each real event (i.e. 1 second before the start and 1 second before the end of each raw data row).

                                 

                                The custom SQL for that looks like this:

                                 

                                SELECT
                                  [Tabelle1$].[Id] AS [Id],
                                  [Tabelle1$].[Beginning] AS [EventTime],
                                  [Tabelle1$].[Location] AS [Location],
                                  1 AS [Delta]
                                FROM [Tabelle1$]
                                UNION ALL
                                SELECT
                                  [Tabelle1$].[Id] AS [Id],
                                  [Tabelle1$].[End] AS [EventTime],
                                  [Tabelle1$].[Location] AS [Location],
                                  -1 AS [Delta]
                                FROM [Tabelle1$]
                                UNION ALL
                                SELECT
                                  [Tabelle1$].[Id] AS [Id],
                                  [Tabelle1$].[Beginning] - 1/86400 AS [EventTime],
                                  [Tabelle1$].[Location] AS [Location],
                                  0 AS [Delta]
                                FROM [Tabelle1$]
                                UNION ALL
                                SELECT
                                  [Tabelle1$].[Id] AS [Id],
                                  [Tabelle1$].[End] - 1/86400 AS [EventTime],
                                  [Tabelle1$].[Location] AS [Location],
                                  0 AS [Delta]
                                FROM [Tabelle1$]

                                 

                                The 1/86400 entries represent 1 second (86,400 seconds in a day).

                                 

                                I've added Dashboard3 to the workbook to compare some of the options:

                                 

                                Continuous Utilisation.png

                                 

                                The Gantt chart at the top is just for reference so you can see what is going on.

                                 

                                The chart "Hourly Utilisation (Area)" uses the time bucket technique I posted yesterday. It shows the same problem with sloping lines that I mentioned with the basic queue depth approach (I haven't bothered to show one of those). If you want hourly totals or averages, I think it's much better to use a bar chart, as per the average example I posted earlier.

                                 

                                "Utilisation per Minute" gives a much more correct, detailed view of the utilisation. That version is produced by using a far more fine-grained set of time buckets. If detail is what you want, this approach gives a good representation - but it may not scale well - depending on the accuracy you want and the time range. To get one minute accuracy you need a row per minute in the time bucket table and you end up multiplying the number of result rows by the average duration of your data rows in minutes. 100 fold increase if your average duration is 1 hour 40 minutes.

                                 

                                The final chart "Continuous Overall Utilisation" is the most accurate (I sometimes subtract 1/86,400,000 for millisecond accuracy for the analysis I do). That approach results in 4 rows per raw data row.

                                 

                                As you mentioned, the UNION ALL approach uses table calculations, so might get a bit slow with very large data volumes (though RUNNING_SUM() isn't too bad).

                                 

                                In summary, I think the bucket approach works well for aggregated reporting whereas the UNION ALL, queue depth approach probably works better for detailed reporting. With large data volumes it's probably worth doing the sums to see how many rows you will be generating / pulling back to Tableau with each approach.

                                 

                                Well that's been a bit of fun...

                                • 14. Re: Utilization rates over the course of a day - a variation of the queue length problem.
                                  Alex Kerin

                                  I've used the "second to create a step" before (thanks to you). I did notice that from Excel at least, the date had to be formatted as a number or it would fail. Someone (I can't find the forum post obviously) suggested using dateadd instead in the custom sql - it worked really nicely.
