2 Replies Latest reply on Apr 3, 2012 12:56 PM by Piers Chamberlain

    Start /Stop events split by inactivity

    Piers Chamberlain

      Hi all.


      I'm trying to get a view of user concurrency using IIS log files.

      I have a record of date time interactions and the usernames for each.

      I have a good Gantt chart showing each user's activity start and stop using MIN / MAX of the datetimes. However, for result sets spanning multiple hours or days this becomes unrepresentative, as users probably stepped away for decent intervals to sleep etc.


      What I'd like to do is set an inactivity period (as a parameter, perhaps?), and if there is a gap longer than this, start a new "session". I'm thinking that this will likely rely heavily on Window Calcs, but frankly they scare me. I could also process the data offline to do this, but I'd prefer not to, just for ease of plugging new data into the workbook, which I do quite a bit.


      For example if I have:

      User1 2012-02-26 13:00:00

      User1 2012-02-26 13:10:00

      User1 2012-02-27 11:00:00

      User1 2012-02-27 11:10:00

      User1 2012-02-27 11:15:00


      this would be represented as 2 distinct Gantt bars for User1, one for each day, the first 10 mins long and the second 15, if that makes sense.


      Any tricks here that have worked for people? Can I somehow create an identifier in a Calculated field and use this as a session marker if the previous timestamp for that user is over the defined threshold?



        • 1. Re: Start /Stop events split by inactivity
          Richard Leeke

          Hi Piers


          3 ways that I can think of:


          1) Table calcs, in much the way that you describe. Non-trivial (as is always the way with table calcs) and won't scale to really large data sets. You could probably do a few tens of thousands of rows.


          2) Process the logs before loading into a database. Downside of that approach is that you lock in the value of your timeout parameter - but there's no reason not to add incremental data to your database (even to a data extract) with this approach.


          3) Depending on your datasource, you may be able to do this quite efficiently with database window functions. I don't know much about them, though.
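
For illustration only, option 3 might look something like the sketch below. It assumes SQLite (3.25 or later, for window functions) driven from Python, purely because that is easy to run; the thread does not name a database, and the table name `hits`, its columns and the 30 minute timeout are all hypothetical. `LAG` fetches each user's previous timestamp, and a running `SUM` of "gap exceeded" flags numbers the sessions.

```python
import sqlite3

# Hypothetical schema: one row per request, timestamps stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (username TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("User1", "2012-02-26 13:00:00"),
        ("User1", "2012-02-26 13:10:00"),
        ("User1", "2012-02-27 11:00:00"),
        ("User1", "2012-02-27 11:10:00"),
        ("User1", "2012-02-27 11:15:00"),
    ],
)

SESSION_SQL = """
WITH gaps AS (
    -- Seconds since the same user's previous request (NULL for the first one).
    SELECT username, ts,
           CAST(strftime('%s', ts) AS INTEGER)
             - CAST(strftime('%s', LAG(ts) OVER (PARTITION BY username ORDER BY ts))
                    AS INTEGER) AS gap_seconds
    FROM hits
),
numbered AS (
    -- Running count of over-long gaps gives a per-user session number.
    SELECT username, ts,
           SUM(CASE WHEN gap_seconds > :timeout_seconds THEN 1 ELSE 0 END)
               OVER (PARTITION BY username ORDER BY ts) AS session_no
    FROM gaps
)
SELECT username, session_no, MIN(ts) AS started, MAX(ts) AS ended, COUNT(*) AS requests
FROM numbered
GROUP BY username, session_no
ORDER BY username, session_no
"""

# Assumed inactivity timeout of 30 minutes, passed as a query parameter.
db_sessions = conn.execute(SESSION_SQL, {"timeout_seconds": 30 * 60}).fetchall()
for row in db_sessions:
    print(row)
```

Because the timeout is a query parameter here, it is not locked in the way it is with pre-processed logs.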


          I generally use method 2), driven from a Perl script. Here's a view of 3,500 sessions from a total of 100,000 requests from a day's log done that way. Colour represents the number of requests in the session - you can just see a few red sessions where people were very busy (or there was a JavaScript bug!).
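
The pre-processing step in method 2) can be sketched roughly as follows. This is a minimal Python sketch rather than the Perl script mentioned above, which isn't shown in the thread; the function name `split_sessions`, the dictionary keys and the 30 minute timeout are assumptions made for illustration. It sorts the requests by user and time, and opens a new session whenever the gap since that user's previous request exceeds the timeout.

```python
from datetime import datetime, timedelta


def split_sessions(events, timeout):
    """Group (user, timestamp) pairs into sessions separated by inactivity.

    A new session starts when the gap since the same user's previous
    request is longer than `timeout`.
    """
    sessions = []
    latest = {}  # user -> that user's most recent session
    for user, ts in sorted(events):
        current = latest.get(user)
        if current is None or ts - current["end"] > timeout:
            current = {"user": user, "start": ts, "end": ts, "requests": 0}
            sessions.append(current)
            latest[user] = current
        current["end"] = ts
        current["requests"] += 1
    return sessions


# The sample rows from the original question.
raw = [
    ("User1", "2012-02-26 13:00:00"),
    ("User1", "2012-02-26 13:10:00"),
    ("User1", "2012-02-27 11:00:00"),
    ("User1", "2012-02-27 11:10:00"),
    ("User1", "2012-02-27 11:15:00"),
]
events = [(u, datetime.strptime(t, "%Y-%m-%d %H:%M:%S")) for u, t in raw]

# Assumed inactivity timeout; this is the value that gets locked in.
sessions = split_sessions(events, timeout=timedelta(minutes=30))
for s in sessions:
    print(s["user"], s["start"], s["end"], s["end"] - s["start"], s["requests"])
```

Each session's start and end can then be written out as one row per session and used directly for the Gantt bars, with the request count available for colour.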




          Hope that helps.



          • 2. Re: Start /Stop events split by inactivity
            Piers Chamberlain

            Thanks Richard -


            I'll avoid any attempt at option 1 because the logs I'm chewing on are typically in the 100K - 10M lines range.

            I'm using flat file data sources, so that rules out 3.

            Option 2 is the go. I had hoped to avoid any intermediate parsing steps, but you're right, of course, it won't prevent reloading - and I also realise that I have previously written most of the logic for other projects (yay!).


            Much obliged