-
1. Re: Start /Stop events split by inactivity
Richard Leeke Apr 3, 2012 1:36 AM (in response to Piers Chamberlain)
Hi Piers
3 ways that I can think of:
1) Table calcs, in much the way that you describe. Non-trivial (as is always the way with table calcs) and won't scale to really large data sets. You could probably do a few tens of thousands of rows.
2) Process the logs before loading into a database. Downside of that approach is that you lock in the value of your timeout parameter - but there's no reason not to add incremental data to your database (even to a data extract) with this approach.
3) Depending on your datasource, you may be able to do this quite efficiently with database window functions. I don't know much about them, though.
I generally use method 2) driven from a perl script. Here's a view of 3,500 sessions from a total of 100,000 requests from a day's log done that way. Colour represents number of requests in the session - you can just see a few red sessions where people were very busy (or there was a javascript bug!).
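For anyone wanting to try option 2, the pre-processing step might look roughly like the sketch below. It is in Python rather than Perl and is not the script mentioned above, just a minimal illustration. It assumes each log line has already been parsed into a (user, timestamp) pair; the function name `sessionize`, the field names and the 30-minute default timeout are all made up for the example.

```python
from datetime import datetime, timedelta


def sessionize(events, timeout=timedelta(minutes=30)):
    """Group (user, timestamp) events into sessions split by inactivity.

    A new session starts for a user whenever the gap since that user's
    previous event is longer than `timeout`. Returns a list of dicts with
    the session's user, start time, stop time and request count.
    """
    sessions = []
    latest = {}  # user -> that user's most recent (still open) session

    # Sort by user, then time, so each user's events are seen in order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        current = latest.get(user)
        if current is not None and ts - current["stop"] <= timeout:
            # Still within the timeout: extend the open session.
            current["stop"] = ts
            current["requests"] += 1
        else:
            # First event for this user, or the gap was too long.
            current = {"user": user, "start": ts, "stop": ts, "requests": 1}
            latest[user] = current
            sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real log.
    sample = [
        ("alice", datetime(2012, 4, 3, 9, 0)),
        ("bob", datetime(2012, 4, 3, 9, 5)),
        ("alice", datetime(2012, 4, 3, 9, 10)),
        ("alice", datetime(2012, 4, 3, 10, 0)),
    ]
    for session in sessionize(sample):
        print(session)
```

The resulting start/stop rows can then be written out to a flat file or database table for loading. As noted above, the timeout is baked in at this stage, so changing it means re-running the script over the logs.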
Hope that helps.
Richard
-
2. Re: Start /Stop events split by inactivity
Piers Chamberlain Apr 3, 2012 12:56 PM (in response to Richard Leeke)
Thanks Richard -
I'll avoid any attempt at option 1 because the logs I'm chewing on are typically in the 100K - 10M lines range.
I'm using flat file data sources, so that rules out 3.
Option 2 is the go. I had hoped to avoid any intermediate parsing steps, but you're right, of course, it won't prevent reloading - and I also realise that I have previously written most of the logic for other projects (yay!).
Much obliged
Piers