Instead of using parameters (which leaves it open to user error), how about using window_max/min to get the full range -
window_max(max(datetrunc('day',[End]))) - window_min(min(datetrunc('day',[Beginning]))) + 1
simply set the compute to table (across)
I'm doing some testing - 5,000 shifts over 6 months, broken into 15-minute increments, so the max rows in the padding table should be around 17,280 (6 months * 30 days * 96 possible increments per day). Tableau cannot create the extract - 10 minutes of waiting and nothing, unless I screwed up the SQL:
SELECT ['Shift Data for Import$'].[Employee] AS [Employee],
['Shift Data for Import$'].[Labor Level A] AS [Labor Level A],
['Shift Data for Import$'].[Labor Level B] AS [Labor Level B],
['Shift Data for Import$'].[Labor Level C] AS [Labor Level C],
['Shift Data for Import$'].[Shift End] AS [Shift End],
['Shift Data for Import$'].[Shift Start] AS [Shift Start],
['Date Padding 15 mins$'].[Padding Dates] AS [Padding Dates]
FROM ['Shift Data for Import$'],['Date Padding 15 mins$']
WHERE [Shift Start] < [Padding Dates] + 1/(24*4)
AND [Shift End] >= [Padding Dates]
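The join logic in that custom SQL can be sketched outside of JET to check it does what's intended. This is a rough Python mock-up (toy data, not the real sheets) of the cross join plus the WHERE clause, where `1/(24*4)` of a day is one 15-minute bucket:

```python
from datetime import datetime, timedelta

# Toy stand-ins for the two Excel sheets: one shift, and the full list
# of 15-minute "padding" timestamps for one day.
shifts = [
    {"Employee": "A",
     "Shift Start": datetime(2014, 1, 1, 8, 0),
     "Shift End": datetime(2014, 1, 1, 10, 0)},
]
bucket = timedelta(minutes=15)
padding_dates = [datetime(2014, 1, 1) + i * bucket for i in range(96)]

# Same logic as the WHERE clause: keep a (shift, bucket) pair when the
# bucket overlaps the shift.
rows = [(s["Employee"], p)
        for s in shifts
        for p in padding_dates
        if s["Shift Start"] < p + bucket and s["Shift End"] >= p]

print(len(rows))  # a 2-hour shift matches 9 boundary-inclusive buckets
```

Note the `>=` on the end time means the bucket that starts exactly at the shift end is kept too, which matters when counting rows later.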
I wonder if this method isn't scalable?
The reason I didn't use dateadd was just that the date functions vary so much between different SQL dialects. But I hadn't struck the issue with Excel dates, so if dateadd makes that more robust, as well as clearer about what is going on, that is a good idea.
The window_max approach works as long as there is actually any data at all on the first and last day of the range (for at least one location in this case). But it slows down badly if you have large numbers of rows unless you use some convoluted trickery (for now at least). But you are quite right that the parameter approach is fragile - too easy to get the bounds wrong.
What is your data source?
Evaluating the query results is certainly going to make the database do some work - and I can imagine that it might be more than JET's little brain can manage if you have your data in Excel.
Assuming your typical shifts are 8 hours, that would be 32 * 5,000 = 160,000 rows of extract (which is no problem at all) - but with 5,000 shift records and a total of 17,280 possible buckets to choose from, the database has to pick those 160,000 rows out of 5,000 * 17,280 = 86,400,000 possible combinations.
I can see that taking a while unless you have your data in a proper database, with appropriate indexes on the dates.
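The arithmetic behind those numbers is worth spelling out, since it explains why JET struggles. A quick sanity check of the figures above:

```python
# Row counts from the discussion above, checked explicitly.
shifts = 5_000
buckets_per_day = 24 * 4            # 96 fifteen-minute buckets per day
days = 6 * 30                       # roughly 6 months

# Every shift has to be tested against every padding bucket:
candidate_pairs = shifts * buckets_per_day * days

# An 8-hour shift spans 32 fifteen-minute buckets:
kept = shifts * 32

print(candidate_pairs, kept)        # 86,400,000 tested, 160,000 kept
```

So under 0.2% of the candidate pairs survive the filter, but without an index on the dates the engine still has to evaluate the condition for all of them.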
If you use PostgreSQL you wouldn't even need to generate the 15 minute buckets as a real table - Postgres has built-in functions for generating sequences which can be really handy for multiplying out your data like this.
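For readers not familiar with it, the Postgres built-in being referred to is `generate_series`, which can produce a timestamp sequence directly in a query (e.g. `generate_series(start, stop, interval '15 minutes')`), so no physical bucket table is needed. A Python sketch of what it hands back:

```python
from datetime import datetime, timedelta

# Python stand-in for Postgres's generate_series(start, stop, step):
# yields every timestamp from start to stop inclusive, step apart.
def generate_series(start, stop, step):
    t = start
    while t <= stop:
        yield t
        t += step

buckets = list(generate_series(datetime(2014, 1, 1),
                               datetime(2014, 1, 1, 23, 45),
                               timedelta(minutes=15)))
print(len(buckets))  # 96 fifteen-minute buckets in one day
```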
Yep, just went out for a walk in the woods and realized my rows would be much larger, and yes it is Excel. I create templates for others to use, with data changing every time, coming from all sorts of sources, and needing manipulation before being in the right format, so Excel is by far the best choice. I wonder if it would be quicker to go up against a date file and then a time file?
> I wonder if it would be quicker to go up against a date file and then a time file?
I'm not sure what you mean.
I thought about the speed of creating that extract from Excel again, and was a bit surprised it didn't work for you, so I mocked up some data in Excel, following the structure of your SQL (5,000 shifts of between 2 and 10 hours, spread over 6 months, with 15-minute buckets for reporting). Using exactly your SQL, that took 8 minutes to create a 125,000-row, 1.5 MB extract, which works just fine.
I can post my dummy data if you're interested.
Okay, I let it run again - it took a solid 20 minutes and at times looked like it had crashed. I did have a few more dimensions, and it is running on a 2.5 year old laptop...
Per the day/hour thing: my concept (and I don't know if this makes it better) is to have a sheet/table that contains the days of the padding needed, and then another with the times needed. You would first constrain the WHERE against the day (for me no shift would be over 24 hours, so I could do something like WHERE abs([Beginning] - [Padded Date]) < 2) and then check against the time.
In my mind that drastically brings down the 86,000,000 possible iterations, but then I've already told you about my knowledge of SQL.....
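To put a rough number on that intuition: if the engine can prune on the coarse day condition before expanding into time buckets (a big "if" for JET, which may not stage it that way), the work drops by more than an order of magnitude. A back-of-the-envelope sketch, with the ~3-days-per-shift figure being an assumption that follows from the `abs(...) < 2` test:

```python
# One-stage: every shift tested against every 15-minute bucket.
shifts = 5_000
days = 180
times_per_day = 96
one_stage = shifts * days * times_per_day

# Two-stage: abs([Beginning] - [Padded Date]) < 2 keeps at most ~3 day
# rows per shift; only those survivors get expanded into time buckets.
two_stage = shifts * days + shifts * 3 * times_per_day

print(one_stage, two_stage)  # 86,400,000 vs 2,340,000 comparisons
```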
I get you on the time thing now. Yes, using the knowledge of your maximum shift length is a good way to optimise it - that should make a huge difference.
In fact it doesn't even need to be limited to the 24 hour constraint. If you know the maximum duration in your base table, you just need to set up the reporting bucket table as a set of offsets from the start of a record, with enough buckets to span the longest record. It does make the date arithmetic a bit more clumsy, but it should work.
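The offset idea described above can be sketched in Python (this is an illustration of the technique, not the actual Tableau SQL; the function name and 96-row default are made up for the example). Each row of the bucket table is just an offset k, and the bucket timestamp is "the 15-minute bucket containing the start, plus k buckets":

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=15)

def shift_buckets(start, end, max_buckets=96):
    """Expand one shift into bucket timestamps via an offset table:
    offset k maps to (bucket containing start) + k * 15 minutes."""
    # Floor the start down to its 15-minute bucket (seconds assumed zero).
    first = datetime(start.year, start.month, start.day,
                     start.hour, (start.minute // 15) * 15)
    out = []
    for k in range(max_buckets):     # the 96-row "Time Bucket" table
        t = first + k * BUCKET
        if t > end:                  # stop once past the shift end
            break
        out.append(t)
    return out

# A shift that crosses midnight needs no special handling:
b = shift_buckets(datetime(2014, 1, 1, 22, 10), datetime(2014, 1, 2, 2, 0))
print(len(b))
```

With 96 offsets and 15-minute buckets this covers shifts up to 24 hours; a longer maximum duration just needs more rows in the offset table.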
I just had a quick go at that with the dummy data I mocked up to try your shift problem with. The good news is that it built the extract in about 20 or 30 seconds. The not quite so good news is that it didn't get quite the right number of rows, and the calculated fields all came out as null. But correctness is just an over-rated detail; the principle looks fine.
I'll have another go at some point and post an example if I get it going.
Awesome. If we could get the extract time down, then there's no reason we couldn't go all in on this - I'm talking quadrupling the rows to use polygons to depict the steps. This removes the issue of either sizing bars (which can obscure data if they are too big) or using lines and having to accept the misleading diagonals...
@Alex - I was the guy who'd come up with the dateadd-based solution, I think. Here's a link to the post. http://community.tableau.com/message/174518#174518 (I save interesting links in a Word doc, I promise to someday make it public).
@Richard - In the aforementioned workbook I'd created a technique where the data was padded out with an otherwise non-dimensional row for every second (for second-level analysis), and then a row for start event & end event. That might be more efficient in terms of volume in some cases, where 2N+(86400*# of days) might be smaller than 4N. The one issue with this solution is needing to be careful not to filter out the non-dimensional padding rows.
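The trade-off in that comment is a simple comparison of the two row-count formulas, which is easy to check for any given data size. A quick sketch (the event and day counts are made-up example figures):

```python
# Per-second padding: 2 rows per event plus one row per second per day.
def padding_rows(n_events, days):
    return 2 * n_events + 86_400 * days

# Polygon approach: 4 rows per event.
def polygon_rows(n_events):
    return 4 * n_events

# Example: with enough events per day, the padding approach wins.
n, days = 5_000_000, 30
print(padding_rows(n, days) < polygon_rows(n))  # True for this case
```

The break-even point is 43,200 events per day: above that, 2N + 86,400 * days is smaller than 4N.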
Well I couldn't resist the challenge - I spent way too long in my lunch hour figuring out how to do this.
The attached workbook includes the spreadsheet with my mocked up data, plus two different variants of connections which generate the 15 minute buckets - along with extracts for each of those connections.
The original one takes about 10 minutes to generate on my machine, the second approach takes about 10 seconds. They both have exactly the same numbers of rows and equivalent data - but the faster one is also quite a bit smaller for some reason.
The second approach has resulted in much uglier custom SQL and Tableau calculations - converting dates to and from floating-point numbers in order to truncate and round at the right level. I kind-of think it might be possible to make it a bit cleaner using date arithmetic - as per Jonathan's comment about dateadd - but I couldn't see a way to do 15 minute buckets like that.
Alex.twbx.zip 1.7 MB
Okay, I'm trying to get my head around this as we speak. From a quick glance, this will still work if a shift crosses midnight (but presumably not if the shift lasts longer than 24 hours, which in my use case is fine). Per the ugly calculations, I always end up doing the int/float round method - I don't think there is another way.
Yes, it handles the crossing midnight case just fine.
The way I initially started to look at it after you made the suggestion about narrowing in on the times would only have worked for shifts of under 24 hours - but the way I ended up doing it is only limited by the number of rows in the "Time Bucket" table. That works as an offset from the time bucket that the start time falls into.
I have set the example up with 96 rows, and the queries and calculations assume that a bucket is 15 minutes, so like that it will work for up to 24 hours. But it will work for whatever time period you want - the limit is driven by bucket size and number of buckets.
I've been tinkering with this for a week now. I think the bucket method is the best (at least for my application). I still end up with some issues - for example, I had shift times that were rounding differently: some to x.666666 and some to x.666667 (I'm not sure the number of digits is right). That's a sub-second difference, so I would end up with two points at, say, 8:30:00 AM, which would cause a spike down in the charts. I had to round all times to the nearest second.
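That rounding fix is the same serial-date arithmetic as the bucket truncation, just at second granularity. A small sketch of the idea (function name invented, noise values illustrative):

```python
# Sub-second float noise (...666666 vs ...666667) splits what should be
# one point; rounding the serial-date float to the nearest second fixes it.
SECONDS_PER_DAY = 24 * 60 * 60

def round_to_second(serial_days):
    return round(serial_days * SECONDS_PER_DAY) / SECONDS_PER_DAY

a = 0.35416666666666663   # 8:30:00 AM with float noise one way
b = 0.354166666666667     # the same instant, noisy the other way
print(round_to_second(a) == round_to_second(b))  # True
```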
I believe this is the most comprehensive post on queue/utilization and will think about trying to write this up on my blog.