The only solution I've found is to pre-process the data to create a summary for each day's activity, and then just use Tableau to plot that. See Task_Dates4.txt to see what the processed data looks lik.
Is there any way to skip the pre-processing step, and just use Tableau on the original data to calculate the queue lengths?
I would like to discuss this with you before I offer my thoughts here. Would you be willing to talk offline @ 513-333-0710?
You can do it with a custom SQL connection, with a UNION query that combines counts of the job starts by day and the job ends by day, and then a running total table calculation. That saves pre-processing the data. I think that will work OK with finer-grained time data (continuous datetimes, or whatever). So you should be able to see queue depth at any point in time.
I've attached a packaged workbook with your sample data - though I had a bit of trouble with your packaged workbook because I'm in New Zealand and so my locale meant it tried to treat dates as DD/MM/YYYY. Curiously it did that for the Job_Started sheet (as the day was never greater than 12) but for the Jobs_Completed sheet it seems to have worked out that there isn't a 13th month so picked up the dates as intended. So I transposed the day and month in your data and created my own - but just in case it mangles it on the way back I've pasted in the SQL and attached a PDF of the results.
Here's the SQL:
SELECT t1.[Start_Date] AS [Event_Date], 'Start' AS [Event_Type], COUNT(*) AS [Delta]
FROM [Task_Dates_NZ#csv] t1
GROUP BY t1.[Start_Date]
SELECT t2.[Completion_Date] AS [Event_Date], 'End' AS [Event_Type], -COUNT(*) AS [Delta]
FROM [Task_Dates_NZ#csv] t2
GROUP BY t2.[Completion_Date]
That issue with mangling dates in packaged workbooks when opening with a different locale sounds like a Tableau bug to me, by the way.
Your custom SQL solves the problem.
It would be nice to be able to do this sort of thing with table calculations, so that viewing queue length grouped by different units of time didn't require editing the custom SQL. I've never seen a good example of a secondary table calculation, but this seems tantalizingly close to the kind of thing they could be used for.
BTW, the dates were mangled again when I opened your workbook in my US locale. You may be right about there being a bug.
Actually, you don't need to edit the custom SQL to change rollup level. With the version I sent through yesterday you can rollup by any time period greater than a day. If you want to go to finer-grained time based rollups (if your raw data has datetimes) you can either change the SQL to aggregate at the lowest level of summarisation you want, or just change it to return all the detail - i.e. two event rows per raw data row. This is the SQL you need to get the detailed data:
SELECT t1.[Start_Date] AS [Event_Date], 'Start' AS [Event_Type], 1 AS [Delta]
FROM [Task_Dates_NZ#csv] t1
SELECT t2.[Completion_Date] AS [Event_Date], 'End' AS [Event_Type], -1 AS [Delta]
FROM [Task_Dates_NZ#csv] t2
I know that this approach works with datetimes down to millisecond granularity becasue I tried it on some of my data immediately after responding tour post yesterday. I do a lot of analysis where queue depth is a useful measure and your question inspired me to think about a smarter way of calculating it - so thanks for that.
BTW - my original SQL which was aggregating with a COUNT(*) in the SQL had a "UNION ALL" where just a "UNION" would have been fine. This isn't generally a problem but in principle could have a performance penalty on large datasets (though in fact UNION ALL may be faster anyway as it eliminates the need for a SORT). But you definitely *do* need "UNION ALL" for the detail version of the query above - just a "UNION" would eliminate duplicates that you need to see for the table calculation to be correct.
I also noticed that I had left the table calculation enabled in the "Activity" sheet in my sample workbook. I had actually intended that to show enqueue and dequeue activity per day, not a cumulative value.
On the locale issue, I've noticed one or two places where Tableau isn't quite as friendly to international customers as the home market - but that's OK, we need to have something to gripe about. At least if you put a currency measure on the colours shelf it doesn't necessarily all come out green. ;-)
But I do think a packaged workbook should be entirely self-contained and work on any PC.
Very elegant solution! Thanks.
At this point, the [Event_Type] field is superfluous, so the SQL can get even a little bit simpler by removing that.
Dredging up an old post here....
I am trying to do something similar to Alex B, except instead of queue length I would like to track a metric (we'll call it "score") over time. My data is in a format similar to Alex's, i.e. one record represents the same job as it progresses through processing, with each stage/score timestamped as it moves through. Specifically, see attached - and pasted below:
ID Create_Date Create_Score Repair_Date Repair_Score Deliver_Date Deliver_Score
1 1/1 10 1/15 15 2/1 25
2 1/10 20 1/20 15 2/1 20
3 2/1 5 2/10 25 2/15 22
4 3/1 15 3/31 25
To solve this issue I connected to my Excel file, then created a Custom SQL query which converts the data into what I'll call a more columnar format (meaning a new record is created each time an ID moves to a new stage, so an ID can appear multiple times since there are multiple stages - as suggested by Richard in this post), then calculated sum(Delta) on Table Across in order to track the score over time - see attached workbook. I want to be able to see the individual scores (should I so choose) over time as well.
My question is: isn't there an easier way to do this? It seems like there should be but I haven't come across a better solution. I am most interested in getting this solved without all the data gymnastics and having to use a Custom SQL query.
Any advice/assistance would be most appreciated.
test score over time.twbx 38.6 KB
Stumbled across this old post while I was doing something similar and think I have come up with an easy solution that may work for most use cases. This solution wouldn't have been possible before the addition of the Pivot functionality in Tableau.
1) Connect to your Data Source in Tableau
Assumption here is that the data is structured as below
Queue Person entry exit 1 1 10:00 12:00 1 2 12:00 16:00 1 3 8:00 12:00 1 4 9:00 17:00
2) Use Pivot functionality to reshape the data. In the Data Source view right click on the entry and exit measure fields and select pivot.
This will result the data being transformer to be as follows.
Queue Person Pivot Field Value Pivot Field Names 1 1 10:00 entry 1 2 12:00 entry 1 3 8:00 entry 1 4 9:00 entry 1 1 12:00 exit 1 2 16:00 exit 1 3 12:00 exit 1 4 17:00 exit
3) Open a worksheet - Create a calculated field.
[Pivot Field Names] = 'Entry'
[Pivot Field Names] = 'Exit'
4) Drag Pivot Field Values to Columns - In my case it is timestamp so I formatted it to show hour
5) Drag your calculated field to Rows - right click - default table calculation - running total
You now have a graph showing the number of people in a queue by hour.
I found this a neat way to avoid Custom SQL and users were a lot less scared of this approach
I use this method as well. I transform my data into a running event log with "in" and "out" events this allows the queue tally to be accurate across the dataset. It can be messy if you, for example, want to fix your timeframe in the view. If you do this then make sure to use a LOD expression to control your start/stop times as they may be filtered out by your time reference.
The double nice thing about the data being in this structure is that you can "show missing times" to fill in blank data as well as add "passthrough" events for a second marks graph to make a timeline of things that may happen in the queue but do not change the queue state. For example you could have an additional event of "Ate popcorn" with another timestamp and simply set its in/out value to zero. This allows tableau to set a mark at that time but you do not change the queue length duration.
You can go crazy with the data in that format: an example below is a patient care timeline with marks for specific events, colors for states and bars representing hospital location over time (sorry I cant show the whole thing due to information sensitivity) with ability to show or hide events as needed but maintain the overall flow (just make sure they are still on the details pane).