The only solution I've found is to pre-process the data to create a summary for each day's activity, and then just use Tableau to plot that. See Task_Dates4.txt to see what the processed data looks like.
Is there any way to skip the pre-processing step, and just use Tableau on the original data to calculate the queue lengths?
I would like to discuss this with you before I offer my thoughts here. Would you be willing to talk offline @ 513-333-0710?
You can do it with a custom SQL connection, with a UNION query that combines counts of the job starts by day and the job ends by day, and then a running total table calculation. That saves pre-processing the data. I think that will work OK with finer-grained time data (continuous datetimes, or whatever). So you should be able to see queue depth at any point in time.
I've attached a packaged workbook with your sample data - though I had a bit of trouble with your packaged workbook: because I'm in New Zealand, my locale meant it tried to treat dates as DD/MM/YYYY. Curiously, it did that for the Job_Started sheet (where the day was never greater than 12), but for the Jobs_Completed sheet it seems to have worked out that there isn't a 13th month and picked up the dates as intended. So I transposed the day and month in your data and created my own file - but just in case it gets mangled on the way back, I've pasted in the SQL and attached a PDF of the results.
Here's the SQL:
SELECT t1.[Start_Date] AS [Event_Date], 'Start' AS [Event_Type], COUNT(*) AS [Delta]
FROM [Task_Dates_NZ#csv] t1
GROUP BY t1.[Start_Date]
UNION ALL
SELECT t2.[Completion_Date] AS [Event_Date], 'End' AS [Event_Type], -COUNT(*) AS [Delta]
FROM [Task_Dates_NZ#csv] t2
GROUP BY t2.[Completion_Date]
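To make the running-total step concrete, here is a minimal Python stand-in for what the union plus the RUNNING_SUM(SUM([Delta])) table calculation would compute in Tableau, on hypothetical sample rows (not the actual Task_Dates data):

```python
from collections import Counter

# Hypothetical task rows: (start_date, completion_date) stand-ins for
# the Start_Date / Completion_Date columns in Task_Dates.
tasks = [("01-01", "01-03"), ("01-01", "01-02"), ("01-02", "01-04")]

# Equivalent of the UNION ALL: per-day start counts (+) and end counts (-).
deltas = Counter()
for start, end in tasks:
    deltas[start] += 1   # 'Start' event: job enters the queue
    deltas[end] -= 1     # 'End' event: job leaves the queue

# Running total over Event_Date = queue depth, as the table calc would compute.
depth, queue_by_day = 0, {}
for day in sorted(deltas):
    depth += deltas[day]
    queue_by_day[day] = depth

print(queue_by_day)  # {'01-01': 2, '01-02': 2, '01-03': 1, '01-04': 0}
```

The key point is that the union only needs the per-date deltas; the cumulative depth falls out of the running sum, so Tableau never needs a pre-computed queue-length column.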
That issue with mangling dates in packaged workbooks when opening with a different locale sounds like a Tableau bug to me, by the way.
Your custom SQL solves the problem.
It would be nice to be able to do this sort of thing with table calculations, so that viewing queue length grouped by different units of time didn't require editing the custom SQL. I've never seen a good example of a secondary table calculation, but this seems tantalizingly close to the kind of thing they could be used for.
BTW, the dates were mangled again when I opened your workbook in my US locale. You may be right about there being a bug.
Actually, you don't need to edit the custom SQL to change the rollup level. With the version I sent through yesterday you can roll up by any time period greater than a day. If you want to go to finer-grained time-based rollups (if your raw data has datetimes), you can either change the SQL to aggregate at the lowest level of summarisation you want, or just change it to return all the detail - i.e. two event rows per raw data row. This is the SQL you need to get the detailed data:
SELECT t1.[Start_Date] AS [Event_Date], 'Start' AS [Event_Type], 1 AS [Delta]
FROM [Task_Dates_NZ#csv] t1
UNION ALL
SELECT t2.[Completion_Date] AS [Event_Date], 'End' AS [Event_Type], -1 AS [Delta]
FROM [Task_Dates_NZ#csv] t2
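The detail form works at any granularity because each raw row just contributes a +1 and a -1 event, and the running sum does the rest. A small Python sketch of that logic on hypothetical datetime rows (not the original data):

```python
from datetime import datetime
from itertools import accumulate

# Hypothetical raw rows with datetimes; millisecond precision works the same way.
rows = [
    (datetime(2012, 1, 1, 9, 0, 0, 1000), datetime(2012, 1, 1, 9, 0, 2)),
    (datetime(2012, 1, 1, 9, 0, 1),       datetime(2012, 1, 1, 9, 0, 3)),
]

# Detail form of the UNION ALL: one (+1) and one (-1) event per raw row.
events = [(s, 1) for s, e in rows] + [(e, -1) for s, e in rows]
events.sort()  # order by Event_Date before the running total

# Running sum of Delta = queue depth at each event instant.
depths = list(accumulate(d for _, d in events))
print(depths)  # [1, 2, 1, 0]
```

Because no aggregation happens in the SQL, Tableau is free to roll the events up to any date part (or none at all) without touching the query.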
I know that this approach works with datetimes down to millisecond granularity because I tried it on some of my own data immediately after responding to your post yesterday. I do a lot of analysis where queue depth is a useful measure, and your question inspired me to think about a smarter way of calculating it - so thanks for that.
BTW - my original SQL, which was aggregating with a COUNT(*), had a "UNION ALL" where just a "UNION" would have been fine, since the aggregated rows are already distinct. That's not really a problem - if anything, UNION ALL may be faster on large datasets because it avoids the SORT that UNION needs to eliminate duplicates. But you definitely *do* need "UNION ALL" for the detail version of the query above - a plain "UNION" would eliminate duplicate rows that the table calculation needs in order to be correct.
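The UNION vs UNION ALL difference is easy to demonstrate. This sketch runs the detail query both ways through sqlite3 on a tiny illustrative table (not the original Task_Dates schema): two jobs starting the same day produce identical ('2012-01-01', 1) detail rows, which UNION silently collapses.

```python
import sqlite3

# Illustrative table: two jobs start on the same day.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tasks (start_date TEXT, end_date TEXT)")
con.executemany("INSERT INTO tasks VALUES (?, ?)",
                [("2012-01-01", "2012-01-05"), ("2012-01-01", "2012-01-06")])

detail = """SELECT start_date AS event_date, 1 AS delta FROM tasks
            UNION {all}
            SELECT end_date AS event_date, -1 AS delta FROM tasks"""

union_all_rows = con.execute(detail.format(all="ALL")).fetchall()
union_rows = con.execute(detail.format(all="")).fetchall()

# UNION ALL keeps all four events; UNION deduplicates the two identical
# start rows, so the running total would undercount the queue by one.
print(len(union_all_rows), len(union_rows))  # 4 3
```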
I also noticed that I had left the table calculation enabled in the "Activity" sheet in my sample workbook. I had actually intended that to show enqueue and dequeue activity per day, not a cumulative value.
On the locale issue, I've noticed one or two places where Tableau isn't quite as friendly to international customers as the home market - but that's OK, we need to have something to gripe about. At least if you put a currency measure on the colours shelf it doesn't necessarily all come out green. ;-)
But I do think a packaged workbook should be entirely self-contained and work on any PC.
Very elegant solution! Thanks.
At this point, the [Event_Type] field is superfluous, so the SQL can get a little simpler by removing it.
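With that field dropped, the detail query presumably collapses to something like this (an untested sketch, reusing the table name from the posts above):

```sql
SELECT t1.[Start_Date] AS [Event_Date], 1 AS [Delta]
FROM [Task_Dates_NZ#csv] t1
UNION ALL
SELECT t2.[Completion_Date] AS [Event_Date], -1 AS [Delta]
FROM [Task_Dates_NZ#csv] t2
```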
Dredging up an old post here....
I am trying to do something similar to Alex B, except instead of queue length I would like to track a metric (we'll call it "score") over time. My data is in a format similar to Alex's, i.e. one record represents the same job as it progresses through processing, with each stage/score timestamped as it moves through. Specifically, see attached - and pasted below:
ID  Create_Date  Create_Score  Repair_Date  Repair_Score  Deliver_Date  Deliver_Score
1   1/1          10            1/15         15            2/1           25
2   1/10         20            1/20         15            2/1           20
3   2/1          5             2/10         25            2/15          22
4   3/1          15            3/31         25
To solve this I connected to my Excel file, then created a Custom SQL query that converts the data into what I'll call a more columnar format - meaning a new record is created each time an ID moves to a new stage, so an ID can appear multiple times (as suggested by Richard in this post). I then calculated sum(Delta) with Table Across in order to track the score over time - see attached workbook. I'd also like to be able to see the individual scores over time, should I so choose.
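The reshape step described above can be sketched as follows, using the sample rows from the post (the Delta-per-stage convention is an assumption about what the attached workbook's Custom SQL does):

```python
# Wide format: one row per ID, one date/score pair per stage.
wide = [
    # (ID, Create_Date, Create_Score, Repair_Date, Repair_Score, Deliver_Date, Deliver_Score)
    (1, "1/1",  10, "1/15", 15, "2/1",  25),
    (2, "1/10", 20, "1/20", 15, "2/1",  20),
    (3, "2/1",  5,  "2/10", 25, "2/15", 22),
    (4, "3/1",  15, "3/31", 25, None,   None),  # ID 4 not yet delivered
]

# Long format: one event row per (ID, stage), with Delta = change in score
# from the previous stage, so a running sum of Delta along the dates
# reproduces the score at each point in time.
long_rows = []
for rec in wide:
    task_id, prev_score = rec[0], 0
    stages = [("Create", rec[1], rec[2]), ("Repair", rec[3], rec[4]),
              ("Deliver", rec[5], rec[6])]
    for stage, date, score in stages:
        if date is None:
            continue  # stage not reached yet
        long_rows.append((task_id, stage, date, score - prev_score))
        prev_score = score

print(long_rows[:3])
```

The running sum of Delta for ID 1 goes 10, 15, 25 across its three stages, matching the original scores, which is exactly what sum(Delta) computed Table Across recovers.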
My question is: isn't there an easier way to do this? It seems like there should be but I haven't come across a better solution. I am most interested in getting this solved without all the data gymnastics and having to use a Custom SQL query.
Any advice/assistance would be most appreciated.
test score over time.twbx 38.6 KB