There's a really easy way to get the visual effect you are after. Simply drop [Number of Records] (which is just a calculated field with a value of 1) onto the size shelf. That will turn each of your marks into a bar with a length of 1 unit on the x-axis (i.e. 1 day). Those 1 day long bars will join together to form apparently continuous bars with gaps on days with no data, which visually conveys what you want.
If you actually need to calculate the length of continuous periods with no gaps, you may need to do some tricky custom SQL, a bit like some of the approaches discussed in the Employment Gaps thread - or at least wait till Joe Mako sees the question and comes up with a more creative answer. ;-)
It would be interesting to understand the story you are trying to highlight with this data. Attached is a workbook I made with a modification of Richard's idea, thanks for the call out BTW :)
I added .5 to the pull date, this makes it noon instead of midnight, and made a calculated field of -1, and placed that on the size shelf. This way each mark is centered on the tick mark for the day on the axis. If you left it as midnight, you would need to know what direction to want to fill in 24 hours (a value of 1 on the size shelf when dealing with Gantt bars).
I also made a few calculated fields to calculate if there was a break in the pull dates, and how many days were break days vs non break days.
I like the structure of your data, and if you wanted to roll them up (turning a series of consecutive date rows into one row with a start and end date), my first aproach would be to put the data into a database that supports the windowing function Lag (PostgreSQL, SQL Server, etc.). Let me know if this is something you want, and I can help craft the SQL.
Take a look at it and let me know if you have any questions.
Gantt test_edit.twbx 136.9 KB
Richard and Joemako, than you both for your help! I think the view I like the most is the one where the bars start on the June 4th tick mark (see "gantt bar" tab in attached) . I believe the custom calcs will become very handy. Really cool of you to have offered those.
There is one calculation that I am after and can't quite come up with. As you can tell I am able to display the total time elapsed for each bar but I would like to display the time elapsed for each serie of continous days. So for example, if a series has three continuous days a blank and 6 continouous days, I would want to display "3" and "6" to the left of their respective bars.
You'll notice that the time elapsed measure is off but that is ok. It is calculated off of Today() which is what the real life exercise will demand.
Gantt Test Forum.twbx 136.9 KB
You should be able to get the length of each continuous bar with some combination of RAWSQL expressions and/or custom SQL - I can't see how to do it with resorting to some form of SQL access.
I'll have a bit of a play with this later and post an example if I can figure it out.
To get the answers you want, you will have to aggregrate your rows from "testid,pulldate" to "testid,set,start_date,end_date" with some custom SQL at least.
What is your data source?
Well I've been having a lot of fun trying to get those bar lengths. It was a good exercise - I often want similar calculations, and I managed to figure out a few useful tricks.
I've attached a workbook with a series of sheets charting the journey. I've included some notes on the title of each sheet. Also take a good look at each of the calculated fields. Note that I turned on mark borders to highlight the fact that there are lots of overlapping bars. The visual effect of most of the examples would be better without the borders.
Most of the sheets rely on calculated fields using RAWSQL expressions to retrieve additional details for each row, by querying other, related rows. Some of these expressions result in very slow sheet refresh times. The final sheet uses a custom SQL connection to transform the shape of the data returned by the data source. This is much quicker.
A few notes.
1) As your sample data is an extract, the table structure is probably more complex than in your original data source, since Tableau transforms the structure for efficiency of access.
2) If you do need to write SQL against an extract, the easiest way to discover the table and column names you need is to generate a simple view including the fields you are interested in and then look in the Tableau logs to find the generated SQL. That can then form the basis of any RAWSQL or custom SQL you need.
3) The fields [Next pulldate] and [Prev pulldate] find the next and previous dates for the same TestID.
4) The field [End of Series] finds the first later date for the same TestID for which there is no row for the subsequent day (i.e. the last date before a gap).
The various length fields are derived from differences between these dates.
Even after calculating all those fields, it can be quite tricky to get the right visual effects in terms of bars and labels, as the various sheets demonstrate. It is also quite slow.
The final sheet gives a much simpler and better outcome, at the expense of some really quite tricky SQL. I've attached that SQL as a text file, because it will be hard to get at in the attached workbook (I had to connect directly to the original data extract with custom SQL and then make another extract so I could package the workbook - which means you probably can't get at the custom SQL).
In reality, it might be better to restructure your underlying data source to reflect the sort of structure Joe suggested in his last post. That is pretty much what I got with custom SQL, but restructuring the data would make it a lot easier.
This is a nice illustration of the sorts of things you can do with RAWSQL and custom SQL - and also of the downsides in terms of potential for slow performance - and the need for reasonable SQL knowledge. Getting that custom SQL right made my brain hurt.
Wow Richard! Yes that will make anyone's brain hurt. I hope you had some ibuprofen handy! I am very comfortable with transact SQL and I will indeed take your suggestion to set up the data at the source in the interest of time, efficiency and to prevent further brain dammages :-)
Obviously the SQL code from Tableau is a hybrid form specific to the application. There is some weird syntax at times and the need for all those quotes. Is it how you have to write it or is this how Tableau translates transact SQL?
I have posted a different question which involves this worksheet. That one I am sure is going to be a breeze to solve and simply reflects my rookie status with Tableau. I posted it under the "Calculations" theme and the post name is "Cannot convert calculated field to a dimension". I just need to be able to build a distribution table or chart by time elapsed. If I have not burnt all of my credits with you guys and you can show me how to do that, that would be great!
Thank you so much for all your hard work!!
SQL format: Custom SQL and RAWSQL have to be expressed in the dialect of the back-end database - which in this case was Firebird (used for data extracts). That's the other reason for grabbing sample SQL from the Tableau logs - it saves you struggling through that game of "guess the SQL dialect".
Calculated field -> dimension: See other post.
"burnt all my credits": Not at all. Generally when I spend as much time as this on a post it's because it has triggered an idea about how I'd like to present data. I do a lot of work with time-series, and this set me thinking about how best to handle the "missing data in a sequence" question. So I don't regard this as your credits!
I created a project schedule Gantt chart (attached) using data in Excel (also attached) oganized like this:
Client Project Phase Start End Days WorkDays Budget Hours Daily Hours
There is one record for each phase of a project for a client.
Is there a way in Tableau to calculate and display the daily total workload (sum of Daily Hours) using only the start and end dates of each phase?
If not, and if a record must be generated for each date when a project phase is active, what would be an elegant way to do this, rather than manually creating and populating the records in Excel?