Geoffrey, which version of the software are you running? Also a packaged workbook with some sample data would be helpful.
See the attached. I used a calculated field with a formula of IF FIRST()==0 THEN WINDOW_SUM([OverBudget Count])/SIZE() END, with a Compute Using of phaseID so it partitions on PhaseType. The SIZE() gets the # of phaseIDs in the PhaseType, the WINDOW_SUM() counts up the # of phaseIDs that are over budget, and the IF FIRST()==0 returns only one value per PhaseType (the partition).
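For anyone who hasn't worked with table calcs before, here's a rough Python analogue of what that formula does (this is a sketch with made-up sample data, not Tableau's actual engine; the field names follow the thread):

```python
from collections import defaultdict

# Made-up sample rows: (PhaseType, phaseID, over_budget flag)
rows = [
    ("Design", "D1", 1), ("Design", "D2", 0), ("Design", "D3", 1),
    ("Build",  "B1", 0), ("Build",  "B2", 0),
]

# Partition on PhaseType, address on phaseID (the "Compute Using -> phaseID" part)
partitions = defaultdict(list)
for phase_type, phase_id, over in rows:
    partitions[phase_type].append(over)

def pct_over_budget(partition):
    """IF FIRST()==0 THEN WINDOW_SUM([OverBudget Count])/SIZE() END"""
    window_sum = sum(partition)   # WINDOW_SUM over the whole partition
    size = len(partition)         # SIZE() = # of phaseIDs in the PhaseType
    # Only the first address in the partition gets a value;
    # every other row returns Null (None) because there's no ELSE.
    return [window_sum / size if i == 0 else None for i in range(size)]

for phase_type, part in partitions.items():
    # Prints one real value per PhaseType, followed by Nones
    print(phase_type, pct_over_budget(part))
```

The Nones are what the later filter-for-non-Null step cleans up.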
Then I duplicated the worksheet, got rid of the measures at the PhaseID level, and Ctrl+dragged a copy of the % of Phases that are Over Budget pill onto the Filters Shelf and set it to filter for non-Null values. This cleans up the display and can make Tableau faster when updating the data.
In this case, the computation and level of detail are such that turning on accurate Grand totals takes 4 clicks: Analysis->Grand Totals->Show Column Grand Totals, then Analysis->Stack Marks->Off to get rid of the extra padding that Tableau puts into the Grand Total area.
This took me about as long to type up as to build in Tableau.
Shawn, I can understand where the frustrations with working with data at multiple levels of detail can come from. However, I think making a definitive statement about the "best solution" is jumping the gun when we don't have all the details on what the data looks like or what the desired goals are. Here are some factors I can think of:
- Yes, a custom query set up as a materialized view will run faster than the same Custom SQL, but if the data volumes are small (enough to fit in Tableau's query cache) and/or an extract is in use (because the Custom SQL would only be run once for each extract update) then that is unlikely to matter.
- Yes, the interaction between table calculations and grand totals can be very complex, but it can also be very simple, like in this case where it just works. I acknowledge that knowing whether the GTs will "just work" requires a deeper understanding of Tableau than I think should be necessary, but making a blanket statement that they will make the view "go to pot" is going too far.
- We don't know what access the user has to the underlying data, or what skills are available to write a custom query of some sort. There are cases where that access and those skills aren't available, so we need to be able to apply other solutions, like using the table calcs.
- Also since we don't know data volumes, sometimes asking for large-scale transformation of the data to pre-process it prior to Tableau isn't feasible due to disk space and/or time constraints.
- How much longer (in this case or other situations) would it take to write the SQL code to do the aggregations and group bys in advance, and verify that code, versus a single calculated field in Tableau? And how about when that needs to change in the future, like adding another dimension to the view? With the pre-aggregation situation then every potential aggregation level needs to be computed in advance. I've done that and certainly it can be really fast, and it can be really painful to maintain.
I'm not meaning to shoot down what you suggested; pre-aggregation can definitely be useful, and in this case may be necessary to get the desired results. The part I'm objecting to is so quickly saying that pre-aggregation is the "best solution". This is why I answer many forum posts with "Here's one way to do this," or "Here's how I know how to deal with this."
And I have to acknowledge I'm biased and start out on a different side of this argument, where again and again I encounter people who have been told by a Tableau consultant/tech support/developer (sometimes someone working for Tableau, sometimes not) that something is "not possible" in Tableau when it is. When I'm able to share the "yes it can be done in Tableau and here's how" option, document the advantages and drawbacks of that option, and maybe do some education along the way about the possibilities, then the users have more options, more control over the data and their results, and more skills to use in the future.
If this doesn't convince you to not use "best solution", then I'll get Richard Leeke to jump in with one of his rants about "best practice", because they are really synonymous terms.
Phase-Hours Analysis jtd.twbx 98.2 KB
Yep, just before I hit the 'Add Reply' button I considered deleting everything up to "Jonathan Drummey is..."
In the future I'll follow my instincts.
Jonathan, please consider writing this up as a CRL post, maybe expanding it to include multiple flavors, all including the subtotal/grand total solution. I've read the Tableau-provided solutions for dealing with the one-to-many issue, and none of them seem to solve these problems as well as your table calculation. (If we're going to stay in Tableau.)
Just to emphasize: "Table Calculations That Solve the One-To-Many Relationship Issue"
So I learned something new from this thread. Replying to an email that purports to come from somebody's personal email address can end up as a posting on the forum. I didn't realise you could do that (and as I was reading the notification from my phone I didn't even realise it was an automated notification).
[Irrelevant small-talk to Shawn deleted.]
Thanks! I look forward to checking out this workbook when I get home from the road.
In this case I think that making a database view (this data is on SQL Server) that rolls up the hours wouldn't be too hard to whip up, but I think working through your workbook will deepen my understanding of how formulas, table calcs, and subtotals all come together, and be a more flexible solution.
Sorry about starting WW3.
Checked the workbook and it looks great. Much appreciated!
A question though - Your formula is:
IF FIRST()==0 THEN WINDOW_SUM([OverBudget Count])/SIZE() END
So what happens to rows that don't meet FIRST()==0? Are they just thrown away? So there's actually only one underlying row per partition (Phase Type)?
I've been thinking about doing that, thanks for the encouragement! And I apologize if my post came across as too harsh; there was a lot going on last week and I think I was venting a bit in the wrong direction.
Based on the dimensions in the view, Tableau issues a query to the data source that returns a certain number of rows. Then, based on the addressing and partitioning of the table calculations in the view, Tableau generates partitions of groups of rows, each row is an address in the partition.
The IF FIRST()==0 part says that if we're in the first row/address (the first phaseID) in the partition (the PhaseType), then return the WINDOW_SUM()/SIZE() computation, and since there's no ELSE statement then all other rows in the partition (the rest of the PhaseIDs) return Null. Then the calculation restarts for each new partition (PhaseType).
That results in a bunch of marks (one for every phaseID/PhaseType combination), most of which are Null. To speed up Tableau, a copy of the calc is on the Filters Shelf set to filter for only non-Null values, which shrinks the displayed number of marks back down to one for each PhaseType. One key bit to know here is that table calculation filters are applied after the data is retrieved from the data source and after most all other computations are done, including table calculations.
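That ordering can be sketched in a few lines of Python (a simplified analogue of the behavior described above, not Tableau's actual pipeline; the numbers are made up):

```python
# Marks AFTER the table calc has already run over every retrieved row:
# one mark per phaseID/PhaseType combination, mostly Null (None).
marks = [
    ("Design", 2/3), ("Design", None), ("Design", None),
    ("Build", 0.0),  ("Build", None),
]

# The table-calc filter runs last, after the calculation, so it only
# hides marks -- it never changes what was computed for the visible ones.
visible = [m for m in marks if m[1] is not None]

print(visible)  # one mark left per PhaseType
```

This is why the filter can safely reduce the displayed level of detail without disturbing the WINDOW_SUM()/SIZE() results.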
However, table calculation filters do not apply to grand totals and subtotals, so when using this kind of calculation to reduce the level of detail we have to use Analysis->Stack Marks->Off; for more details on why, see Customizing Grand Totals in Tableau v8 – The Stacking Snag | Drawing with Numbers.