2 of 2 people found this helpful
The best description I can remember seeing of all this is in that monster thread on table calculations which I started when version 6 first came out. In particular Ross Bunker added comments here explaining "Restarting Every" and "At the Level". Ross was promising a fuller description in an advanced tutorial he was working on. It must be getting quite big by now I'd say. Or maybe it has come out and I've just missed it.
Thanks Richard. I would also love to see that tutorial. So from a quick glance, the added functionality from my 2,3, and 4 questions seems to be around an ability to sort correctly within the partition? If sorting is not important you wouldn't use them?
Does the simplification I made in point 1 stand (ignore)?
2 of 2 people found this helpful
My grasp of table calculations is still more intuitive than explicit, there are parts I still don't understand and others I struggle to explain to myself, much less others. Like Richard, I still look forward to Ross Bunker's advanced tutorial.
Here's what I can add:
1. On the Advanced... compute using, the right side is addressing, left side is partitioning. I've come to think of Compute Using/addressing as "make a calculation for each combination of everything in addressing" and partitioning as "restart the calculation for each combination of everything in partitioning". Where this gets fun is in nested table calculations like this running sum of count distinct post from yesterday: http://community.tableau.com/thread/117771. That's a simple version where there aren't more dimensions in the view and we only have two dimensions, but it's a good example of how calcs will need to have different compute using: The deepest calculation (Min Month) needs to Compute Using along Month, then the next level (Month Sum) needs to Compute Using ID, and finally the top level (Running Sum) needs to compute across both ID and Month.
2. I rememberJoe Mako at one point telling me that the order was important because it determines the default sorting for the table calc. The # of dimensions here also affects the padding as well (which changed behavior between 6.1 and 7.0). Also, whatever is the bottom on the right-hand size of the Advanced... compute using can't be used for Restarting Every. I don't know why that is.
3. "At the level" is the closest that Tableau gets to "ignore this dimension in table calculations". I've only had to use it a few times, and still have a fuzzy grasp on this one. My understanding is that the dimension used in At the level is used for addressing, and all other addressing dimensions are effectively ignored, but they don't become part of partitioning. A problem here is when you want to have "At the level" be a couple of dimensions while ignoring others, but you can only choose one.
4. Normally Tableau will restart partitioning using the combination of the dimensions in the view that are not addressing, setting the Restart every explicitly sets what dimension will be used.
There's a critical piece here about what the data looks like: Particularly how sparse it is and the relationship between the sparseness, how the view is laid out, and Tableau's padding behavior (that depends on where the dimensions are placed in the view and what mark type is chosen). All that makes it tough to create anything using your fruit metaphor, Alex, because the specifics can make major changes in results. I've attached a simple example using the Superstore Sales data. When a crosstab is set up using Container and Category, Tableau shows 20 Abc marks and one blank space where there is not a Container/Category combination. However, an INDEX() function shows 21 marks. When Container and Category are both on Rows, Tableau only shows 20 marks, for both the Abc marks and the INDEX() function.
I think this could be a great conversation for the Think Data Thursday sessions that Tracy Rodgers is organizing. My fantasy would be to come up with some specific problem sets (we could probably find some on the forums) and then have a guided walk through of solutions that also explains the why those solutions work.
Great post Jonathan - as usual with Table Calcs an explanation of the intricacies leads me further down the rabbit hole - your example with index() demonstrates that....
I will continue to read (and reread) both your posts...
1 of 1 people found this helpful
Yes, excellent post Jonathan. Your opening comment: "My grasp of table calculations is still more intuitive than explicit, there are parts I still don't understand and others I struggle to explain to myself, much less others." sums up where I'm at really. I'm reminded of that famous quote about only three people in the world who ever understood General Relativity - in the case of table calcs I can only think of Ross Bunker and Joe make (though Joe also professes incomplete understanding of some of the finer points).
Just a couple of comments on your comments.
In point 2 you say:
"Also, whatever is the bottom on the right-hand size of the Advanced... compute using can't be used for Restarting Every. I don't know why that is."
In Ross Bunker's comments that I pointed to above he says:
" 'Restarting every' moves the given field and those above it in the Advanced... dialog to be partitioning."
So if you set it to be the bottom field, that would make everything partitioning leaving nothing for addressing, so nothing to calculate, so I'm pretty sure that's whey that is.
In point 3 you say:
"My understanding is that the dimension used in At the level is used for addressing, and all other addressing dimensions are effectively ignored, but they don't become part of partitioning."
But Ross says this:
"Similarly 'At the level' moves fields after it in the list to be partitioning, though there is a subtle difference in how that partitioning is done (its not done on value, but rather on postition within the partition, i'll give more details on that in the tutorial)."
So unless anything has changed since 6.0 days I think it's the "At the Level" field and below that become partitioning. But Ross left us dangling with a hint of more complexity to come, there - and I don't ever recall seeing anyone (even Joe!) explain what that comment was about. Or maybe Joe explained and I just didn't understand.
Good idea about Think Data Thursday - as long as Tracy can line up Ross as a special guest star - and ideally allocate about 2 days for the session!
Thanks for the Ross quotes/corrections, and I'm very much reminded of my Relativity class in college where one day I'd think I'd have a grasp of it and the next day I might as well have been reading Martian!
I was working with the Coffee Chain DB and found a really straightforward situation that calls for At the Level. I was trying to create my own row banding calculation using IF INDEX() % 2 = 0 THEN "Even" ELSE "Odd" END with Market and Product Type on the Rows shelf, and it wasn't working because of sparse data (there are no rows for the Tea Product Type in the South Market). Ultimately, setting the Compute Using to Market and Product Type, with sort on Market/Min/Ascending and At the Level worked. I put a walkthrough together in the attached workbook, which is in 6.1 format. I checked and the same behavior exists in version 7.
Ideally, I'd love to bring something like this to one of the Think Data Thursdays where Ross or another table calculation wizard is there and get more perspective/understanding of how/why that works, so I can better predict when At the level would be needed.
restarting every.twbx.zip 409.3 KB
I'm thinking a flow chart could be a really useful addition to my understanding...
Thanks for your patience, Richard. Sometime i'll write that tutorial. I think we do have more documents out there these days, but I still haven't written what I mentioned it in that post (that was a nice stroll down memory lane).
I think that old post actually does a pretty good job of summarizing at the level and restarting, but I'll try to answer Alex's specific questions directly.
- When setting the advanced compute I like to think of the left side as the "header" (i.e. the partition/grouping is created around that header), and the right side as almost an "ignore these" (i.e. just run the calculation ignoring these fields). Is this a fair simplification?
To the first approximation, that's a reasonable simplification. If your table calc doesn't care about order (for example, it only uses TOTAL() or WINDOW_*() over the whole window), then things on the left create the groups (partitions) and things on the right are in the group (i.e. "ignore these"). But as soon as you care about order, (RUNNING_*, LOOKUP, PREVIOUS_VALUE or WINDOW_* with a start/end index), then the order of the things on the right matters.
- What effect does the order of the dimensions on the right hand side therefore make (up, down buttons)?
The order on the right determines the order for sorting. The item at the top determines the primary sort field, and so on down the list. In addition, for At the level and restarting every, it determines the hierarchical structure (is top down, while at the level is bottom up, those statements will hopefully make more sense in the context of the answers for questions 3 and 4). I'll give an example in the questions below, but basically, think of it as most significant to least significant. Let's use the well known hierarchy of dates as the example. Year, Month, Day. If you've got Product on the left, and Year,Quarter, Month on the right, then you'd want Year first, Quarter second, then Month third in the list. That way when you sort, all the things in Year 2012 appear together, then all the things in Q1, etc... If you put Month first, then all the January's would be next to one another. Again, if your calc doesn't care about order, this won't matter, but if it does matter, then you care about these details.
One caveat here is that if you are sorting by a particular measure (i.e. not 'Automatic' sorting at the bottom of the Advanced... dialog), then the rows are sorted using that measure instead of the dimension values. This is like creating a set from the things on the right side and then sorting the set. When sorting by a measure, the order of the fields on the right side doesn't affect the sort (since there is no primary, secondary, etc... anymore, each set member is sorted based solely on the value of the measure), but it does still impact how the items appear in the "At the level" and "restarting every" drop downs.
- In what situation would I use the "At the level" dropdown that couldn't be achieved with the left and right boxes? What does "At the level" even mean?
At the level means 'at what level are the table calculations performed?'. Let's take the date hierarchy from above. Let's say your calculation is SUM(Profit) - LOOKUP(SUM(Profit), -1) (i.e. difference from previous). If you put Y/Q/M on the right, you'd get difference from the previous month (i.e. 2012/Q1/Feb - 2012/Q1/Jan. Now, change the 'At the Level' to quarter, and you'll get 2012/Q2/April - 2012/Q1/Jan. If you instead put Month on the left, you'll be creating partitions based on month. The Jan partition will only have Jan in it, so now you'll get 2012/Q1/Jan - 2011/Q1/Jan. The difference between "At the level" and putting things on the left is that "At the level" treats the items as a hierarchy. It does partition, but it partitions by position rather than by value. In this case, since you said at the level 'Quarter', the 'Previous' value to Q2/April is Q1/Jan (April the first item below Q2, and Jan is the first item below Q1). This can be very hard to think about abstractly, so examples are important. If anyone else has good examples, please post them. The more the merrier. If you are familiar with cubes, this is very similar to Lag/Lead behavior in cube hierarchies.
- Similar question for the "Restarting Every"... though I get what restart means.
This is a little more straightforward than at the level. Here, it is very equivalent to switching the items over to be in the left side, instead of on the right in terms of behavior. However, there is an important difference when it comes to sorting. Recall from above that if i set a measure to sort by, the sorting includes everything on the right. So, if i have only Q/M on the right side and sorting by SUM(Sales), then we'll compute the order of Q/M for SUM(Sales) across all years, then apply that sort to each partition (see side note below on this behavior). Yes, it's confusing, but the sort is computed once, then the same ordering is applied to each partition. So, if you want Year to be part of the sort, you need to included it on the right, but then you can partition on it by doing 'Restarting Every...'.
Side note on the sorting behavior: The original idea behind this was that it was like creating a set, then sorting the set and using the set's ordering for the table calculation. While this is all nice in theory, I'm not certain that it is useful and I'm seriously considering whether I should attempt to change it to the more intuitive behavior that it sorts each partition on the given measure. My big worry here is that such a change would break existing calculations. If any users out there have any feedback on this issue, please let me know. Right now, I think most people figure out a way to work around this issue if they notice. My other big fear is that people don't notice it today and are getting behavior they don't expect. The downside is that if someone _does_ want this behavior, it would be hard to do it if I make the change.
Great explanation once again, Ross.
> Sometime i'll write that tutorial.
Good luck. As in - that will a tough document to write, and if you're anything like me it will be tough getting that to the top of your to do list - everything bumps writing documentation down my to do list. ;-)
On the question of changing how the default sorting works: from reading your description and thinking about it briefly I like the idea, I think that would be preferable behaviour - though I can see that backwards compatibility is a challenge. You might have to expose the option of retaining current behaviour - with that being the default for upgraded calculations but the new behaviour being the default for new calculations. But I imagine the idea of yet another user selectable option is seen as pretty undesirable, too.
Great thread - thank you all for your inputs. I was getting close to an a-ha moment with Ross' post - I'm sure I'll be there with time.
Thanks for posting, Ross. Some bits definitely got clearer for me and I'm with Alex in feeling close to an a-ha moment.
With regards to your last comment about sorting, I think Joe Mako might have some examples. He helped me out in a webex about a month ago and showed me some cases where the Advanced... sort could not replicate the sort created by Table (Across) or Table (Down), I tried to duplicate that from my notes and so far I've always been able to come up with the same sort for both.
Ok, Ross, one more question about your post as I reread it for 4th (or is it 5th?) time...in the answer to #2 above, you wrote:
"One caveat here is that if you are sorting by a particular measure (i.e. not 'Automatic' sorting at the bottom of the Advanced... dialog), then the rows are sorted using that measure instead of the dimension values."
In the Advanced... dialog I don't see any "Automatic" option - as soon as I pick fields to go into the right-hand window, the default sort is always by whatever measure is alphabetically first in the Order Along drop-down:
Are you referring to that default behavior as "Automatic", or something else I'm not aware of?
Jonathan, how do I know you're not on version 7....?
Actually it may even have to be the latest version of 7