1 2 3 Previous Next 36 Replies Latest reply on Oct 29, 2010 10:48 PM by Ross Bunker

    Version 6 table calcs - understanding a few basics

    Richard Leeke

      I've really been struggling to get my head around version 6 table calculations.  The thing in particular which has had me confused is the definition of the windows that the calculations work over and how the various places in the user interface where that can be set interact with each other.


      Unless I've missed something, this isn't an area that has had much coverage in the beta documentation - I'm sure it will be covered well in the final v6 documentation, though.  In the meantime, I thought I'd pass on a little example I put together which helped me understand it better - plus a few of the key "aha" moments that generated.


      I think I've got this all right, but I'm sure one of the Tableau folk will jump in and correct any misconceptions.



      The workbook just uses the standard Coffee Sales data source.  I've put a few dimensions on the Columns and Rows shelves and put a calculated field on the text shelf which just returns the value of the INDEX() function - which displays the index of the current row within the window used by the windowing functions.  The index value is used extensively by the window functions, so you have to be very clear what it is going to be returning in the context in which the calculation is used.


      You can define the properties of the window to be used (which sets the context for the calculation) in three ways:


      a) From the "Table Calc Properties" link when editing the definition of the calculated field.

      b) From the "Compute Using" menu option from the pill menu for the calculated field on the shelf.

      c) From the "Edit Table Calculation..." menu option form the same pill menu.


      Experiment with all the options on all of those menus and watch what happens to the values of INDEX().  At first sight when I did that, some of them seemed to make no sense at all.  But thinking about it carefully, most things now make perfect sense and a lot of things have fallen into place for me.



      I won't try to explain everything I did, but here are a few examples, including a couple of those "aha" moments I had.


      Bear in mind that the datasource contains 2 years' worth of monthly sales data.  For any given Market/State/Product combination there are either 24 rows or zero rows as shown on the second sheet.  (I got myself a bit confused for a while with some of the patterns because I had assumed it would be a bit more sparse than that.)


      In the description I refer to the 3 places you can change the calculation properties from as (a), (b) and (c), referring back to the list above.


      1) When you open the workbook the calculation is in the default state - "Table (Across)".  So the index goes from 1 to 13 on each row.


      2) Using (b), change it to "Table (Down)". All columns now range from 1 to 20.  Makes sense.


      3) Again using (b), change it to "Table (Across then Down)".  Now the grid is sparse, only increasing the index for Market/State/Product combinations that exist.  No Caffe Latte in Colorado, so skip to Caffe Mocha for number 2.  Flick to and fro between the two sheets and you'll see there are 24 records for each combination that gets an index value.  Again, this all makes perfect sense.


      4) Work through all the other Table and Pane across and down options in (b).  All made perfect sense to me immediately.


      5) Try (b) "Cell".  It's exactly the same as "Table (Down then Across)" in this instance, but I suspect it won't always be.  I'm not quite sure what this really means - but let's move on.


      6) Try (b) Product.  Makes sense.


      7) (b) State.  Note that numbering restarts for each Market.


      8) (b) Market.  Huh?  What's going on?  They're all 1...


      If you now choose (c) (Edit Table Calculation...), the description explains what's happening:


      "Results are computed along Market for each Product, State."


      But because Markets are a collection of States, there can only ever be one Market for any State.  So the index has to be 1.




      9) Now put the pill back to the default by selecting (b), "Table (across)".  All rows should now go from 1 to 13.


      10) Now let's see what option a) does.  Edit the table calculation for Index and select (a) ("Table Calc Properties").  Change "Compute using" from "Automatic" to "State" and save the calculation.


      Nothing happened.  Huh?  What's going on?  I just told it to calculate by State.


      Brief pause while I bang my head against the wall again.


      Now drag the Index field off the Text shelf.  Drag it on again.  That's better.  Now it calculates by State.




      So unlike everything else you do when you edit a calculated field, changing it's table calc properties doesn't affect  the behaviour of instances of the field that are already in use on the view - just the defaults used when you first add it to the view.  (I'm sure one of the Tableau folk will correct me if I've got this wrong, but that's what seems to happen.)


      I'm actually a bit embarrassed to admit that it's taken me 2 months to figure this out - but in the interests of saving other people's walls from getting head-shaped indentations like my study I thought I'd pass it on.  I think this needs to be  made clear in the user interface.  I'd vote for text on the window where you set this saying that these are the default properties used when a field is first added to a view and possibly a warning message if you change the properties of a field that is already on a view saying that the properties of existing instances won't be affected.



      Now that we've got that clear, we can start exploring the really interesting stuff.  Using option (c) from the pill on the Text shelf explore all the Advanced options.  I won't try explaining all of those - partly because I still don't fully understand and partly because there are so many permutations.  Here's just one example:



      11) Using (c), select Advanced, pick Product and State in "Compute Using" and set "Order Along" to Type, Minimum.  OK on that and the description says:


      "Results are computed along Product, State (sorted ascending by Min. Type) for each Market."


      Now the indexes jump around all over the place, as you would expect.



      I still get myself into states where the calculated field gets a red exclamation mark and I'm not quite sure why what I've selected is invalid - but this has really helped me to get clearer what is going on.



      Hopefully this will save others from some of the pain I've been having.

        • 1. Re: Version 6 table calcs - understanding a few basics
          Richard Leeke

          "Aha moment" number 3:


          When you have multiple instances of a calculated field on different shelves, the table calculation properties can be different for the different instances.  That can be really confusing when you are trying to define a filter on the  calculated field.  The properties for the Filter shelf instance have to match an instance on another shelf (otherwise the one you drop on Filters just disappears).


          So the process that seems to work for me is this:


          1) Drop the calc field on LoD.


          2) Adjust it's properties if necessary from the pill menu.


          3) Ctrl-drag a copy onto Filters (which guarantees it will have the same properties).


          If you try to set the 2 instances up independently, the one on Filters just disappears whenever you change the properties of the other one.  Attempting to change the properties of the one on Filters just quietly ignores you.


          If you do what I just did and have a copy on LoD and a copy on Rows which are out of step with each other and a copy on Filters in step with the one on Rows, changing the properties of the one on LoD doesn't affect the behaviour of the filter.


          Seems obvious when you put it like that.  Once again, this all makes sense when you realise what's happening.



          Take a look at the Filters sheet I've added to my Coffee Sales example.  I've created a calculated field:


          [Is Last] = LAST() == 0


          I've added that to the sheet four times: on Rows, Columns and twice on Filters.  The one on Rows uses "Table (Down)", the one on columns uses "Table (Across)" and a copy of each of those on Filters.  So selecting True or False from one of the quick filters either just keeps or just suppresses the last row or last column as appropriate.



          All of these "aha moments" seem really obvious now that I've had them.  I suppose that's the nature of "aha moments".



          I wonder if there's a case for being able to alias the calculated field differently for each context?  So in the above it would be nice to be able to have [Is Last Row] and [Is Last Column] without actually having to duplicate the definition of the calculation itself.  Even having Tableau automatically generate an alias each time the properties diverge: [Is Last] and [Is Last (2)], say.


          6.1, maybe.  ;-)

          • 2. Re: Version 6 table calcs - understanding a few basics
            Ross Bunker

            Richard, thanks for posting this.  This is a great way for people to explore how table calculations work.  It can be very hard sometimes to understand what's going on with table calculations since they are often defined in abstract terms.

            One thing I want to mention is that we are planning on producing a tutorial for table calculations to help everyone learn a lot of the things you've been figuring out on your own (hopefully taking some of the pain out of that process, hopefully your forehead will recover...).  The experience you've documented here is very helpful in understanding what we should include in the document and what approaches would be effective in conveying the important lessons.  Thanks for your contribution.

            I want to quickly clarify some terminology, since I may use somewhat different terms.  What you refer to as a window, we typically term a partition.  The fields specified in the 'Compute Using' menu or 'Edit Table Calculation...' dialog are termed the addressing fields.  All of the dimensions on a sheet that are not addressing fields are partitioning fields.  This corresponds to the 'computed along' (addressing) and 'for each' (partitioning) components of the description in the table calculation dialog.  One other nice thing is that in the final v6, the tooltip for a table calculation will actually contain the description also, so you don't have to open the dialog to understand what it is doing.


            Let me confirm a few things.

            5)  Cell behaving just like Table (Across then Down).  This is a bug that will be fixed in the final release.

            8)  Yes, you figured out a subtle rule here.  In this case, you are partitioning on State, and addressing on Market.  Since a single Market has multiple States, the different States will cause the values to be in separate partitions, even though the Market is the same.  Thus, every row ends up with an INDEX() of 1.

            10)  Again, your inference is correct.  When you edit the properties from the data window on the left, you are editing the defaults.  This is unlike editing the formula (which as you note, does propagate to all the uses of the field), but it is exactly like editing default formatting, aggregation, sort and other defaults.  I agree that we should clarify it in the UI.  Not sure if that will make it into 6.0, but we will get it into 6.0.1.


            As for getting yourself into invalid states, there were some issues in the dialog that could lead to this in the betas.  I think we've addressed them all for the final release, so you should never be able to put yourself in an invalid state directly from the 'Edit Table Calculation...' dialog.  The most common way that table calculations become invalid is by removing an addressing field from the sheet.  That is if you set 'Compute Using' to State and then you remove State from your sheet, the calculation will become invalid.


            We first introduced the 'quick' table calculations in version 2.0 of Tableau.  We've gone a long time on just those, but there were so many things we couldn't compute.  With the introduction of table calculations in 6.0, we've made it _possible_ to compute just about anything from the data.  While so many things are possible, some things are not easy or obvious.  We'll be spending some time on making things easier now that things are possible.  In addition, the tutorial I mentioned, along with the Calculation Reference Library that James has begun, will hopefully go a long way toward helping power users like yourself get the most out of the new functions.  And in turn, less experienced users will be able to come to these forums and find out how to do exactly what they need by drawing on the collective expertise available here.


            Let me know if there are further things that are mysterious.  Meanwhile, I'll focus on trying to get that tutorial out as soon as possible!




            • 3. Re: Version 6 table calcs - understanding a few basics
              Ross Bunker

              A couple more notes for Aha 3.


              One thing that is going into the final release is that when you drop an instance of a custom table calculation on your sheet, if there is already an instance of that calculation (either directly or nested in another calc), the new instance will pick up the properties of the existing one by default.  This makes it easy to iteratively build up calculations.  Once dropped you can customize it to be different than the existing one if that's what you want, but often, you really just want to show the calc in multiple places, or filter on it (as in your example).


              It's confusing how table calculations disappear from the filter shelf this is something we'll address in a later release.  For now, my apologies for the annoying behavior.  Ctrl-drag or rt-click and select Filter... to work around it.


              As for the caption problem, the final release will have some help in some cases.  The default captions will include some of the addressing information from the properties if it differs from the default.  You can already go in and customize the caption by right clicking on 'Measure Names' and editing the aliases there (though knowing which is which can be confusing).  There are a bunch of issues in this space and we're trying to figure out the best way to address them.



              • 4. Re: Version 6 table calcs - understanding a few basics
                Richard Leeke

                Thanks for the quick clarifications, Ross.


                I'd heard the terms addressing and partitioning, but hadn't really got them clear in my mind.  Even just reading your definitions has helped a lot.  I'll try to follow your terminology from now on.


                I was sure there was documentation coming, by the way, but I can see that you guys haven't just been sitting around with your feet up, so you're forgiven for the doco not having landed yet as far as I'm concerned.  After all, it is still a beta.  But having spoken to several people who were as confused as I was by this, and having now figured some of it out I thought I'd put this out there.



                I'd never noticed the right-click options on Measure Names before.  I tried that, but most of my calculated fields don't even appear in the list.  For example if you try that on the Filters sheet in that sample I posted [Index] shows up but [Is Last] doesn't.  And it's not just that continuous and/or numeric fields show and discrete or boolean fields don't.  In the workbook I'm actually working on I've got 5 calculated measures, 3 of which are continuous and only 1 of the continuous ones shows in the list - and that's one that isn't on the sheet while one of the ones that doesn't show is on the sheet.


                Are you saying that if you have multiple instances of the same calculation in flight with different contexts they will (should) appear as separate entries in the Measure Names->Field Properties->Aliases list?  That would be good.



                > Let me know if there are further things that are mysterious.


                Well while I have your attention, there is one...  ;-)


                I've just been implementing a percentile calculation.  I'm only evaluating the percentile for the first row in the partition and also filtering to just the first row, so that I display a manageable number of marks from a large number of rows.  It's giving the right answers (i.e. the same as Tableau reference lines show and the same as my perl pre-processor gives).  But for some reason I don't understand, it will only display as discrete marks, even when I set the mark type to line.  See attached screen clips.  Any clues?  It feels to me as if it may be related to the post-calculated field filter in some way.  Any way round it?

                • 5. Re: Version 6 table calcs - understanding a few basics
                  Richard Leeke

                  One other thing I forget to mention where that Coffee Sales example really helped my understanding is the meaning of the ATTR() function, or setting a pill to "Attribute" rather than "Dimension" or "Measure".


                  Open the Index sheet and make sure it's at the starting position ([Index] set to "Table (Across)").


                  From the drop-down menu on the State pill on the Rows shelf, change State from Dimension to Attribute.  Notice how there are now three sorts of entries in the State column: "Null", "*" and individual state names.


                  The "Null" rows show products for which there are no States in that Market.  The "*" rows show Products for which there is data for multiple States in the Market.  The individual state names are where that State is the only State in that Market for that Product.  I'm not going to attempt any more explanation than that - I'll leave that for Ross's tutorial.

                  • 6. Re: Version 6 table calcs - understanding a few basics
                    Richard Leeke

                    > helpful in understanding what we should include in the document...


                    Well getting the distinction between partitioning and addressing clear is absolutely critical to understanding how this works.  Even though I kind-of got it by experimenting, I think I still had the assumption in the back of my mind that any dimension on a shelf had to be partitioning, so I kept getting confused about what was actually in each partition and why.  That's why I found the Coffee Sales example showing the index values so helpful in that it allowed me to visualise exactly what was happening.



                    > All of the dimensions on a sheet that are not addressing fields are partitioning fields.


                    I think it's also worth bringing out that fields can become addressing fields either explicitly by naming them or implicitly through picking things like "Table (Across)".



                    > Let me know if there are further things that are mysterious


                    I'm still a little perplexed by some of the behaviour of the "Advanced" settings.  It's not entirely clear to me the rationale for which fields are and which aren't available in the "At the level" and "Restarting every" drop-downs.  Sometimes I don't get offered things which I would expect and sometimes I do get offered things which I wouldn't expect.  Since some of those result in red exclamation marks, I suspect that this may all fall under this comment:


                    > there were some issues in the dialog that could lead to this in the betas


                    so maybe all will become clear in the final version (not to mention the tutorial!).



                    One other thing that surprised me slightly was that there's only a single sort field available for "Order Along".  My first thought was that I'd be able to create a set if I wanted a more complex sort, but that doesn't work.  I absolutely don't have any examples where I need this, by the way (and presumably you don't or it would be there).  But I'd be interested in a simple explanation of why that's never needed (or so rarely that it's not worth the complexity).

                    • 7. Re: Version 6 table calcs - understanding a few basics
                      Andy Cotgreave

                      My head hurts.


                      Great effort Richard and Ross. I'm beginning to understand!


                      A few questions on your posts Richard:

                      In the basic Index sheet, if you choose Table(across) or Table(down) it fills every cell. If you choose Pane(down) or Table(across then down) it doesn't - it leaves some cells/panes blank. I don't really understand the rules for when it leaves cells blank or not.


                      If you choose Pane(across then down) things go crazy - lots more rows appear. I really don't know what's happening here.


                      With ref to point 11) in your first post, I think it is useful to rearrange the pills so that the table is laid out in Index order. So, in the example you described, if you rearrange the pills so that Column shelf is empty and Row shelf has: Type, Market, State, Product then you get the Indexes rising sequentially. That can help clarify what the calc. field is doing.


                      What's scary about this is that all of the above posts are hard enough, and they're only using the Index function. We haven't even moved onto LOOKUP, LAST, FIRST, etc yet!



                      Follow me on twitter.

                      Read my Tableau blog at thedatastudio.co.uk

                      • 8. Re: Version 6 table calcs - understanding a few basics
                        Joe Mako

                        I believe setting the partitioning on a custom table calculation is a key setting, and overly complex because of the current dialogs.


                        It would be nice if there was an easy way to translate PostgreSQL/Oracle "Windowing"/"Analyitical" functions to Tableau's custom table calculations.


                        Here is an example of what I expect to see more of on the forum onceversion 6 is released:



                        I skipped on explaining how to do the partitioning because it took me13 clicks to just set the partitioning and sorting used withing thetable calc.


                        Am I missing something? Why is setting the partitioning on a table calc so time consuming? Because it takes so many clicks, it becomes difficult to explain.


                        Attached is the workbook (Tableau 6 Beta 3), and here is how I would have written step by step instructions to set the partitioning and sorting for the table calc:


                        1. Right-click the "Subtotal" pill on the shelf, and select "Edit Table Calculation..."

                        2. Change the drop-down for Compute using to "Advanced..."

                        3. Bring over each field on the Rows shelf, in the same order ("Case #" at the top of the list, and "AU&NAME-C" at the bottom), do not include Measure Names.

                        4. Change Order Along to "TRANSMITDATETIME" and "Minimum", and click the OK button on the Advanced dialog

                        5.Change Restarting every to "Case #", and click the OK button


                        I go through this process for nearly every custom table calc I make.


                        With version 6, I hardly use the "addressing fields" from the "Compute using" drop-down, because they do not make sense to me. The advanced dialog and "Restarting every" drop-down makes sense to me because it allows me to define partitioning in a familiar way, but requires extra clicks.


                        Custom table calculations are extremely flexible, and they are a fantastic addition to Tableau, I just wish there was an easier way to set partitioning.


                        My first thought for a easy to understand way to set partitioning is something like a Level slider. For example, in the attached workbook, under "Format" -> "Borders", you see that I changed the Level slider under Row Divider, I had to go through the above steps to set the partitioning to the same as the row divider level.


                        I do not see this as a problem with an easy solution, and until there is an intuitive and painless interface (like the rest of Tableau), I currently see it as one that will require repeated explanation.


                        In short, Tableau 6 is fantastic, the only blemish I can find is the interface partitioning a custom table calc.

                        • 9. Re: Version 6 table calcs - understanding a few basics
                          Richard Leeke

                          I'll attempt to answer your questions Andy - at the risk of exposing my lack of understanding.  But that's fine, I'm sure Ross will put me right and then we all learn something.


                          > ... it leaves some cells/panes blank ...


                          Yes, I was a bit unsure of the rules there.  My understanding is that the index value runs over all values of the addressing field(s), in whatever order you've specified (which is the same as the sort order in the viz unless you override it).  So Table (Across) is just addressed by Product, numbered 1 to 13 - even though not all 13 Products exist for all partitions.  When you say Table (Across then Down), all three fields become addressing fields and the index runs over all valid permutations - hence the chart becoming sparse.  This is at the level of "kind-of makes sense" on my personal understanding scale.  (Any advance on that Ross?)


                          > If you choose Pane(across then down) things go crazy


                          I hadn't noticed that one.  I agree that looks odd a bit, but looking at the definition of the table calc in "Edit Table Calculation..." gives a clue:


                          "Results are computed along Product, State for each Market."


                          So the addressing fields are Product and State, so the index runs over all extant permutations of Product and State - independent of the partitioning fields (in this case Market).  So the table shows exactly the same index values for all 4 markets, even though each State is only in one Market.


                          Try just adding SUM([Sales]) to the text shelf instead of Index.  Now turn it into a Running Total using Quick Table Calculation.  Now select Pane (Across then Down) from Compute Using on the pill.  You get exactly the same rows and sparse grid.  There are running total values for Oregon in central, even though Oregon isn't in Central.  It's value is always the same as the previous row - in fact Oklahoma, Oregon, Texas, Utah and Washington all have the value of the last product for Ohio (the previous State that is in Central.


                          Sort of makes sense.  (Ross?)


                          > rearrange the pills so that the table is laid out in Index order


                          Yes, that helps.  Then put them back how they were and notice that the index stays the same for each permutation.


                          > What's scary about this is that all of the above posts are hard enough, and they're only using the Index function.

                          > We haven't even moved onto LOOKUP, LAST, FIRST, etc yet!


                          Relax.  Everything else is simple - it's just a set of functions.  The hard thing is understanding addressing and partitions.

                          • 10. Re: Version 6 table calcs - understanding a few basics
                            Ross Bunker

                            Well, never mind what i said about Measure Names.  There's a way to alias it, but unless you are actually using measure names/measure values in your viz, it doesn't work, so it's not worth it.  We'll work on doing a better job with this.


                            I'm able to keep lines when using a table calc filter.  Try creating a coffee chain repro and post it.  I'll take a look and see if it's a bug or not.



                            • 11. Re: Version 6 table calcs - understanding a few basics
                              Ross Bunker

                              As far as the issue of a single sort field, the primary use case for Advanced... was replacing the need to create a group in order to get multi-dimensional ordering.  In that use case, the whole point was to have a single sort field.


                              Since I introduced it, we've come up with some other use cases (fixed sorting that is invariant of sorting used in the viz for example).  The user interface for multi-field sorting could be somewhat cumbersome.  Perhaps the reasonable thing here would be to just say 'viz default' sorting, so that we'd pick up the sorting from the viz like we do today.



                              • 13. Re: Version 6 table calcs - understanding a few basics
                                Ross Bunker

                                Andy and Richard, Re: blank vs. non-blank cells.


                                Table calcs are designed to generate data for blank cells (we call this padding the domain).  Specifically, running totals and 'difference' calcs both treat blank cells as zero.  The logic behind _when_ we create data for blank cells is fairly straightforward in some ways, but as with all things table calcs, it can seem mysterious.  I'll go into more detail in the tutorial, but for now:


                                Basically table calcs will pad the result data to fill in the 'table'.  We often fall back on the notion that these are table calculations, and you are doing things in the context of a table structure (though that table structure is not _required_ to be the same one as displayed in the viz, it often is).  The basic rule is that if you have an addressing field on rows, then we'll create entries and compute table calcs for all of the dimension values that are on columns and vice-versa.  If addressing fields are in both rows and columns, then you'll pad using both.


                                We're really not very smart here, but usually this does what you want if you are building vizs that are compact.  For vizs that are sparse by design, you may get unexpected values showing up.  We'll work on this going forward.


                                I'll get concrete with a quick example.  If you are doing running totals along Product for each Market, you'd really like to fill in the gaps.  If there was some Market that had no sales of Regular Espresso for example, you'd just fill in Regular Espresso as if it had zero sales and so you'd repeat the previous value.  Or if you were doing a difference from previous calculated along Month for each Product, then you'd like to treat the fact that there were no Herbal Tea sales in March as zero.  In each of these examples, you can imagine one of the dimensions on columns, and one on Rows (i should really attach images, but i haven't got the time at the moment, sorry).


                                Let's contrast this with something that make less sense.  Richard, you've previously mentioned the relationship between Market and State.  These form a hierarchy, with Market as the parent.  So, let's imagine a Sum(Sales) for State and Market.  Sounds fine so far.  But what if Market is on Columns and State is on Rows.  What do you expect to happen?  Given that certain Market/State combinations are non-sensical (West,New York anyone?) you get a goofy looking table with a lot of blanks.  Now, if you make the measure Running Sum of Sales, along State for each Market, we 'helpfully' fill in a lot of those blanks, though it makes no sense.


                                The fundamental problem is that we don't really understand the relationships between the different fields deeply (i.e. that Market is a hierarchical parent of State).  Relational databases just don't model that stuff explicitly.  Even for time dimensions where we could do some work, we haven't invested in that yet.  So, we're using the fact that they are on different axes (rows/columns) to infer that they are unrelated, and therefore we should pad them.  Just one more example of how much smarter people are than computers.


                                Now, what we'd really like to do here is get a better understanding of the data and let users tell us when to pad, when not to, etc...  That's all work we're thinking about and trying to understand how it should be built and integrated into the system.



                                • 14. Re: Version 6 table calcs - understanding a few basics
                                  Richard Leeke

                                  Great, that fills in a lot of the gaps Ross - understanding the rationale a bit really helps.  No need for images, the words are clear.


                                  It's good to know that when I've been thinking "wow, Tableau must be doing something really clever, I'm just not smart enough to understand what", I should have just shrugged and said "pah, dumb old computer".  ;-)

                                  1 2 3 Previous Next