Table Calculations Broken by Tableau 7.0 Changes
Richard Leeke May 7, 2012 2:09 AM
I've just come across a fairly extreme example of an undesirable side effect of the changes in the way Table Calculations work in Tableau 7.0.
The change is that Tableau now "pads the domain" (which means filling in dummy rows for all permutations of dimensions) in a lot more circumstances than it did before.
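To make "padding the domain" concrete, here is a minimal Python sketch (not Tableau code). The rows are made up rather than taken from the Coffee Sales data, and `pad_domain` is a hypothetical helper that only models the idea of filling in a dummy row for every permutation of the dimension values. It makes no claim about how Tableau implements padding internally.

```python
from itertools import product

# Made-up sparse rows: (product, order_date, quantity).
rows = [
    ("Espresso", "2012-01-05", 10),
    ("Espresso", "2012-02-10", 4),
    ("Latte", "2012-01-20", 7),
    ("Mocha", "2012-03-01", 2),
]


def pad_domain(rows):
    """Return one row per permutation of the distinct dimension values.

    Permutations with no real data get a quantity of None (a dummy row).
    """
    products = sorted({r[0] for r in rows})
    dates = sorted({r[1] for r in rows})
    real = {(r[0], r[1]): r[2] for r in rows}
    return [(p, d, real.get((p, d))) for p, d in product(products, dates)]


padded = pad_domain(rows)
dummies = [r for r in padded if r[2] is None]
print(len(rows), "real rows ->", len(padded), "padded rows,", len(dummies), "dummies")
```

With these four rows, three products times four dates gives 12 padded rows, 8 of which are dummies. The padded count grows with the product of the distinct values in each dimension, not with the number of real rows.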
I have just been putting together an example workbook to go with the answer to this question, which asks about finding the latest available data for each product in an inventory table. I thought I'd use the Coffee Sales sample data and demonstrate how to do it with table calculations, data blending and perhaps raw SQL. I was going to say that the table calculation method would work fine on a small data set like the coffee sales data, but wouldn't scale to really big data sets. However, when I went to refresh the view, it took several minutes to refresh (during which time that telltale sign, a black bar where the menu should be, appeared). Eventually it warned me that Tableau was almost out of memory and might crash at any moment. Sure enough, the process was over 3 GB. Oh, and it got the answer completely wrong.
I'm never quite sure of anything with table calculations, but I was pretty sure I hadn't done anything stupid, so I tried it in 6.1. That worked fine: the view took about a second to refresh and gave the answer I expected.
The view just displays the latest order date and order quantity for each product (or at least it does in 6.1) and has the following fields:
Rows: [Product Name], [Order Date], [Order Quantity]
Level of Detail: [Latest Order] (defined as LAST()==0, Compute Using [Order Date], [Order Quantity], sorted by MIN([Order Date]), i.e. true for the latest order for each product)
Filters: [Latest Order] = True
In 6.1 this works fine, filtering down from 8,000-odd order rows to show only the latest order for each product, a bit over 1,000 rows.
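The intended result of that filter can be sketched outside Tableau. The Python below is only an analogy, with made-up rows and a hypothetical `latest_orders` helper: it partitions by product, orders each partition by date, and keeps the row whose offset from the end of the partition is zero, which is roughly what LAST()==0 expresses.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows: (product, order_date, quantity). ISO dates sort correctly as strings.
orders = [
    ("Espresso", "2012-01-05", 10),
    ("Espresso", "2012-02-10", 4),
    ("Latte", "2012-01-20", 7),
    ("Mocha", "2012-03-01", 2),
    ("Latte", "2012-02-02", 5),
]


def latest_orders(rows):
    """Keep the last row of each product partition, ordered by date."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # product, then date
    result = []
    for _, partition in groupby(ordered, key=itemgetter(0)):
        partition = list(partition)
        for index, row in enumerate(partition):
            offset_from_last = len(partition) - 1 - index
            if offset_from_last == 0:  # analogous to LAST() == 0
                result.append(row)
    return result


for row in latest_orders(orders):
    print(row)
```

Each product appears once, with its own latest order date and the quantity from that order.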
In 7.0 it appears to pad the domain so that there is a dummy (NULL) row for each permutation of product, date and quantity, which makes the memory blow out. And now the last order date for each product is set to the very latest order date across all products, so all products except the few with orders on that latest date return a blank order quantity.
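The wrong answer is what you would expect if the "last row per product" test runs over the padded rows instead of the real ones. The Python sketch below models that symptom with made-up data and hypothetical helpers (`pad`, `last_row_per_product`). It assumes padding over product and date only, and it illustrates the observed behaviour rather than Tableau's actual implementation.

```python
import itertools

# Made-up rows: (product, order_date, quantity).
orders = [
    ("Espresso", "2012-01-05", 10),
    ("Espresso", "2012-02-10", 4),
    ("Latte", "2012-02-02", 5),
    ("Mocha", "2012-03-01", 2),
]


def pad(rows):
    """Add a dummy (None quantity) row for every missing product/date pair."""
    products = sorted({r[0] for r in rows})
    dates = sorted({r[1] for r in rows})
    real = {(r[0], r[1]): r[2] for r in rows}
    return [(p, d, real.get((p, d))) for p, d in itertools.product(products, dates)]


def last_row_per_product(rows):
    """Return the row with the latest date in each product partition."""
    last = {}
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        last[row[0]] = row  # later dates overwrite earlier ones
    return [last[p] for p in sorted(last)]


unpadded_result = last_row_per_product(orders)
padded_result = last_row_per_product(pad(orders))
print(unpadded_result)
print(padded_result)
```

Without padding, each product keeps its own latest date. With padding, every product's last row falls on the overall latest date (2012-03-01 here), and only the product that really has an order on that date (Mocha) keeps a quantity; the others come back as None.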
I modified it slightly (sheet 2) to have SUM([Order Quantity]) instead of [Order Quantity] as a dimension. That reduces the number of permutations and lets it get the wrong answer much more quickly and without taking anywhere near so much memory.
I haven't seen any sign of there being any user selection governing this behaviour, either. Anyone know of a way to intervene?
I've attached the 6.1 and 7.0 versions of the workbook. Opening the 6.1 version in 7.0 also behaves the same as the 7.0 version (i.e. it's not that I put the 7.0 version together differently).

Latest Orders v61.twbx.zip 1.1 MB

Latest Orders v7.twbx.zip 1.2 MB