
1. Re: Replicate VAR functionality through Script_real
diego.medrano Nov 29, 2016 5:17 PM (in response to Danyang Su)Jonathan Drummey may be the one to ask here.

2. Re: Replicate VAR functionality through Script_real
Jonathan Drummey Nov 29, 2016 6:19 PM (in response to Danyang Su)Hi,
I'm the author of the Tableau & R integration document you linked to and yes, that part of the document is shorter than it could have been.
Short answer to the question of "how can I get a smaller result vector": In a view that has every record you want to calculate the variance across a calculation like IF FIRST()=0 THEN SCRIPT_REAL(insert R script to calculate variance here, [your agg measure]) END with that pill as an output *and* on the Filters Shelf filtering for Special>Nonnull values will get you the result you want. You could also get the same results using IF FIRST()=0 THEN WINDOW_VAR([your agg measure]) END.
Long answer to the question of "how can I get a smaller result vector":
I'm going to go highlevel here to explain why this is somewhat complicated, then get back to the IF FIRST()=0 part, then go into some complications. There are four levels of calculation and a fifth grain that we work at in Tableau:
* record level or grain of the data  this is the raw record level data and the results of any rowlevel functions.
* regular aggregate  SUM(), MIN(), AVG(), VAR(), etc. are all regular aggregate measures and are calculated at the viz level of detail (vizLOD) which is based on the dimensions in the view. Level of Detail expressions are a special kind of regular aggregate that returns the results as recordlevel values.
* table calculations  This includes FIRST(), INDEX(), TOTAL(), the WINDOW_*, RANK_*, and SCRIPT_* calcs as well as Python & R scripts. These are mostly aggregations of aggregations (the ones that take fields as arguments require those fields to be aggregates) and are computed along the dimension(s) used for addressing and partitioned based on all the other dimensions in the view. The vector of values sent to Python/R is based on the addressing dimensions.
* postaggregate calculations  This includes reference lines, forecasting, grand totals, the summary card, etc. and are builtin Tableau computations that can generate results we can see but are not usable in other calculations. For example if we want to use a reference line result as a threshold for other calcs then we need to build out our equivalent of a reference line using some other means (table calculations, level of detail expressions, and/or a Tableau data blend).
The fifth "grain" that we work at is what I call the grain of the display. Here's a quick example using a 10 row data set in Tableau v10.0:
In a view where the viz level of detail is this recordlevel then VAR(Value) is going to return Null as expected because it's only aggregating across one record.
If I take the Row dimension out of the view then VAR(Value) is going to return the variance across all the records for each Source:
This works great...now here's where things get more complicated. I'm going to use Tableau's WINDOW_VAR() function as a standin for an equivalent R or Python function. If I try to set up a WINDOW_VAR([Value]) (or a SCRIPT_REAL("R/Python function here",[Value]) Tableau is going to throw up some sort of "All fields must be aggregate or constant when using table calculation functions..." error.
The reason why is that WINDOW_ and SCRIPT_ functions are table calculation functions and the way Tableau is designed they require aggregates as arguments. However, if I just use WINDOW_VAR(SUM([Value])) (or a SCRIPT_REAL("R/Python function here", SUM([Value])) in the view that I used VAR(Value) in then I'm going to get nonsensical resultsm, this is the WVar Sum Value calc below:
This is due to the fact that the inner SUM(Value) is computed at the vizLOD which is just Source, so what's being fed to the WVar Sum Value calc is the two values of 120 and 330, not all the individual recordlevel values. Therefore to have the WINDOW_VAR() (or R/Python calc) return the same results as the regular aggregate VAR() calc we need to *increase* the vizLOD so the table calculation by adding the Row dimension back into the view so the calculation has all the values it needs. In addition we need to change the Compute Using to be on the Row so it partitions on Source. Here's the result, we see accurate variances again:
Now we're getting back to the original question where given that the table calculation (or R/Python script) needs to be fed the vectors of values for each source yet ultimately we want to collapse that down to a single value for the grain of the display (or potentially further calculations). In that case we can use the IF FIRST()==0 THEN WINDOW_VAR(SUM([Value])) END formula to do that. FIRST() is a special table calculation function that identifies the first address in the partition so we can use that to return a single result:
This is great, now usually what people do their first time around is go through a thought process like "Yeay, I got my result, now let me get rid of that Row dimension." And when they remove that from the view they get this with red pills that don't work because they require the Row dimension that is gone:
Or they only have a single partition (I did this by filtering for just Source a) and were able to use the default Table (Down) addressing instead of a fixed addressing and the view looks like this with Null values for the WVar Sum Value calcs because the vizLOD isn't enough for them to be accurate, there's only a single SUM(Value) that is being passed to the WINDOW_VAR() table calculation:
Or (third mistake) they will just rightclick on the Null values from the original and choose Exclude and end up with another broken view because of Tableau's order of operations. When we rightclick+exclude we are excluding dimension values, this removes the needed marks from the viz because dimension filters are applied before table calculations (and R and Python scripts) are computed:
So the table calculation (or R or Python script) *requires* that the Row dimension be in the vizLOD. We can get rid of the extraneous marks by Ctrl+dragging (Cmd+dragging on Mac) one of the table calc pills to the Marks card and filtering for Special tab>nonNull values and then we can get only a single mark for each source:
At this point if we're done we can turn off the Row header or move it to the Detail shelf, then turn off the tooltip for it, and be all set.
However let's say we wanted to show the Avg of all values at the level of Source. Because the vizLOD is Source, Row, AVG(Value) is going to be computed at that (record) level and not at the level of Source, so it's incorrect as in this view:
We have to take into account the vizLOD vs. what we want for the calculation, in this case I dealt with it by using WINDOW_AVG(SUM([Value])) with a Compute Using on the Row so it partitions on each source, and now we have an accurate result:
So we have to keep in mind the level of the calculation we are working with *and* the particular level of detail grain it is being computed at for *every* calculation in the view. What I generally try to do is use record level, regular aggregate, and LOD expressions as much as possible and only resort to table calcs & R scripts (Python just came out) when I absolutely have to because even though they add a tremendous amount of capability they also add quite a bit of complexity.
I've attached the v10.0 workbook I used to build the screenshots above.
Jonathan

reducing size of vector.twbx 112.2 KB
