5 Replies Latest reply on Dec 23, 2013 8:25 AM by Jonathan Drummey

R integration: Basic calculations

I'm having trouble getting the expected results of basic calculations in R, (such as a geometric mean, which is not currently available in Tableau.)

For example, this data set contains the length of stay (LOS) for two groups of hospital patients.

 ID Group LOS 1 A 1 2 A 2 3 A 3 4 A 10 5 B 1 6 B 2 7 B 3 8 B 4

and the geometric mean values from R:

Group LOS.mean       LOS.geometric

1 A       4.00                2.78

2 B       2.50                2.21

Actual results in Tableau

Here's the calculated field in Tableau that does not return the desired results.

SCRIPT_REAL('

require(psych)

geometric.mean(.arg1)

',

SUM([LOS]) // Tableau indicates that an aggregate function is required here.

)

The 12.65 is actually the geometric mean of the sums of the two groups, but I can't figure out how to send the individual LOS values rather than the aggregated values for ech group. Any suggestions?

• 2. Re: Re: R integration: Basic calculations
but I can't figure out how to send the individual LOS values rather than the aggregated values for ech group.

The individual (not aggregate values) you want to send to R need to be in the view.

Add LOS to the Detail button. It'll be added as an aggregate measure, so click on the green pill > Dimension. Now all of the calculations will be done at the LOS  level of detail.

Unfortunately this includes your AVG(LOS) measure. To fix this you'll need to convert this to a table calc as well (R functions are table calcs). LOS.avg (TC) =

WINDOW_AVG(MIN(LOS)) // The MIN() could also be AVG(), SUM(), MAX(), since there is only one LOS value per row

After adding the measure to the view, click the pill > Compute Using > LOS --- this means the WINDOW will consist of all of the LOS rows and partitioned by all other dimensions in the view (Group, for example).

Now you'll get the WINDOW_AVG() for every row (i.e., every LOS value). In a table you only want the first value, so wrap the above function in an IF FIRST() == 0 THEN ...  END statement.

You can do the same thing for your R function (compute using LOS, add IF FIRST() == 0, ...).

Finally, you'll probably have a lot of white space caused by the NULL values (when FIRST() <> 0). The eliminate this, from the top menu bar, click Analysis > Stack Marks > Off.

Jim

1 of 1 people found this helpful
• 3. Re: Re: R integration: Basic calculations

Jim,

In this sample data set putting LOS on the level of detail works, however I'm pretty sure it will fail in an actual data set because there will be many patients with the same LOS. I'm guessing that ID would need to be on the level of detail instead, with the Compute Using set to ID.

Rollie,

There's no need for R here. I thought the same thing as you (no geometric mean in Tableau) but Mark Jackson came to my rescue with a really simple calc, see the comments on this post: http://community.tableau.com/ideas/2234

Cheers,

Jonathan

1 of 1 people found this helpful
• 4. Re: R integration: Basic calculations

Thank you for the detailed steps, Jim. This was very helpful information.

Jonathan - you were correct that the ID was also required for a real-word data set with repeating values of LOS. I also had to change the table calculations for both measures using the Advanced dialog, as shown below.

• 5. Re: R integration: Basic calculations

Hi Rollie,

I'm glad you go it working for you! Here's an ease of use, performance, and maintenance tip: Keep the number of dimensions in the view to the minimum performance for accurate results. In this case, with ID in the view LOS is not necessary, so it could be completely removed and the Compute Using would be just on ID. If you needed LOS for a tooltip (which doesn't make sense in a vew like this), then it can be aggregated using MIN(), MAX(), or ATTR().

This has several impacts:

- Tableau is sending a simpler query to the data source that could be faster to return data.

- Configuring the addressing and partitioning of table calcs is easier, faster, & more accurate because there are fewer dimensions to manipulate.

- Reviewing the addressing and partitioning of table calcs is faster & more accurate because less text is necessary to describe what's going on.

- You'll run into fewer problems with unexpected/unwanted densification (domain padding & domain completion behaviors).

Jonathan