# Covariance, Trend Lines, Correlation Coefficient R and R-Squared

Version 9

Description:

These calculations provide the basic correlation values in calculated fields. These include

• covariance
• slope of trend line, using on cov(x,y) / var(x)
• Pearson's correlation coefficient, "R", which is cov(x,y) / (stdev(x) *stdev(y)
• R-Squared

Tableau returns these values when you add a trend line and then select Describe Trend Model. While handy for ad-hoc analysis, this functionality has a few drawbacks:

1. You can't display the values in the view using, for example, a tooltip or title.
2. You can't use the values inside another calculated field.
3. When data is segmented by a categorical variable, Tableau treats this dimension as part of the trend model. In some case you may want to calculate the values independently for each segment.

Showing Trend Line and R-Squared in Title Calculating Trend Line and R-Squared for Each Department There are several different ways we can do these calculations in Tableau:

* Using COVAR() and CORR() - These regular aggregate functions were introduced in version 10.2, they may not be available for all data sources. COVAR() and CORR() can also be embedded in Level of Detail (LOD) expressions.

* Using WINDOW_COVAR() and WINDOW_CORR() - These table calculations were introduced in 10.2 and are available in all data sources.

* Using table calculations such as WINDOW_AVG() and WINDOW_STDEV() as primitives to create a formula that computes the results. This is necessary for versions prior to Tableau v10.2.

* Using Tableau's R integration.

* Using Tableau's Python integration - This was first available in version 10.1 and is not covered here.

Example Calculation:

Covariance = cov(x, y): Covariance as a regular aggregate (requires Tableau v10.2 or higher):

//cov(profit, sales) as a regular aggregate

COVAR([Profit],[Sales])

Covariance as a table calculation (requires Tableau v10.2 or higher):

//cov(profit, sales); set addressing / compute using to correct dimension

WINDOW_COVAR(SUM([Profit]),SUM([Sales]))

Covariance built from table calculation primitives:

//cov(profit, sales); set addressing / compute using to correct dimension

(1/(SIZE()-1))*

WINDOW_SUM(

(SUM([Profit]) - WINDOW_AVG(SUM([Profit]))) *

(SUM([Sales])  - WINDOW_AVG(SUM([Sales])))

)

Trend Line Slope = cov(x,y) / var(x) Slope as a regular aggregate:

//slope of trend line = cov(x,y) / var(x)

COVAR([Profit]), [Sales]) / VAR([Profit])

Slope as a table calculation requires Tableau v10.2 or higher):

//slope of trend line = cov(x,y) / var(x)

WINDOW_COVAR(SUM([Profit]), SUM([Sales])) / WINDOW_VAR(SUM([Profit]))

Slope built from table calculation primitives:

//slope of trend line = cov(x,y) / var(x)

[cov(profit, sales)] / WINDOW_VAR(SUM([Profit]))

Trend Line Intercept = mean(y) - slope(x, y) * mean(x) Intercept as a regular aggregate:

//intercept of trend line = mean(y) - slope*mean(x)

AVG(SUM([Sales]) - COVAR([Profit]), [Sales]) / VAR([Profit]) * AVG([Profit])

Intercept as a table calculation:

//intercept of trend line = mean(y) - slope*mean(x)

WINDOW_AVG(SUM([Sales])) - WINDOW_COVAR(SUM([Profit]), SUM([Sales])) / WINDOW_VAR(SUM([Profit])) * WINDOW_AVG(SUM([Profit]))

Intercept built from table calculation primitives:

//intercept of trend line = mean(y) - slope*mean(x)

WINDOW_AVG(SUM([Sales])) - [trend slope (profit, sales)] * WINDOW_AVG(SUM([Profit]))

Pearson's correlation coefficient, "R" = cov(x, y) / (stdev(x)*stdev(y)) R as regular aggregate:

CORR([Profit], [Sales])

R as a table calculation:

WINDOW_CORR(SUM([Profit]),SUM([Sales])

R from table calculation primitives:

//R

[cov(profit, sales)] /

(WINDOW_STDEV(SUM([Profit]))*WINDOW_STDEV(SUM([Sales])))

//R-Squared

[R]^2

Set the addressing (compute using) to the dimensions that are driving the visible detail. In the example above, after adding the calculation to the view, right-click the pill and set Compute Using to Category.

Related Functions:

If you're running R, you can use the following formula to call R's cov and cor functions from Tableau:

SCRIPT_REAL("cov(.arg1, .arg2)", SUM([Sales]), SUM([Profit]))

SCRIPT_REAL("cor(.arg1, .arg2)", SUM([Sales]), SUM([Profit]))

Using Dates and DateTimes for x/y Values:

If you're using dates or datetimes as the x or y (for example computing the slope of a trend over time) and a number as the other value then you'll need to convert the date field to a usable number. There are three parts to this:

a) If the date is a dimension and the other field is a measure then you'll need to aggregate the date to also use it as a measure, for example using the MIN() or ATTR() aggregation.

b) Make sure the date level and date part/date value (e.g. year, month & year, year/month/day/hour/minute, etc.) used in the calculation is the same as what is used in the view. Here's a link to an example from the comments below: Apr 29, 2017 12:23 PM.

c) Convert the date or datetime into a number, use INT() for dates and FLOAT() for datetimes.

For example the covariance calculation for x = a date dimension using a minute date value (so truncated to the minute) and y = myValue could be something like:

WINDOW_COVAR(FLOAT(DATETRUNC('minute',MIN([myDate]))), SUM([myValue]))

## A Note on Using Table Calculations:

The WINDOW_ functions referenced here are all table calculations, like WINDOW_COVAR(SUM([Profit]), SUM([Sales])). In general these calculations will have their compute using set to the dimension(s) that you have on Detail. There could be four possible inputs for the X and/or Y arguments to this table calculation and if you haven't used table calculations before this requires a little explanation as to how to set up the arguments and your visualization to get accurate results.

• A record level value in the data that is aggregated to the viz level of detail (vizLOD), like SUM([Profit]) where the viz Level of Detail is set by dimension(s) on Detail.
• A record level value in the data that should not be aggregated to the vizLOD (or, in other words, the vizLOD should be record level). Since table calculations like WINDOW_COVAR() require aggregate results as arguments we have to do an aggregation. The usual method is to add some sort of row id dimension to detail and then you can use SUM([X]) as the measure knowing that for this view each X is really just getting summed to the record level, i.e. X = SUM([X]).
• An aggregate calculated measure, for example a "Discount Amount" measure with the formula SUM([Sales]) * AVG([Discount]). In this case the measure is already an aggregate so the argument for the WINDOW_COVAR() would be [Discount Amount] with no aggregate keyword.
• A table calculation measure, for example an Index calculation with the formula INDEX(). In this case the result is also already an aggregate so the argument for the WINDOW_COVAR() would be [Index]. Also note with nesting table calculations inside other table calculations you have to pay attention to the compute using of each nested table calculation. For more information please refer to the Tableau documentation.