Covariance, Trend Lines, Correlation Coefficient R and R-Squared

Version 8

    Description:

    These calculations provide the basic correlation values in calculated fields. These include

    • covariance
    • slope of trend line, using on cov(x,y) / var(x)
    • Pearson's correlation coefficient, "R", which is cov(x,y) / (stdev(x) *stdev(y)
    • R-Squared

     

    Tableau returns these values when you add a trend line and then select Describe Trend Model. While handy for ad-hoc analysis, this functionality has a few drawbacks:

    1. You can't display the values in the view using, for example, a tooltip or title.
    2. You can't use the values inside another calculated field.
    3. When data is segmented by a categorical variable, Tableau treats this dimension as part of the trend model. In some case you may want to calculate the values independently for each segment.

     

    Showing Trend Line and R-Squared in Title

    ex1.png

    Calculating Trend Line and R-Squared for Each Department

    ex2.png

     

    There are several different ways we can do these calculations in Tableau:

     

    * Using COVAR() and CORR() - These regular aggregate functions were introduced in version 10.2, they may not be available for all data sources. COVAR() and CORR() can also be embedded in Level of Detail (LOD) expressions.

    * Using WINDOW_COVAR() and WINDOW_CORR() - These table calculations were introduced in 10.2 and are available in all data sources.

    * Using table calculations such as WINDOW_AVG() and WINDOW_STDEV() as primitives to create a formula that computes the results. This is necessary for versions prior to Tableau v10.2.

    * Using Tableau's R integration.

    * Using Tableau's Python integration - This was first available in version 10.1 and is not covered here.

     

     

    Example Calculation:

     

    Covariance = cov(x, y): 

    See Covariance - Wikipedia, the free encyclopedia

    wiki_cov.png

     

    Covariance as a regular aggregate (requires Tableau v10.2 or higher):

    //cov(profit, sales) as a regular aggregate

    COVAR([Profit],[Sales])

     

    Covariance as a table calculation (requires Tableau v10.2 or higher):

    //cov(profit, sales); set addressing / compute using to correct dimension

    WINDOW_COVAR(SUM([Profit]),SUM([Sales]))

     

    Covariance built from table calculation primitives:

    //cov(profit, sales); set addressing / compute using to correct dimension

    (1/(SIZE()-1))* 

      WINDOW_SUM( 

        (SUM([Profit]) - WINDOW_AVG(SUM([Profit]))) * 

        (SUM([Sales])  - WINDOW_AVG(SUM([Sales])))

      )

     

    Trend Line Slope = cov(x,y) / var(x)

    See https://en.wikipedia.org/wiki/Simple_linear_regression

    wiki_slope.png

    Slope as a regular aggregate:

    //slope of trend line = cov(x,y) / var(x)

    COVAR([Profit]), [Sales]) / VAR([Profit])

     

    Slope as a table calculation requires Tableau v10.2 or higher):

    //slope of trend line = cov(x,y) / var(x)

    WINDOW_COVAR(SUM([Profit]), SUM([Sales])) / WINDOW_VAR(SUM([Profit]))

     

    Slope built from table calculation primitives:

    //slope of trend line = cov(x,y) / var(x)

    [cov(profit, sales)] / WINDOW_VAR(SUM([Profit]))

     

    Trend Line Intercept = mean(y) - slope(x, y) * mean(x)

    Same link as above:

    wiki_intercept.png

     

    Intercept as a regular aggregate:

    //intercept of trend line = mean(y) - slope*mean(x)

    AVG(SUM([Sales]) - COVAR([Profit]), [Sales]) / VAR([Profit]) * AVG([Profit])

     

    Intercept as a table calculation:

    //intercept of trend line = mean(y) - slope*mean(x)

    WINDOW_AVG(SUM([Sales])) - WINDOW_COVAR(SUM([Profit]), SUM([Sales])) / WINDOW_VAR(SUM([Profit])) * WINDOW_AVG(SUM([Profit]))

     

    Intercept built from table calculation primitives:

    //intercept of trend line = mean(y) - slope*mean(x)

    WINDOW_AVG(SUM([Sales])) - [trend slope (profit, sales)] * WINDOW_AVG(SUM([Profit]))

     

    Pearson's correlation coefficient, "R" = cov(x, y) / (stdev(x)*stdev(y))

    See https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

    wiki_r.png

     

    R as regular aggregate:

    CORR([Profit], [Sales])

     

    R as a table calculation:

    WINDOW_CORR(SUM([Profit]),SUM([Sales])

     

    R from table calculation primitives:

    //R

    [cov(profit, sales)] /

    (WINDOW_STDEV(SUM([Profit]))*WINDOW_STDEV(SUM([Sales])))

     

    //R-Squared

    [R]^2

     

    Partitioning and Addressing:

    Set the addressing (compute using) to the dimensions that are driving the visible detail. In the example above, after adding the calculation to the view, right-click the pill and set Compute Using to Category.

     

     

    Related Functions:

    If you're running R, you can use the following formula to call R's cov and cor functions from Tableau:

    SCRIPT_REAL("cov(.arg1, .arg2)", SUM([Sales]), SUM([Profit]))

    SCRIPT_REAL("cor(.arg1, .arg2)", SUM([Sales]), SUM([Profit]))

     

    Using Dates and DateTimes for x/y Values:

    If you're using dates or datetimes as the x or y (for example computing the slope of a trend over time) and a number as the other value then you'll need to convert the date field to a usable number. There are three parts to this:

     

    a) If the date is a dimension and the other field is a measure then you'll need to aggregate the date to also use it as a measure, for example using the MIN() or ATTR() aggregation.

    b) Make sure the date level and date part/date value (e.g. year, month & year, year/month/day/hour/minute, etc.) used in the calculation is the same as what is used in the view. Here's a link to an example from the comments below: Apr 29, 2017 12:23 PM.

    c) Convert the date or datetime into a number, use INT() for dates and FLOAT() for datetimes.

     

    For example the covariance calculation for x = a date dimension using a minute date value (so truncated to the minute) and y = myValue could be something like:

     

    WINDOW_COVAR(FLOAT(DATETRUNC('minute',MIN([myDate]))), SUM([myValue]))

     

     

     

    Related Documentation:

     

    Updates:

    • v4: 2015-09-08 -- rewrote calcs using covariance, added trend line calcs. Jim Wahl
    • 2016-09-1 -- added workbook rSquared_v4.twbx for Tableau v9.0.
    • 2017-03-21 -- updated for Tableau v10.2's new Correlation and Covariance functions and added rSquared v5 for 10.2.twbx Jonathan Drummey
    • 2017-04-29 -- added section on using Dates & DateTimes plus cleaned up some formatting.