6 Replies Latest reply on Nov 14, 2012 1:25 PM by Buzz Burhans

regression coefficient

Can the regression coefficient from the linear trend line (of a measure over time) be accessed?  Alternatively, can I add a regression to a calculated field and access the coefficient?  I would like to use the slope of the linear regression as a color field, red if negative, green if positive.

• 2. Re: regression coefficient

You have a number of options with table calculations, but the exact method depends on your exact situation, as there are many factors to consider.

using the formula from https://en.wikipedia.org/wiki/Simple_linear_regression, a non-optimized version would be something like:

(WINDOW_AVG([x]*[y])-(WINDOW_AVG([x])*WINDOW_AVG([y])))

/

(WINDOW_AVG([x]^2)-(WINDOW_AVG([x])^2))

There are other variations as well. Please understand that you have not provided enough information to enable assistance. In most cases just plugging this formula into your workbook with not work. If you can post a packaged workbook with exact details on what what you are looking for, additional assistance could be provided.

• 3. Re: regression coefficient

Thanks for the insight and the offer of additional assistance.

Attached is a packaged workbook with an example.  This is data about dairy farms, on the y axis is milk yield in pounds. The X axis is month.  The data extract will only include data from the current month as the month progresses toward the end, so the last months are often empty.  When the extract is refreshed it always picks up the most recent 4 months.  I suppose I will have to convert the months to a linear integer sequence first to do the regression (i.e. month 1,2,3,4).

The intent is to have the  the person looking at the data quickly identify which farms are having a declining trend in yield and which are increasing by using the regression coefficient on month as a color indicator, so a two step indicator would have negative coefficients indicate a red line, and positive a green line.

An alternative would be to identify the difference between the yield for last month with data and the yield for the month previous to the last month with data.  The difference (a positive or negative amount) could be used as the color indicator.

I received help from Tech support recently that used a parameter for the month, and then used calculated fields for the current month selected in the parameter, and another calculated field for the previous month. That worked fine, and I could create a calculated field with the yield change for those two months.  However, it does not work when I have all the most recent 4 months showing at the same time instead of just a single month selected by the parameter.

Buzz

• 4. Re: regression coefficient

To get the regression coefficient, just replace [x] with INDEX() and [y] with SUM([Milk lb.])

If this evaluates slowly, you can add addition options to ensure that it evaluates quickly, but this slowness fact should be resolved in v8, see http://www.clearlyandsimply.com/clearly_and_simply/2011/01/another-look-at-site-catchment-analysis-with-tableau-6-part-3.html for more details on methods to make WINDOW_ table calcs faster.

Here is an option for looking at the last 2 months:

IF LOOKUP(SUM([Milk lb.]),LAST()) > LOOKUP(SUM([Milk lb.]),LAST()-1) THEN

"Up"

ELSEIF LOOKUP(SUM([Milk lb.]),LAST()) < LOOKUP(SUM([Milk lb.]),LAST()-1) THEN

"Down"

ELSE

"No Change"

END

In order for either route to work properly, you need to filter out the null values prior to table calcs, as done with the filter on SUM([Milk lb.]) with Special -> non-null values. Another route would be additional logic in the formulas to detect and handle nulls, not really desirable.

You will want to forward this to the Tableau support that recommended using a parameter instead of the LAST() function, so they can be aware of this possible route as well. Much better to have the view be data driven than to require a manual selection.

You can also see that I colored these two sheets with different methods, one with the color set to be stepped to 2 and formatted with a custom of +;- and on the other used the IF/ELSEIF/ELSE to return text, you can use either or in combination, or if neither work, there are other options as well.

See the attached workbook for the examples.

• 5. Re: regression coefficient

You are amazing!  I am anxious to try this out, but will have to wait until morning.  Thanks lots!  It makes sense, I can't wait to try it.  I'll let you know in the morning.

Thanks!

Buzz

• 6. Re: regression coefficient

Thanks very much Joe. I am very grateful for your help on that....No way I would have figured it out on my own!  I am also tremendously impressed by your mastery of this...your response was quick, and both options worked correctly. Very very impressive!

I had, before I sent the earlier similar question to Tech support, tried on my own to use a table calculation.  I tried the "First" and "Last" functions, but was not successful. They seemed to evaluate across the entire table rather than on the values within row. In any case, I have used your solution, and made notes so I will recall it in the future.

Very nice solution. Thanks!

Buzz