2 Replies Latest reply on Jul 16, 2018 5:56 PM by Umesh Nair

# Multiple Linear Regression using R and Tableau: "variable lengths differ" issue

Hi Everyone,

I was trying to perform multiple linear regression in R, integrated with Tableau. I used the Rossmann Sales dataset with some modifications. I wanted to predict the sales for a particular store given its other attributes, such as competitor distance and whether promotions are active or inactive, so there are continuous as well as categorical variables.

This is the R code I wrote to predict sales:

SCRIPT_INT( "

df_training <- data.frame(sales <- .arg1,assort <- .arg2,compdist <- .arg3,promo2 <- .arg4,storetype <- .arg5,dayofweek <- .arg6,promo <- .arg7,scholid <- .arg8,stholid <- .arg9)

fit_sales <- lm(df_training\$sales ~ as.factor(df_training\$assort) + df_training\$compdist + as.factor(df_training\$promo2) + as.factor(df_training\$storetype) + as.factor(df_training\$dayofweek) + as.factor(df_training\$promo) +as.factor(df_training\$scholid) + as.factor(df_training\$stholid))

fit_sales\$fitted.values",

AVG([Sales]), ATTR([Assortment]), AVG([Competition Distance]), ATTR([Promo2]), ATTR([Store Type]), ATTR([Day Of Week]), ATTR([Promo]), ATTR([School Holiday]), ATTR([State Holiday]))

But when I apply the above calculation, it throws the following error:

Error in model.frame.default(formula = df_training\$sales ~ as.factor(df_training\$assort) + : variable lengths differ (found for 'as.factor(df_training\$promo)')

I searched online for a solution and found that this error supposedly occurs when there are NULL values in the variable concerned. But here the variable in question, 'promo', doesn't have any NULL values.

Could someone tell me if and where I am going wrong? I have attached the Tableau workbook for your reference.
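
A possible explanation, offered as a sketch rather than a confirmed diagnosis of the attached workbook: inside `data.frame()`, `sales <- .arg1` is an assignment expression rather than a `name = value` pair, so R derives each column name from the whole expression. `df_training$promo` then has no exact match and partially matches both the `promo2` and the `promo` column, and an ambiguous `$` lookup returns `NULL`, which has length 0. The snippet below tries to reproduce this outside Tableau. The `arg_*` vectors are hypothetical stand-ins for `.arg1`, `.arg4` and `.arg7`, and only three of the nine variables are used.

```r
# Hypothetical stand-in data; in Tableau these would arrive as .arg1, .arg4, .arg7
arg_sales  <- c(10, 12, 9, 15)
arg_promo2 <- c(0, 1, 0, 1)
arg_promo  <- c(1, 0, 1, 1)

# Same construction as in the post: `<-` instead of `=` inside data.frame()
df_training <- data.frame(sales <- arg_sales, promo2 <- arg_promo2, promo <- arg_promo)

# Column names are built from the full `x <- y` expressions, not just `x`
print(names(df_training))

# `promo2` is a unique prefix of one column name, so `$` still finds it ...
print(is.null(df_training$promo2))
# ... but `promo` is a prefix of two column names, so `$` gives up and returns NULL
print(is.null(df_training$promo))

# Fitting with the zero-length term; capture the error message instead of stopping
err <- tryCatch(
  lm(df_training$sales ~ as.factor(df_training$promo2) + as.factor(df_training$promo)),
  error = function(e) conditionMessage(e)
)
print(err)
```

If `err` contains the same "variable lengths differ" message, the column naming rather than the data in [Promo] would be the place to look.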

###### 1. Re: Multiple Linear Regression using R and Tableau: "variable lengths differ" issue

Hi Umesh,

It looks like the [Promo] column in the data source is the problem.

I couldn't find out what exactly is wrong with this particular column.

As a workaround, you may want to create a calculated field and use it instead:

CASE [Promo]

WHEN 0 THEN 'Inactive'

WHEN 1 THEN 'Active'

END

Yours,

Yuri

###### 2. Re: Multiple Linear Regression using R and Tableau: "variable lengths differ" issue

Hi Yuri Fal,

Thank you for the response. I tried the workaround you suggested, and created a calculated field 'PromoStatus' with the above formula, but unfortunately, it gave me the same error when I used the new variable instead of 'Promo' in the sales prediction.

Best Regards,

Umesh
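
One thing that may be worth checking in the original script: `data.frame(sales <- .arg1, ...)` uses `<-` where `=` is normally used to name columns, so the columns do not end up named `sales`, `promo` and so on, and `df_training$promo` can partially match both the `promo` and the `promo2` column and come back as `NULL`. That would also be consistent with the 'PromoStatus' calculated field not helping, since the R-side name `promo` stays the same whichever Tableau field feeds it. A minimal sketch of a variant that names the columns with `=` and passes the data frame through `data =` is shown below. The `arg*` vectors are simulated stand-ins for what Tableau would pass, and only four of the nine variables are shown for brevity.

```r
# Simulated stand-ins for the vectors Tableau passes as .arg1, .arg3, .arg4, .arg7
set.seed(1)
n <- 40
arg1 <- round(runif(n, 3000, 9000))          # sales
arg3 <- round(runif(n, 100, 5000))           # competition distance
arg4 <- rep(c(0, 1), length.out = n)         # promo2 flag
arg7 <- rep(c(0, 0, 1, 1), length.out = n)   # promo flag

# Name columns with `=`, so they are exactly sales, compdist, promo2, promo
df_training <- data.frame(
  sales    = arg1,
  compdist = arg3,
  promo2   = as.factor(arg4),
  promo    = as.factor(arg7)
)

# Refer to columns by bare name via `data =` instead of df_training$...
fit_sales <- lm(sales ~ compdist + promo2 + promo, data = df_training)

# One fitted value per input row, which is what Tableau expects back
print(length(fit_sales$fitted.values) == nrow(df_training))
```

Inside Tableau the same body would use `.arg1`, `.arg3` and so on directly. Since fitted values are generally not whole numbers, `SCRIPT_REAL` may also be a better fit than `SCRIPT_INT` for returning them.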