I am trying to perform multiple linear regression in R and integrate it with Tableau. I am using the Rossmann Sales dataset with some modifications. I want to predict the Sales for a particular store given its other attributes, such as competitor distance and whether promotions are active or inactive, so the predictors include both continuous and categorical variables.
This is the R code I wrote to predict Sales:
df_training <- data.frame(sales <- .arg1,assort <- .arg2,compdist <- .arg3,promo2 <- .arg4,storetype <- .arg5,dayofweek <- .arg6,promo <- .arg7,scholid <- .arg8,stholid <- .arg9)
fit_sales <- lm(df_training$sales ~ as.factor(df_training$assort) + df_training$compdist + as.factor(df_training$promo2) + as.factor(df_training$storetype) + as.factor(df_training$dayofweek) + as.factor(df_training$promo) +as.factor(df_training$scholid) + as.factor(df_training$stholid))
AVG([Sales]), ATTR([Assortment]), AVG([Competition Distance]), ATTR([Promo2]), ATTR([Store Type]), ATTR([Day Of Week]), ATTR([Promo]), ATTR([School Holiday]), ATTR([State Holiday]))
But applying the above calculation throws the following error:
Error in model.frame.default(formula = df_training$sales ~ as.factor(df_training$assort) + : variable lengths differ (found for 'as.factor(df_training$promo)')
I searched online for a solution and found that this error supposedly occurs when there are NULL values in the variable concerned. But here the variable in question, 'promo', doesn't have any NULL values.
Could someone tell me if and where I am going wrong? I have attached the Tableau workbook for your reference.
It looks like the [Promo] column in the data source is the problem.
I couldn't find exactly what is wrong with this particular column.
As a workaround, you may want to create a calculated field and use it instead:
CASE [Promo]
WHEN 0 THEN 'Inactive'
WHEN 1 THEN 'Active'
END
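One possible explanation for the error, offered as a sketch and not as a confirmed diagnosis of the attached workbook: inside data.frame(...), an expression such as `promo <- .arg7` is an assignment, not a named argument, so R builds the column name from the expression text. `df_training$promo` then has to rely on partial matching, and because "promo" is a prefix of two column names (the promo and promo2 columns), the lookup is ambiguous and returns NULL, which has length 0. The vectors below are made-up stand-ins for the .arg values Tableau would pass, and the names `arg_sales`, `arg_promo2`, and `arg_promo` are hypothetical.

```r
# Toy stand-ins for the vectors Tableau passes in (hypothetical values).
arg_sales  <- c(5263, 6064, 8314, 13995, 4822, 5651)
arg_promo2 <- c(0, 1, 1, 0, 0, 1)
arg_promo  <- c(1, 1, 0, 0, 1, 0)

# With `<-` inside data.frame(), columns are named after the expression text.
bad <- data.frame(sales <- arg_sales, promo2 <- arg_promo2, promo <- arg_promo)
print(names(bad))

# "promo" is a prefix of two column names, so `$` cannot resolve it
# and returns NULL (length 0), while "promo2" still resolves uniquely.
print(is.null(bad$promo))
print(length(bad$promo2))

# With `=`, the columns get exact names and `$` finds them directly.
good <- data.frame(sales = arg_sales, promo2 = arg_promo2, promo = arg_promo)

# Passing `data =` lets the formula refer to columns by name.
fit <- lm(sales ~ as.factor(promo2) + as.factor(promo), data = good)
print(coef(fit))
```

If this is indeed the cause, the equivalent change in the Tableau script would be to write data.frame(sales = .arg1, ...) with `=` for every column and to call lm(sales ~ ..., data = df_training), which may make the calculated-field workaround unnecessary.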