2 Replies Latest reply on Jun 19, 2017 3:26 PM by Bora Beran

    Tableau and R - Multiple Linear Regression Issues

    Justin Joyce

      Hello All,

       

      I have scoured the internet for my issue and have yet to find a solution. I will summarize my problem here and hopefully I have provided enough information!

       

      Attached is a data set from a Kaggle competition on advanced house prices. When I run a multiple linear regression in R and then run the same model through Tableau, the results do not match. I thought it was because 1) I was aggregating (I unselected aggregation and that was not it), 2) I was not applying the table calculation correctly (I checked it five ways to Sunday and found no problem), or 3) of the factors, which I normalized across reports.

       

      For example, if you download the file and run the regression in R, the first Id returns a value of 12.18 (log value); in the Tableau workbook, however, it returns 12.2. I thought it might be because of natural log vs. log.
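      A quick way to sanity-check the log-base idea, as a minimal sketch: R's log() is the natural log, so if the base were the cause, the same price on a base-10 scale would land near 5.29 and not near 12.2. The value 12.18 is taken from the post; the variable names are illustrative only. Rounding 12.18 to one decimal place gives 12.2, so display formatting may be worth ruling out, though that is an assumption and not something confirmed here.

```r
# Value R reports for the first Id (from the post), on the natural-log scale.
r_value <- 12.18

# R's log() is the natural log; dividing by log(10) converts to base 10.
as_log10 <- r_value / log(10)

# The same value rounded to one decimal place, as a display format might show it.
rounded_one_decimal <- round(r_value, 1)

print(as_log10)             # roughly 5.29, nowhere near 12.2
print(rounded_one_decimal)  # 12.2
```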

      I am trying to share this information with my colleagues (specifically the integration and how it can speed up modeling), but I cannot solve even the basic problem here. I have also attached the packaged workbook as well as the R code.

       

      R code:

       

      area_model_streets2 = lm(log(df_training$SalePrice)~log(df_training$LotArea) + log(df_training$GrLivArea)

                               + as.factor(df_training$Neighborhood) + as.factor(df_training$TotRmsAbvGrd)

                               + as.factor(df_training$GarageCars) + as.factor(df_training$HouseStyle) s...)

                               + as.factor(df_training$OverallCond) + as.factor(df_training$OverallQual))

      summary(area_model_streets2)
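      To compare a single row against what Tableau shows, it can help to pull the fitted value for that row at full precision. The following is a minimal sketch on made-up toy data, not the attached data set: the data frame `toy`, its values, and the reduced formula are all hypothetical stand-ins, and the `data =` argument is used here only to keep the example short.

```r
# Hypothetical toy data standing in for df_training (values are invented).
toy <- data.frame(
  SalePrice    = c(200000, 180000, 225000, 140000, 250000, 145000, 300000, 210000),
  LotArea      = c(8500, 9600, 11200, 9500, 14300, 14100, 10100, 10400),
  GrLivArea    = c(1700, 1260, 1790, 1720, 2200, 1360, 1690, 2090),
  Neighborhood = c("A", "B", "A", "B", "A", "B", "A", "B")
)

# Same shape of model as in the post, reduced to three predictors.
fit <- lm(log(SalePrice) ~ log(LotArea) + log(GrLivArea) + as.factor(Neighborhood),
          data = toy)

# Fitted value for the first row, on the natural-log scale.
first_fitted <- unname(fitted(fit)[1])

# The same row scored through predict(), which is closer to how an
# external tool would request a prediction.
first_predicted <- unname(predict(fit, newdata = toy[1, ]))

# Print with extra digits so rounding in the display is not mistaken for a mismatch.
print(format(first_fitted, digits = 10))
print(format(first_predicted, digits = 10))

# Back-transform to the price scale.
print(exp(first_fitted))
```

      If the two printed log-scale values agree here but differ in Tableau, the difference is more likely to come from how the rows reach R or how the result is formatted than from the model itself; that reading is a guess, not a diagnosis.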