3 Replies Latest reply on Sep 14, 2016 6:55 AM by Satya Narayana

    Error in eval(predvars, data, env) : numeric 'envir' arg not of length one

    Satya Narayana

      I am trying to run my R code for a random forest using the SCRIPT_REAL function. Below is my code:

      SCRIPT_REAL("

      library(randomForest)

      library(caret)

      library(caTools)

      library(ggplot2)

      library(ggRandomForests)

      library(randomForestSRC)

      library(corrplot)

      AllMarkets=data.frame(c(as.numeric(.arg1),as.numeric(.arg2),as.numeric(.arg3),as.numeric(.arg4),as.numeric(.arg5),

      as.numeric(.arg6),as.numeric(.arg7),as.numeric(.arg8),as.numeric(.arg9),as.numeric(.arg10),

      as.numeric(.arg11),as.numeric(.arg12),as.numeric(.arg13)))

      Data15 <- subset(AllMarkets,  .arg13 == 2015)

      Data16 <- subset(AllMarkets,  .arg13 == 2016)

      formula<-.arg2~.arg4+.arg14+.arg5+.arg6+.arg7+

      .arg8+.arg9+.arg10+.arg11+.arg12

      Data15.boot <- Data15[sample(x = nrow(Data15), size = nrow(Data15)*3, replace = TRUE),]

      rf <- randomForest(formula, data=Data15.boot, nodesize = 5, mtry = 6,

                         ntree= 1000,importance = TRUE, do.trace = 100,

                         na.action=na.omit)

      yhat <- predict(rf, Data16)

      yhat <- data.frame(yhat)

      yhat

      ",sum(variable1),sum(variable2),SUM(variable3),

      SUM(variable4),SUM(variable5),SUM(variable6),

      SUM(variable7),SUM(variable8),SUM(variable9),SUM(variable10),SUM(variable11),SUM(variable12),AVG(variable13),

      SUM(variabe14))

       

      I'm getting the following error: "Error in eval(predvars, data, env) : numeric 'envir' arg not of length one". The same code runs perfectly well in RStudio.

       

      Any idea why I'm facing this error?

        • 1. Re: Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
          Patrick A Van Der Hyde

          Hello Narayan,

           

          Is this still going on? I see this was posted a few weeks ago. I am also linking in Bora Beran to see if he spots something obvious.

           

          Thanks

           

          Patrick 

          • 2. Re: Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
            Bora Beran

            First thing to try is to change this line

             

            AllMarkets=data.frame(c(as.numeric(.arg1),as.numeric(.arg2),as.numeric(.arg3),as.numeric(.arg4),as.numeric(.arg5),

            as.numeric(.arg6),as.numeric(.arg7),as.numeric(.arg8),as.numeric(.arg9),as.numeric(.arg10),

            as.numeric(.arg11),as.numeric(.arg12),as.numeric(.arg13)))

             

            Tableau already passes arguments as vectors. When you wrap them in c() as above, R concatenates them into one long vector, so everything ends up in a single column with a long name like c...arg1..arg2..

             

            If you remove the c() that wraps it all, you will get the data frame that has these as individual columns which I think is what you're trying to do here.

             

            AllMarkets=data.frame(as.numeric(.arg1),as.numeric(.arg2),as.numeric(.arg3),as.numeric(.arg4),as.numeric(.arg5),

            as.numeric(.arg6),as.numeric(.arg7),as.numeric(.arg8),as.numeric(.arg9),as.numeric(.arg10),

            as.numeric(.arg11),as.numeric(.arg12),as.numeric(.arg13))
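
To make the difference concrete, here is a minimal R sketch that runs outside Tableau. The names arg1 and arg2 are hypothetical stand-ins for the vectors Tableau passes as .arg1 and .arg2, and the values are made up. It also shows one plausible route to the original error: row-indexing a one-column data frame returns a plain numeric vector, and a numeric vector passed as `data` to a formula-based modeling function is a likely trigger for the "numeric 'envir' arg" message.

```r
# Hypothetical stand-ins for the vectors Tableau passes as .arg1 and .arg2
arg1 <- c(1.0, 2.0, 3.0)
arg2 <- c(10, 20, 30)

# Wrapping in c() concatenates both vectors into one long column
stacked <- data.frame(c(as.numeric(arg1), as.numeric(arg2)))
dim(stacked)  # 6 rows, 1 column

# Without c(), each vector becomes its own column
wide <- data.frame(as.numeric(arg1), as.numeric(arg2))
dim(wide)     # 3 rows, 2 columns

# Row-indexing a one-column data frame drops it to a plain vector,
# so a bootstrap sample of 'stacked' is no longer a data frame
boot_stacked <- stacked[c(1, 2, 2), ]
is.data.frame(boot_stacked)  # FALSE

# The multi-column version stays a data frame
boot_wide <- wide[c(1, 2, 2), ]
is.data.frame(boot_wide)     # TRUE
```

Giving the columns explicit names, for example data.frame(y = as.numeric(.arg2), x1 = as.numeric(.arg4)), may also make the formula easier to read, though that naming is only a suggestion and not part of the original script.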

             

            Then when you want to get the results back into Tableau you just need to pull the column containing the predictions as opposed to the full data.frame.
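
A minimal sketch of returning only the predictions, assuming made-up training data. Here lm() is used as a dependency-free stand-in for randomForest(), and the data frames train and test are hypothetical; the point is only the shape of the value on the last line of the script.

```r
# Hypothetical data; lm() stands in for randomForest() in this sketch
train <- data.frame(y = c(1.1, 1.9, 3.2, 3.8), x = c(2, 4, 6, 8))
test  <- data.frame(x = c(10, 12))

fit  <- lm(y ~ x, data = train)
yhat <- predict(fit, newdata = test)

# What the original script ended with: a full data frame
yhat_df <- data.frame(yhat)

# What to return instead: a plain numeric vector, one value per row
result <- as.numeric(yhat)
result
```

Pulling the column back out of a data frame with yhat_df$yhat gives the same values; either way the last expression evaluated is a vector rather than a data.frame.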

             

            I have an example workbook you can download here that contains decision trees, regression trees and random forests in Tableau with R integration.

             

            Decision trees in Tableau using R « Bora Beran

            • 3. Re: Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
              Satya Narayana

              Thanks Bora Beran, it's working now. But I'm facing another problem (not an error, though). As you can see in the original post, I'm trying to predict for Data16 using Data15, both of which are subsets of the dataset imported into Tableau.

               

              My data looks like this,

               

              Test       Variable 1  Variable 2  Variable 3  Variable 4  Variable 5  Variable 6  Variable 7  Variable 8  Variable 9  Year  Variable 11
              ABCDE2015  0.578415    0.006129    0.82549     0.835607    0.189075    0.538328    0.894697    0.895335    0.560702    2015  0.151732
              ABCDE2016  0.64591     0.321047    0.709445    0.039521    0.321715    0.737296    0.178409    0.731013    0.289293    2016  0.586291

               

              where I'm predicting the value of one of the variables for each Test value. So I guess I'll have to calculate yhat along each Test. The problem with this approach is that the R script returns values only for the Test values in Data16, whereas my row labels come from the master data set. To work around this, I ran the predict function for Data15 as well and used the appended result as the R script output, like below:

               

              yhat <- predict(rf, Data16)

              yhat1 <- predict(rf, Data15)

              pred<-cbind(Data16,yhat=yhat)

              pred1<-cbind(Data15,yhat=yhat1)

              yhat2<- rbind(pred,pred1)

              yhat2$yhat

               

              As a result, my Tableau output shows predicted values for both Data15 and Data16, which is not ideal. And I can't use Year as a filter here, since it would filter the data passed to the R script as well. Is there any way to filter the displayed output to 2016 only without limiting the data used by the R script?
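
One possible way to approach this on the R side is sketched below. It rests on two assumptions: that Tableau accepts a result vector with one value per input row, and that NA values come back as nulls that can be hidden in the view. The data frame all_markets and its columns are hypothetical, and lm() again stands in for randomForest(). The idea is to train on the 2015 rows, predict only for the 2016 rows, and leave every other slot as NA so the output keeps the same length and row order as the input. Note that rbind(pred, pred1) in the snippet above places the 2016 rows before the 2015 rows, which may not match the order Tableau sent them in.

```r
# Hypothetical stand-in for the data frame built from .arg1, .arg2, ...
all_markets <- data.frame(
  year = c(2015, 2015, 2015, 2015, 2016, 2016),
  x    = c(1, 2, 3, 4, 5, 6),
  y    = c(1.2, 1.9, 3.1, 4.2, NA, NA)
)

is_2015 <- all_markets$year == 2015
is_2016 <- all_markets$year == 2016

# Train on 2015 only; lm() stands in for randomForest() in this sketch
fit <- lm(y ~ x, data = all_markets[is_2015, ])

# One slot per input row, in the original order; only 2016 rows are filled
yhat <- rep(NA_real_, nrow(all_markets))
yhat[is_2016] <- predict(fit, newdata = all_markets[is_2016, ])

# Last expression is the vector handed back to the caller
yhat
```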