1 Reply Latest reply on Jun 27, 2018 10:16 AM by Nathan Mannheimer

    Changing Linear Regression output based on Parameter Value

    Abhishek Agarwal

      Hi All,

       

      I am new to Tableau and am looking for help with an issue: I am trying to predict a new value based on a parameter. Below is a sample script; a Tableau packaged workbook is attached.

       

      I have two fields, price and rm, which hold the price of a house and its average number of rooms, respectively. I am trying to build a simple app in which a linear regression (run through Tableau's R connection) returns a prediction that changes with a parameter created on the rm field.

       

      SCRIPT_REAL("

      x <- lm(price ~ rm, data.frame(price = .arg1, rm = .arg2));

      y <- predict(x, data.frame(rm = .arg3))

      ",SUM(price), Sum(RM), rmParam)

       

      Here rmParam is a parameter based on the rm field. The calculation returns a predicted value, but the prediction does not change when I change the parameter value.

       

      I looked at Bora Beran's forecasting workbook, where he uses an economic indicator to change the value, but it did not help much because my values are not time dependent and I am not using the forecast package. I may have missed posts on linear or multivariate regression where a solution for parameter-driven predictions was given.

        • 1. Re: Changing Linear Regression output based on Parameter Value
          Nathan Mannheimer

          Hi Abhishek,

           

          The problem here is that Tableau passes values to external services like R in aggregate. This means the values sent look like SUM([Price]), etc. In your workbook, there are no fields in the view to break up the aggregation (like a row/sample ID), so the model is trained on a single input and a single output. A single point cannot determine a slope, so the prediction does not respond to the parameter.
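
          To see why this happens, here is a minimal R sketch you can run outside Tableau. The single data row and the two parameter values are made up for illustration; they only stand in for what SUM([price]) and SUM([rm]) would send when nothing in the view splits the aggregation.

```r
# Hypothetical single aggregated row, standing in for SUM(price) and SUM(rm)
# when no dimension breaks up the view (values are made up for illustration).
one_row <- data.frame(price = 22.5, rm = 6.3)

fit <- lm(price ~ rm, data = one_row)
print(coef(fit))  # the rm slope cannot be estimated from a single point

# Predict at two different (hypothetical) parameter values. Warnings about
# the rank-deficient fit are suppressed to keep the output readable.
p_low  <- suppressWarnings(predict(fit, data.frame(rm = 4)))
p_high <- suppressWarnings(predict(fit, data.frame(rm = 8)))

# The two predictions are identical, so changing the parameter has no effect.
print(c(p_low, p_high))
```

          With only one row, the fit reduces to an intercept, which is why the result in the view never moves.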

           

          To pass all rows to R to train the model, I created a Sample ID field that can be used as a dimension. I then set the table calculation to compute using (addressing) all Sample IDs.

          This passes all rows of data to R in a single set of vectors (one for Price and one for RM). The model then returns a result based on the parameter value that is passed to the code (rmParam). However, because a value is passed from each Sample ID, a value is also returned for each Sample ID.
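
          As a rough illustration, the sketch below simulates that situation in plain R. The data are synthetic, predict_for_param is a hypothetical helper, and I am assuming the parameter arrives in .arg3 as a vector repeated once per row, like the other arguments.

```r
set.seed(1)
n <- 50  # hypothetical number of Sample IDs in the partition

# Synthetic stand-ins for the vectors Tableau would send (not real data).
.arg2 <- runif(n, min = 4, max = 8)          # rm, one value per Sample ID
.arg1 <- -30 + 9 * .arg2 + rnorm(n, sd = 2)  # price, one value per Sample ID

# Hypothetical helper that mimics the script body for a given parameter value.
predict_for_param <- function(param_value) {
  .arg3 <- rep(param_value, n)  # assumption: parameter repeated per row
  x <- lm(price ~ rm, data.frame(price = .arg1, rm = .arg2))
  predict(x, data.frame(rm = .arg3))
}

y_low  <- predict_for_param(5)
y_high <- predict_for_param(7)

length(y_low)                   # one prediction per Sample ID
length(unique(round(y_low, 8))) # all predictions share the same value
c(y_low[1], y_high[1])          # the prediction now changes with the parameter
```

          Because every row returns the same number, showing only one of them in the view loses nothing.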

           

          To hide these other values while still passing all of the data to the R engine, I used a table calculation filter (Table Calc Filter) that keeps only the result for the first sample index. Take a look at the Parameter Regression sheet in the attached workbook! I also converted the filter to a continuous slider.