3 Replies Latest reply on Sep 10, 2017 6:26 AM by Prayson Wilfred Daniel

    Linear Regression with Tabpy : Predict Wine Quality

    TOMOHIRO IWAHASHI

      Hi, can I ask for advice?

       

      I want to fit a linear regression that predicts Alcohol from Density in the red wine quality data.

       

      Data set is from UCI Machine Learning Repository:

      UCI Machine Learning Repository: Wine Quality Data Set

       

      I want to draw the linear regression trend line on a scatter plot using TabPy.

       

      like this:

       

      * I created a calculated field like this:

       

       

      SCRIPT_REAL(
      '
      # Import pandas and numpy
      import pandas as pd
      import numpy as np

      # Import sklearn.linear_model.LinearRegression
      from sklearn import linear_model
      clf = linear_model.LinearRegression()

      X = pd.DataFrame(_arg1)
      Y = pd.DataFrame(_arg2)

      Ymat=Y.as_matrix()
      Xmat=X.as_matrix()
      Xmatr=Xmat.reshape(-1,1)

      # Fit Linear Regression Model
      clf.fit(Xmatr, Ymat)
      predict = clf.predict(Xmatr)

      # Convert to list format
      return predict.tolist()

      #return 1
      ',
      SUM([Density]), SUM([Alcohol])
      )

       

       

      * And I get the following error: "Cannot interpret json value of type 'array' as scalar value".

       

      When I change "return predict.tolist()" to "return 1" it succeeds, so the error occurs at "return predict.tolist()".

      The lines above it are OK.

       

      Does anybody have an idea what is wrong?

      I attached the workbook and the IPython notebook.

       

       

      Regards,

       

      Tomohiro.

        • 1. Re: Linear Regression with Tabpy : Predict Wine Quality
          Prayson Wilfred Daniel

          I have only just seen this question.

           

          I will show you three ways to go about it. The first two are close to what you are already doing, with a few modifications; the second also splits the data into training and test sets in Tableau. The third is the one I recommend and find simplest, because TabPy comes with deployment tools that let you build and deploy your model from a Jupyter Notebook (or whatever IDE you are using).

           

          1. I can see you want to train and predict on the same dataset (risky business). To do so, you can do this:

           

           

          I modified your script to this:

          // Fit a linear regression of Alcohol on Density and return the fitted values
          
          
          SCRIPT_REAL(
          '
          import numpy as np
          # Load the sklearn.linear_model.LinearRegression class
          from sklearn import linear_model
          clf = linear_model.LinearRegression()
          
          
          X = np.transpose(np.array([_arg1]))
          y =np.array(_arg2)
          
          
          clf.fit(X,y)
          
          return clf.predict(X).tolist()
          
          ',
          SUM([Density]), SUM([Alcohol])
          )
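
A likely reason the original script fails while this one works is the shape of the target. A one-column DataFrame converted to a matrix has shape (n, 1), scikit-learn then returns predictions of shape (n, 1), and tolist() produces a list of one-element lists where Tableau expects one scalar per row. A minimal sketch of the difference, run outside Tableau with made-up sample values standing in for _arg1 and _arg2:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up stand-ins for the lists Tableau passes as _arg1 and _arg2.
density = [0.9978, 0.9968, 0.9970, 0.9980, 0.9956]
alcohol = [9.4, 9.8, 9.8, 9.8, 10.5]

# One feature column: shape (5, 1).
X = np.array(density).reshape(-1, 1)

# Column-shaped target, as a one-column DataFrame would give: shape (5, 1).
y_2d = np.array(alcohol).reshape(-1, 1)
nested = LinearRegression().fit(X, y_2d).predict(X).tolist()

# One-dimensional target: shape (5,).
y_1d = np.array(alcohol)
flat = LinearRegression().fit(X, y_1d).predict(X).tolist()

print("2-D target, first element:", type(nested[0]).__name__)
print("1-D target, first element:", type(flat[0]).__name__)
```

With the one-dimensional target each element of the returned list is a plain float, which is the form a SCRIPT_REAL result needs.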
          

           

          2. If you want to train on training data and then test on test data within Tableau, you can use the Random function to split the rows, train the model on the training rows, and score it on the test rows. During training, you could save your model with joblib (or pickle or dill), e.g.

           

          SCRIPT_REAL(
          '
          import numpy as np
          import joblib  # older scikit-learn versions: from sklearn.externals import joblib
          # Load the sklearn.linear_model.LinearRegression class
          from sklearn import linear_model
          clf = linear_model.LinearRegression()


          X = np.transpose(np.array([_arg1]))  # transpose is used in case you want to add more predictors
          y = np.array(_arg2)


          clf.fit(X, y)
          # Double quotes inside the script so they do not end the single-quoted Tableau string
          joblib.dump(clf, "C:\\Users\\NameX\\lnWineModel.sav")

          return clf.predict(X).tolist()  # we could return better info here ...

          ',
          SUM([Density]), SUM([Alcohol])
          )
          

           

          To score the data:

          SCRIPT_REAL(
          "
          import numpy as np
          import joblib  # older scikit-learn versions: from sklearn.externals import joblib
          
          X = np.transpose(np.array([_arg1]))  # transpose is used in case you want to add more predictors
          clf = joblib.load('C:\\Users\\NameX\\lnWineModel.sav')
          
          return clf.predict(X).tolist()
          ", SUM([Density])
          )
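
The train, save, and score round trip can be tried outside Tableau before wiring it into calculated fields. The sketch below is only an illustration under stated assumptions: the data is synthetic, the split uses scikit-learn's train_test_split rather than a Tableau random calculation, and the model is written to a temporary directory instead of the path used above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the UCI wine values).
rng = np.random.default_rng(0)
density = rng.uniform(0.990, 1.004, size=200)
alcohol = 300.0 - 290.0 * density + rng.normal(0.0, 0.5, size=200)

X = density.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, alcohol, test_size=0.3, random_state=0
)

# Train on the training rows only, then persist the fitted model.
clf = LinearRegression().fit(X_train, y_train)
model_path = os.path.join(tempfile.mkdtemp(), "lnWineModel.sav")
joblib.dump(clf, model_path)

# Later (for example in the scoring script): load the model and score unseen rows.
loaded = joblib.load(model_path)
test_pred = loaded.predict(X_test)
r2 = loaded.score(X_test, y_test)
print("R^2 on held-out rows:", round(r2, 3))
```

Scoring on rows the model did not see gives a fairer picture of fit quality than predicting on the training rows.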
          

           

           

           

          3. A better way is to train, validate, and deploy your model from a Jupyter Notebook with the TabPy client. In the notebook, you only have to create a function and deploy it:

           

          import numpy as np
          import tabpy_client

          # clf is the model already trained and validated earlier in the notebook
          def lnWineModel(X):
              X = np.transpose([X]).astype(np.float64)
              return clf.predict(X).tolist()

          # Connect to a local or remote TabPy server using the client.
          # Here I connect to a local server listening on port 9004.
          connection = tabpy_client.Client("http://localhost:9004/")

          # Deploy the model
          connection.deploy('lnWineModel',
                            lnWineModel,
                            'Returns the predicted Alcohol given Density')
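
Before deploying, it can help to call the function locally with a plain Python list, which is roughly how the values arrive from Tableau. The sketch below needs no TabPy server; the training values are made up and only stand in for a model fitted earlier in the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training values standing in for the notebook's real data.
density = [0.9978, 0.9968, 0.9970, 0.9980, 0.9956]
alcohol = [9.4, 9.8, 9.8, 9.8, 10.5]
clf = LinearRegression().fit(np.array(density).reshape(-1, 1), np.array(alcohol))

def lnWineModel(X):
    # A list of n values becomes an (n, 1) float matrix: one predictor column.
    X = np.transpose([X]).astype(np.float64)
    return clf.predict(X).tolist()

# Simulate the call Tableau would make with two marks in the view.
result = lnWineModel([0.9970, 0.9960])
print(result)
```

If the result is a flat list with one float per input value, the function returns the shape SCRIPT_REAL expects.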
          

           

           

          In Tableau, you now only have to call:

           

          SCRIPT_REAL(
          "
          return tabpy.query('lnWineModel', _arg1)['response']
          
          ", SUM([Density])
          )
          

           

           

          Let me know if you need more help and I will be glad to guide you.

          • 2. Re: Linear Regression with Tabpy : Predict Wine Quality
            TOMOHIRO IWAHASHI

            Hi, Prayson. Thank you for your kind answer.

            I could do this by using reshape(-1,1) in the same way.

             

            You mentioned cross validation too. It is very useful information. Thank you very much!!

             

            SCRIPT_REAL(
            '
            # Import numpy and pandas
            import numpy as np
            import pandas as pd

            # Import sklearn.linear_model.LinearRegression
            import sklearn
            from sklearn.linear_model import LinearRegression

            # Read data from Tableau
            X = np.array(_arg1)
            Y = np.array(_arg2)
            X = X.reshape(-1,1)

            # Fit Linear Regression Model
            lreg = LinearRegression()
            lreg.fit(X,Y)
            pred = lreg.predict(X)

            # Convert to list format and return to Tableau
            return pred.tolist()
            ',
            SUM([RM]), SUM([Price])
            )
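
For a single predictor, reshape(-1, 1) and the transpose used in the earlier reply build the same (n, 1) matrix; the transpose form extends directly to several predictors. A short NumPy-only check, with made-up values and a hypothetical second predictor named ph:

```python
import numpy as np

# Made-up values standing in for _arg1; ph is a hypothetical second predictor.
density = [0.9978, 0.9968, 0.9970]
ph = [3.51, 3.20, 3.26]

# Two ways to build one feature column of shape (3, 1).
by_transpose = np.transpose(np.array([density]))
by_reshape = np.array(density).reshape(-1, 1)

# With two predictors, transposing the stacked lists gives shape (3, 2):
# one row per mark, one column per predictor.
two_predictors = np.transpose(np.array([density, ph]))

print(by_transpose.shape, by_reshape.shape, two_predictors.shape)
```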

            • 3. Re: Linear Regression with Tabpy : Predict Wine Quality
              Prayson Wilfred Daniel

              Totally true. I would clean up the unused imports, though:

               

              SCRIPT_REAL(
              '
              # Import numpy
              import numpy as np
              
              # Read sklearn.linear_model.LinearRegression
              from sklearn.linear_model import LinearRegression
              
              # Read data from Tableau
              X = np.array(_arg1).reshape(-1,1)
              y = np.array(_arg2)
              
              # Initialize and fit the linear regression model
              lreg = LinearRegression()
              lreg.fit(X,y)
              
              #Score using the model
              pred = lreg.predict(X)
              
              # Convert to list format and return to Tableau
              return pred.tolist()
              ',
              SUM([RM]), SUM([Price])
              )
              

               

              We do not need to import pandas or the top-level sklearn package.