5 replies. Latest reply on Jun 30, 2018, 4:10 AM by Eric Knutson

    How to pass each row of data from Tableau to a Python model to make predictions for each row

    Eric Knutson

      Hi all,

      I'm working on my first TabPy integration and have run into a problem. I don't think I understand how to properly send each individual data row to my pickled Python model.

      I have included a write-up below and attached my Tableau workbook (10.4.6), my pickled model (created with Python 2.7; you need to unzip it), the input dataset, and a small bit of Python code you can use to test the pickled model on the input dataset outside of Tableau. I was inspired by Bora Beran and his article https://community.tableau.com/thread/241888 and thought I would give it a try.

      TabPy is super cool, and I seem to be very close to getting it to work. I would love to give this Tableau report to a user and let them update the IVR system to save $$ on calls. My estimate shows that about 10% of our calls could be switched to IVR and we would still get the same number of orders. The algorithm returns a cross-validation score of 88% with 5 folds, so it seems pretty accurate.


      I’d love to get this working completely in Tableau so I can show off the wonders of Tableau. Otherwise I’ll just submit a boring spreadsheet with a list of patients/calls that could be switched to IVR.

      Thanks for all your help.

      Cheers,

      Eric


      Dataset:

      The dataset has a list of communication records. Each record represents a communication to a patient. The sample data communications all have a MODALITYCODE of “5”, which represents an in-person call made by a customer service representative.

      COMMSID is a unique row identifier for each communication record.


      Machine Learning Model:

      I trained a machine learning model (a DecisionTree) in Python and saved it to a file (a pickle).

      It is accessible via TabPy through the calculated field “PredictionOnModality”.
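      A minimal sketch of saving and reloading a pickled model with `with` blocks, so the file handle is always closed. `ThresholdModel` is a hypothetical stand-in for the real DecisionTree (it only mimics a scikit-learn style `predict`), and the temporary path is an assumption for the demo. `protocol=2` is a pickle format that both Python 2.7 and Python 3 can read; scikit-learn's own documentation also warns that pickled models are not guaranteed to load under a different scikit-learn version, so the Python environment TabPy runs in should match the one that created simple.27.pickle as closely as possible.

```python
import os
import pickle
import tempfile


class ThresholdModel(object):
    """Hypothetical stand-in for the trained DecisionTree.

    Exposes a scikit-learn style predict(X): a list of feature rows in,
    one 0/1 prediction per row out.
    """

    def __init__(self, min_prev_orders):
        self.min_prev_orders = min_prev_orders

    def predict(self, X):
        # Column 2 is NUM_OF_PREV_ORDS in the column order used in this post.
        return [1 if row[2] >= self.min_prev_orders else 0 for row in X]


def save_model(model, path):
    # "with" closes the file even if pickling fails.
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=2)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo only: write to a temporary directory instead of a real model folder.
path = os.path.join(tempfile.mkdtemp(), "simple.27.pickle")
save_model(ThresholdModel(min_prev_orders=3), path)
model = load_model(path)
print(model.predict([[1, 4, 5, 10, 30, 1960, 1, 14]]))  # [1]
```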


      Tableau Goal:

      In Tableau, I want to pass all the individual records to my algorithm and have it predict whether an order will be placed IF we change MODALITYCODE to 4. MODALITYCODE 4 is a call from our automated IVR system and is much cheaper than a MODALITYCODE 5 customer service call.
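      One simple way to turn per-call predictions into a list of calls that could move to IVR is sketched below. It assumes (this rule is my assumption, not something TabPy or Tableau provides) that a call is a candidate when the model predicts an order (1) with MODALITYCODE set to 4. `ivr_candidates` and `switchable_share` are hypothetical helpers, and the COMMSIDs and predictions are made up.

```python
def ivr_candidates(comms_ids, predictions_if_ivr):
    """Return the COMMSIDs predicted to place an order when called via IVR (MODALITYCODE 4)."""
    if len(comms_ids) != len(predictions_if_ivr):
        raise ValueError("need exactly one prediction per communication record")
    return [cid for cid, pred in zip(comms_ids, predictions_if_ivr) if pred == 1]


def switchable_share(comms_ids, predictions_if_ivr):
    """Fraction of calls that are IVR candidates under the rule above."""
    if not comms_ids:
        return 0.0
    candidates = ivr_candidates(comms_ids, predictions_if_ivr)
    return len(candidates) / float(len(comms_ids))


# Made-up COMMSIDs and predictions for illustration only.
ids = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
preds = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(ivr_candidates(ids, preds))    # [103]
print(switchable_share(ids, preds))  # 0.1
```

      The length check matters here: it only makes sense if there is exactly one prediction per communication record.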


      The columns of data that my model uses to make the prediction are:

      Column                  Values
      VALIDPHONE              1 or 0. 1 means we have a known working phone number for the patient.
      MODALITYCODE            The type of outbound call made to the patient: 4 for IVR, 5 for a customer service (human) call.
      NUM_OF_PREV_ORDS        The number of orders the patient has previously placed with us.
      NUM_OF_PREV_COMMS       The number of communications the patient has previously had with us.
      DAYS_SINCE_LAST_COMMS   The number of days since the patient was last contacted.
      BIRTHYEARMASKED         The birth year of the patient.
      CONTACTTEAM             We have two teams that can place calls: 1 for team 1 and 2 for team 2.
      PATIENT_COMMS_HOUR      The hour at which we made the call, in 24-hour (military) time: 0 for midnight up to 23 for 11:00 PM.
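      The sketch below builds one model input row in the column order listed above, swapping in the what-if MODALITYCODE. It assumes the model was trained on these eight columns in exactly this order (the same order as the SCRIPT_INT arguments) and that every feature is an integer. `build_feature_row` is a hypothetical helper and the record values are made up.

```python
# Assumed to match the column order the model was trained on.
FEATURE_ORDER = [
    "VALIDPHONE",
    "MODALITYCODE",
    "NUM_OF_PREV_ORDS",
    "NUM_OF_PREV_COMMS",
    "DAYS_SINCE_LAST_COMMS",
    "BIRTHYEARMASKED",
    "CONTACTTEAM",
    "PATIENT_COMMS_HOUR",
]

IVR_MODALITY = 4


def build_feature_row(record, modality_code=IVR_MODALITY):
    """Return one model input row, with MODALITYCODE replaced by the what-if value."""
    row = []
    for name in FEATURE_ORDER:
        value = modality_code if name == "MODALITYCODE" else record[name]
        row.append(int(value))
    return row


record = {
    "COMMSID": 12345,  # identifier only, not a model feature
    "VALIDPHONE": "1",
    "MODALITYCODE": "5",
    "NUM_OF_PREV_ORDS": "3",
    "NUM_OF_PREV_COMMS": "7",
    "DAYS_SINCE_LAST_COMMS": "28",
    "BIRTHYEARMASKED": "1955",
    "CONTACTTEAM": "2",
    "PATIENT_COMMS_HOUR": "14",
}
print(build_feature_row(record))  # [1, 4, 3, 7, 28, 1955, 2, 14]
```

      Keeping the order in one list makes it harder for the Tableau argument order and the training column order to drift apart silently.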


      Setup in Tableau:

      I pass all the above columns from the dataset to my algorithm except MODALITYCODE. Instead of passing MODALITYCODE from the dataset, I created a parameter called P2_MODALITYCODE and set it to 4. Using this parameter, my algorithm can predict whether the patient would place an order if we called them using the IVR system rather than the more expensive in-person customer service call.
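      Depending on how Tableau sends it, a parameter such as [P2_MODALITYCODE] may reach the script either as a one-element list or repeated once per row. The hypothetical helper below (a sketch, not part of TabPy) normalizes both cases to one value per row before the feature rows are assembled.

```python
def expand_param(values, n_rows):
    """Return a list of length n_rows for a SCRIPT_ argument.

    Handles a parameter that arrives as a single-element list as well as
    one that is already repeated once per row.
    """
    if len(values) == n_rows:
        return list(values)
    if len(values) == 1:
        return list(values) * n_rows
    raise ValueError("expected 1 or %d values, got %d" % (n_rows, len(values)))


print(expand_param([4], 3))        # [4, 4, 4]
print(expand_param([4, 4, 4], 3))  # [4, 4, 4]
```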


      Here is my calculated Tableau field that passes the data and uses my algorithm. This is a TabPy integration, and the field is called “PredictionOnModality”. It returns a “1” if the patient will place an order or a “0” if the patient will not place an order.

      SCRIPT_INT(
      "import pickle
      import numpy as np
      pickle_in = open('C:\pp\BTResupply\code\main\simple.27.pickle','rb')
      model = pickle.load(pickle_in)
      X = np.array([[_arg1[1], _arg2[1], _arg3[1], _arg4[1], _arg5[1], _arg6[1], _arg7[1], _arg8[1]]])
      prediction_set = model.predict(X)
      return prediction_set.tolist()",
      attr([Validphone]),[P2_MODALITYCODE],attr([NUM_OF_PREV_ORDS]),attr([NUM_OF_PREV_COMMS]),
      attr([DAYS_SINCE_LAST_COMMS]),attr([Birthyearmasked]),attr([Contactteam]),attr([Patient Comms Hour])
      )
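      A minimal sketch of a row-wise version of the script body, written as a plain Python function so it can be tested outside Tableau. The idea is to build X from every element of each `_argN` list (one feature row per mark) instead of only element `[1]`, and to return one prediction per row. `script_body` and `StubModel` are hypothetical; in the calculated field the same lines would sit inside the script string with the real unpickled model (`np.column_stack([_arg1, ..., _arg8])` is a NumPy alternative to `zip`). It also assumes the table calculation is computed along COMMSID (Compute Using), so that a single call receives all the rows it should score.

```python
class StubModel(object):
    """Hypothetical stand-in for the unpickled DecisionTree (scikit-learn style predict)."""

    def predict(self, X):
        # 1 if VALIDPHONE (col 0) is 1 and NUM_OF_PREV_ORDS (col 2) > 0, else 0.
        return [1 if row[0] == 1 and row[2] > 0 else 0 for row in X]


def script_body(model, _arg1, _arg2, _arg3, _arg4, _arg5, _arg6, _arg7, _arg8):
    """Row-wise equivalent of the SCRIPT_INT body (hypothetical helper).

    Each _argN is a list with one entry per row in the partition.
    """
    n_rows = len(_arg1)
    # Assumption: the parameter arrives as one value or as one value per row.
    modality = list(_arg2) * n_rows if len(_arg2) == 1 else list(_arg2)
    # zip turns eight column lists into n_rows feature rows, in argument order.
    X = [list(row) for row in zip(_arg1, modality, _arg3, _arg4,
                                  _arg5, _arg6, _arg7, _arg8)]
    predictions = model.predict(X)
    # One int per row, in the same order Tableau sent the rows.
    return [int(p) for p in predictions]


preds = script_body(StubModel(),
                    [1, 1, 0],           # VALIDPHONE
                    [4],                 # P2_MODALITYCODE
                    [2, 0, 5],           # NUM_OF_PREV_ORDS
                    [3, 1, 9],           # NUM_OF_PREV_COMMS
                    [10, 40, 5],         # DAYS_SINCE_LAST_COMMS
                    [1950, 1980, 1970],  # BIRTHYEARMASKED
                    [1, 2, 1],           # CONTACTTEAM
                    [9, 15, 11])         # PATIENT_COMMS_HOUR
print(preds)  # [1, 0, 0]
```

      Loading the pickle inside the script means it is re-read on every call; TabPy also supports deploying a function ahead of time, which may be worth a look once the row-wise logic works.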


      My Issue:

      I don’t think I’m passing the data row by row correctly to my Python algorithm. I’m getting predictions (the integration is working), but it looks like if any row predicts a “0” for “PredictionOnModality”, then all rows show a “0”. If any row predicts a “1”, then all rows show a “1”.
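      One way to check this outside Tableau is to mirror the posted script. Python lists are 0-indexed, so `_arg1[1]` through `_arg8[1]` select only the second row of each partition; the model then returns a single prediction, and Tableau (as far as I can tell) shows a single returned value on every row. Every row would therefore display whatever the second row predicts. `original_script`, `tableau_fill`, and `StubModel` are hypothetical helpers, `tableau_fill` is only my assumed model of Tableau's behaviour, and the data is made up.

```python
class StubModel(object):
    """Hypothetical model: predicts 1 when NUM_OF_PREV_ORDS (column 2) > 0."""

    def predict(self, X):
        return [1 if row[2] > 0 else 0 for row in X]


def original_script(model, *args):
    """Mirror of the posted body: one feature row built from element [1] of each argument."""
    X = [[arg[1] for arg in args]]
    return list(model.predict(X))


def tableau_fill(result, n_rows):
    """Assumed Tableau behaviour for SCRIPT_ results (not Tableau's actual code)."""
    if len(result) == 1:
        return result * n_rows  # a single value is shown on every row
    if len(result) == n_rows:
        return result           # one value per row
    raise ValueError("unexpected number of results returned by SCRIPT function")


# Made-up partition of three rows, columns in SCRIPT_INT argument order.
args = [
    [1, 1, 1],           # VALIDPHONE
    [4, 4, 4],           # P2_MODALITYCODE
    [0, 5, 0],           # NUM_OF_PREV_ORDS
    [2, 9, 1],           # NUM_OF_PREV_COMMS
    [30, 7, 60],         # DAYS_SINCE_LAST_COMMS
    [1950, 1962, 1988],  # BIRTHYEARMASKED
    [1, 2, 1],           # CONTACTTEAM
    [9, 13, 16],         # PATIENT_COMMS_HOUR
]
shown = tableau_fill(original_script(StubModel(), *args), len(args[0]))
per_row = StubModel().predict([list(row) for row in zip(*args)])
print(shown)    # [1, 1, 1]  every row shows the second row's prediction
print(per_row)  # [0, 1, 0]  what a row-wise script should return
```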

      I opened a support case, but support told me they do not support TabPy and that I could try the community or professional services.


      Tableau screenshot: [image not included]