I'm working on my first TabPy integration and have run into a problem. I don't think I understand how to properly send each individual data row to my pickled Python model.
I have attached a write-up below, my Tableau workbook (10.4.6), my pickled model (created with Python 2.7; you need to unzip it), the input dataset, and a small bit of Python code you can use to test the pickled model on the input dataset outside of Tableau. I was inspired by Bora Beran and his article https://community.tableau.com/thread/241888 and thought I would give it a try.
TabPy is super cool and I seem to be very close to getting it to work. I would love to give this Tableau report to a user and let them update the IVR system to save $$ on calls. My estimate shows that about 10% of our calls could be switched to IVR and we would still get the same number of orders. The algorithm returns a cross-validation score of 88% with 5 folds, so it seems pretty accurate.
I’d love to get this working completely in Tableau so I can show off the wonders of Tableau. Otherwise I’ll just submit a boring spreadsheet with a list of patients/calls that could be switched to IVR.
Thanks for all your help.
The dataset has a list of communication records. Each record represents a communication to a patient. The sample data communications all have a MODALITYCODE of “5” which represents an in person call made by a customer service representative.
COMMSID is a unique row identifier to represent that particular communication record.
Machine Learning Model:
I trained a machine learning model (a DecisionTree) in Python and saved it to a file (a pickle).
It is accessible via TabPy through the calculated field “PredictionOnModality”.
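For anyone who hasn't pickled a model before, the save/load round trip looks roughly like this. It's a minimal sketch: `StandInModel` is a hypothetical placeholder for the trained DecisionTree (a scikit-learn estimator is pickled and unpickled the same way), its rule is purely illustrative, and `model.pickle` is a made-up file name. Pickles are generally safest to load under the same Python and library versions that created them, so the TabPy server is probably best kept on the same Python 2.7 environment the model came from.

```python
import os
import pickle
import tempfile


class StandInModel(object):
    """Hypothetical stand-in for the trained DecisionTree.

    Illustrative rule only: predict 1 when the patient has placed at least
    one previous order (column 2, assuming the column order listed below).
    """

    def predict(self, rows):
        return [1 if row[2] > 0 else 0 for row in rows]


def save_model(model, path):
    # Pickle files must be written in binary mode.
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Binary read mode, matching the open(..., 'rb') in the calculated field.
    with open(path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "model.pickle")
save_model(StandInModel(), path)
model = load_model(path)
print(model.predict([[1, 4, 3, 10, 30, 1950, 1, 14]]))  # [1]
```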
In Tableau I want to pass all the individual records to my algorithm and have it predict whether an order would be placed IF we changed MODALITYCODE to 4. MODALITYCODE 4 is a call from our automated IVR system and is much cheaper than the MODALITYCODE 5 customer service call.
The columns of data that my model uses to make the prediction are:
Whether we have a known working phone number for the patient: 1 or 0, where 1 means we do.
The type of outbound call made to the patient: 4 for IVR, 5 for a customer service (human) call.
The number of orders the patient has previously placed with us.
The number of communications the patient has previously had with us.
The number of days since the patient was last contacted.
The birth year of the patient.
The team that placed the call. We have two teams: 1 for team 1 and 2 for team 2.
The hour the call was made, in military (24-hour) time: 0 for midnight up to 23 for 11:00 PM.
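Because the model only sees positions, not names, it can help to pin the eight inputs down as a fixed, checked order. The sketch below does that; the field names are hypothetical (the real column names for several of these inputs aren't shown here), and the allowed values come from the descriptions above.

```python
# Hypothetical names for the eight model inputs, in the order listed above.
FEATURES = [
    "phone_known",            # 1 or 0
    "modality_code",          # 4 = IVR, 5 = customer service call
    "prior_orders",
    "prior_comms",
    "days_since_last_comms",
    "birth_year",
    "contact_team",           # 1 or 2
    "comms_hour",             # 0..23
]

# Only the categorical inputs have a fixed set of legal values.
ALLOWED = {
    "phone_known": {0, 1},
    "modality_code": {4, 5},
    "contact_team": {1, 2},
    "comms_hour": set(range(24)),
}


def validate_row(row):
    """Return a list of problems for one record (empty means it looks valid)."""
    problems = []
    for name in FEATURES:
        if name not in row:
            problems.append("missing " + name)
        elif name in ALLOWED and row[name] not in ALLOWED[name]:
            problems.append("bad value for %s: %r" % (name, row[name]))
    return problems


def to_vector(row):
    # Emit values in the fixed FEATURES order the model was trained on.
    return [row[name] for name in FEATURES]


record = {"phone_known": 1, "modality_code": 5, "prior_orders": 3,
          "prior_comms": 10, "days_since_last_comms": 30,
          "birth_year": 1950, "contact_team": 1, "comms_hour": 14}
print(validate_row(record))  # []
print(to_vector(record))     # [1, 5, 3, 10, 30, 1950, 1, 14]
```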
Setup in Tableau:
I pass all the above columns from the dataset to my algorithm except for the MODALITYCODE. Instead of passing the MODALITYCODE from the dataset, I created a parameter called P2_MODALITYCODE and set it to 4. Using this parameter, my algorithm can predict if the patient would make a purchase if we called them using the IVR system rather than the more expensive in person customer service call.
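The parameter trick amounts to a "what-if" copy of each record with the modality column overwritten. A minimal sketch, assuming the modality is the second input (index 1) as in the column list above; `what_if_rows` is a hypothetical helper and the sample rows are made up. In the workbook itself the parameter would arrive as one of the _argN lists, presumably with the same value repeated for every row.

```python
IVR = 4  # value of the P2_MODALITYCODE parameter in the workbook


def what_if_rows(rows, modality_index=1, modality=IVR):
    """Copy each feature row and overwrite the modality column with the
    parameter value, leaving the original records untouched."""
    out = []
    for row in rows:
        new_row = list(row)
        new_row[modality_index] = modality
        out.append(new_row)
    return out


observed = [[1, 5, 3, 10, 30, 1950, 1, 14],
            [0, 5, 0, 2, 90, 1980, 2, 9]]
print(what_if_rows(observed))
# [[1, 4, 3, 10, 30, 1950, 1, 14], [0, 4, 0, 2, 90, 1980, 2, 9]]
```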
Here is my calculated Tableau field that passes the data and uses my algorithm. This is a TabPy integration and the column is called “PredictionOnModality”. It returns a “1” if the patient will place an order or a “0” if the patient will not place an order.
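import numpy as np
import pickle
# Load the pickled DecisionTree; the raw string keeps the Windows backslashes literal
with open(r'C:\pp\BTResupply\code\main\simple.27.pickle', 'rb') as pickle_in:
    model = pickle.load(pickle_in)
# Feature array built from the eight arguments
X = np.array([[_arg1, _arg2, _arg3, _arg4, _arg5, _arg6, _arg7, _arg8]])
prediction_set = model.predict(X)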
attr([DAYS_SINCE_LAST_COMMS]),attr([Birthyearmasked]),attr([Contactteam]),attr([Patient Comms Hour])
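As I understand TabPy, each _argN arrives in the script as a Python list with one value per row in the table-calculation partition, not as a single value. The sketch below reproduces that outside Tableau with made-up values for three rows and compares the array the calculated field builds with a row-per-record layout; the sample numbers are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for what TabPy passes: each _argN is a list
# holding one value per row in the table-calculation partition.
_arg1 = [1, 0, 1]             # phone known
_arg2 = [4, 4, 4]             # parameter value, repeated per row
_arg3 = [3, 0, 7]
_arg4 = [10, 2, 25]
_arg5 = [30, 90, 5]
_arg6 = [1950, 1980, 1962]
_arg7 = [1, 2, 1]
_arg8 = [14, 9, 16]
args = [_arg1, _arg2, _arg3, _arg4, _arg5, _arg6, _arg7, _arg8]

wrapped = np.array([args])        # same as np.array([[_arg1, ..., _arg8]])
stacked = np.column_stack(args)   # one row per record, one column per feature

print(wrapped.shape)   # (1, 8, 3)
print(stacked.shape)   # (3, 8)
```

scikit-learn estimators expect a 2-D array shaped (n_samples, n_features), which matches the second layout.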
I don’t think I’m passing the data row by row correctly to my Python algorithm. I’m getting predictions (the integration is working), but it looks like if any row predicts a “0” for “PredictionOnModality”, then all rows show a 0, and if any row predicts a “1”, then all rows show a 1.
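One thing I may try is building X with np.column_stack([_arg1, ..., _arg8]) and ending the script with return model.predict(X).tolist(), so Tableau gets one result per row. The sketch below shows that logic outside Tableau; `StandInModel` and `predict_rows` are hypothetical names, and the model's rule is illustrative only. Separately, since SCRIPT_ functions are table calculations, which rows land in a single call depends on the field's Compute Using setting; computing along COMMSID seems to be a common way to send every record in one batch, though I haven't confirmed that is the cause here.

```python
import numpy as np


class StandInModel(object):
    """Hypothetical placeholder for the unpickled DecisionTree."""

    def predict(self, X):
        # Illustrative rule only: predict an order if there are prior orders.
        return (X[:, 2] > 0).astype(int)


def predict_rows(model, *columns):
    """Build one feature row per record and return one int per record,
    so the results line up by position with the rows Tableau sent."""
    X = np.column_stack(columns)
    result = [int(p) for p in model.predict(X)]
    if len(result) != len(columns[0]):
        raise ValueError("expected one prediction per input row")
    return result


print(predict_rows(StandInModel(), [1, 0], [4, 4], [3, 0], [10, 2],
                   [30, 90], [1950, 1980], [1, 2], [14, 9]))  # [1, 0]
```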
I opened a support case, but support told me they do not support TabPy and suggested I try the community or Professional Services.