5 Replies Latest reply on Jun 30, 2018 4:10 AM by Eric Knutson

    How to pass each row of data from Tableau to a Python model to make predictions for each row

    Eric Knutson

      Hi all-

          I'm working on my first TabPy integration and have run into a problem.  I do not think I understand how to properly send each individual data row to my pickled python model.

      I have attached a writeup below, my Tableau workbook (10.4.6),  my pickled model (created with Python 2.7, you need to unzip it), the input dataset, and a small bit of Python code you can use to test the pickled model on the input dataset outside of Tableau.   I was inspired by  Bora Beran and his article https://community.tableau.com/thread/241888

      and thought I would give it a try.

          TabPy is super cool and I seem to be very close to getting it to work.  I would love to give this Tableau report to a user and let them update the IVR system to save $$ on calls. My estimate shows that about 10% of our calls could be switched to IVR and we would still get the same number of orders. The algorithm is returning a cross validation score of 88% with 5 folds so it seems pretty accurate.

       

      I’d love to get this working completely in Tableau so I can show off the wonders of Tableau. Otherwise I’ll just submit a boring spreadsheet with a list of patients/calls that could be switched to IVR.

      Thanks for all your help.

      Cheers,

      Eric

       

       

       

      Dataset:

      The dataset has a list of communication records.  Each record represents a communication to a patient.  The sample data communications all have a MODALITYCODE of “5” which represents an in person call made by a customer service representative.

      COMMSID is a unique row identifier to represent that particular communication record.

       

      Machine Learning Model:

      I have trained and created a machine learning model (a DecisionTree) in Python.  I saved the model to a file (a pickle) .

      It is accessible via TabPy and the calculated field (“PredictionOnModality”)

       

       

      Tableau Goal:

      In Tableau I want to pass all the individual records to my algorithm and have it predict if an order will be made IF we change the MODALITYCODE=4.  MODALITYCODE 4 is a call from our automated IVR system and is much cheaper than the MODALITYCODE 5 customer service call.

       

      The columns of data that my model uses to make the prediction are:

      Column

      Values

      VALIDPHONE

      1 or 0.  1 means we have a known working phone number for the patient.

      MODALITYCODE

      The type of outbound call made to the patient.  4 for IVR, 5 for customer service/human call

      NUM_OF_PREV_ORDS

      The number of order the patient has previously placed with us.

      NUM_OF_PREV_COMMS

      The number of communications the patient has previously had with us.

      DAYS_SINCE_LAST_COMMS

      The number of days since the patient was last contacted.

      BIRTHYEARMASKED

      The Birth Year of the patient.

      CONTACTTEAM

      We have two teams that can place calls.  1 is for team 1 and 2 is for team 2.

      PATIENT_COMMS_HOUR

      Is the hour we made the call at using military time: 0 for midnight up to 23 for 11:00 PM.

       

       

       

      Setup in Tableau:

      I pass all the above columns from the dataset to my algorithm except for the MODALITYCODE. Instead of passing the MODALITYCODE from the dataset, I created a parameter called P2_MODALITYCODE and set it to 4.  Using this parameter, my algorithm can predict if the patient would make a purchase if we called them using the IVR system rather than the more expensive in person customer service call.

       

      Here is my calculated Tableau field that passes the data and uses my algorithm.  This is a TabPy integration and the column is called “PredictionOnModality”.  It returns a “1” is the patient will place an order or a 0 if the patient will not place an order.

      SCRIPT_INT(

      "import pickle

      import numpy as np

      pickle_in = open('C:\pp\BTResupply\code\main\simple.27.pickle','rb')

      model = pickle.load(pickle_in)

      X= np.array([[_arg1[1], _arg2[1], _arg3[1], _arg4[1], _arg5[1], _arg6[1], _arg7[1], _arg8[1]]])

      prediction_set = model.predict(X)

      return prediction_set.tolist()",

      attr([Validphone]),[P2_MODALITYCODE],attr([NUM_OF_PREV_ORDS]),attr([NUM_OF_PREV_COMMS]),

      attr([DAYS_SINCE_LAST_COMMS]),attr([Birthyearmasked]),attr([Contactteam]),attr([Patient Comms Hour])

      )

       

       

      My Issue:

      @

      I don’t think I’m passing the data row by row correctly to my Python algorithm.    I’m getting predictions made (the integration is working) but it sort of looks like if any row predicts a “0” for "PredictionOnModality", then  all rows will show a 0.  If any row predicts a “1” then all rows will show a “1”.

      I opened a support case but support told me they do not support TabPy and I can try the community or professional services.

       

      Tableau ScreenShot:

       

        • 1. Re: How to pass each row of data from Tableau to a Python model to make predictions for each row
          Nathan Mannheimer

          Hi Eric,

           

          Unfortunately I'm not able to run your pickle file, but if you able to return some results try setting the table calculation to address every field by right clicking on your PredictionOnModality field when it is placed on the view, select 'Edit Table Calculation' and then select Specific Dimensions and check every box like so:

          This will pass the data to Python as a single set of lists, and should return a list of classifications correctly paired to their source data. Let me know if this works for you.

          • 2. Re: How to pass each row of data from Tableau to a Python model to make predictions for each row
            Eric Knutson

            Hi Nathan-

            Thanks for taking the time to respond.  I'll will explore this option today and see how it goes.  I played with that a bit earlier per a paragraph I found in one of Bora's posts.

            Thanks,

            Eric

            • 3. Re: How to pass each row of data from Tableau to a Python model to make predictions for each row
              Eric Knutson

              Hi Nathan-

              I'm not quite there yet.  I selected all the dimensions and get results back but it doesn't look like it is working correctly.  I added some additional code to write the array that Tableau passes to my model.  It seems to be always picking the second row of the data and sending that (I'm assuming that Tableau calls the model only once).     I also tried one more thing.  From Bora's post, I can see it discusses "Using Every Row of data".  I currently have Aggregation ON so I tried option #2.

              COMMSID is my unique key so I added that to the rows area.  If I then select "Compute using" COMMSID then my calculated field give me an error: "IndexError : list index out of range"

               

              ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

              Using Every Row of Data - Disaggregated Data

              For accurate results for the Python functions, sometimes those Python functions need to be called with every row in the underlying data. There are two solutions to this:

              1. Disaggregate the measures using Analysis->Aggregate Measures->Off. This doesn’t actually cause the measures to stop their aggregations, instead it tells Tableau to return every row in the data without aggregating by the dimensions on the view (which gives the wanted effect). Using this with Python scripts can get the desired results, but can cause problems for views that we want to have Python work on the non-aggregated data and then display the data with some level of aggregation. The second solution deals with this situation:
              2. Add a dimension such as a unique Row ID to the view, and set the Compute Using (addressing) of the Python script to be along that dimension. If we’re doing some sort of aggregation with Python, then we might need to reduce the number of values returned by filtering them out with something like:
                 IF FIRST()==0 THEN SCRIPT_REAL('insert Python script here') END  

              If we need to then perform additional aggregations on that data, we can do so with table calculations with the appropriate Compute Usings that take into account the increased level of detail in the view.

              ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

               

              So, I think things are working but I'm very fuzzy as to how to get Tableau to pass each row.  I haven't been able to find any working examples for TabPy integration at the row level.  They all seem to perform calculations on aggregates.

               

              Thanks,

              Eric

              • 4. Re: How to pass each row of data from Tableau to a Python model to make predictions for each row
                Nathan Mannheimer

                Ah! Your second comment helped me figure out what was going wrong here. TabPy and the model you've built expect data in lists/arrays. In your code, you are filtering each list to the index 1 (ie _arg1[1]), which is the second value in the python indexing system 0,1,2... which is why you are always seeing the second row's values. This code should execute as expected:

                 

                SCRIPT_INT(

                "import pickle

                import numpy as np

                pickle_in = open('C:\pp\BTResupply\code\main\simple.27.pickle','rb')

                model = pickle.load(pickle_in)

                X= np.array([[_arg1, _arg2, _arg3, _arg4, _arg5, _arg6, _arg7, _arg8]])

                prediction_set = model.predict(X)

                return prediction_set.tolist()",

                attr([Validphone]),[P2_MODALITYCODE],attr([NUM_OF_PREV_ORDS]),attr([NUM_OF_PREV_COMMS]),

                attr([DAYS_SINCE_LAST_COMMS]),attr([Birthyearmasked]),attr([Contactteam]),attr([Patient Comms Hour])

                )

                 

                To answer your more general question, TabPy does not currently have a way to pass truly row level data. Since the SCRIPT_* functions in Tableau to pass code are table calculations, they will operate on what is in the visual level of detail. In this case, you are able to make the visual level of detail the row level by adding COMMSID to the view. Would being able to pass row level data without having it on the view be useful to you?

                • 5. Re: How to pass each row of data from Tableau to a Python model to make predictions for each row
                  Eric Knutson

                  Hi Nathan-

                     Thanks for your help-  you made my Friday.  I was able to get it to work.  I have a few things I want to try out to improve my understanding but your note and frankly your participation was enough to keep me going.

                  The interaction is great and a lot more fun than a spreadsheet.  You can change the modalitycode parameter now  (which type of call to make to a patient) and the model will let you know if the call will result in a sale (PredictionOnModality).

                   

                  Capture.JPG

                  I ended up doing a “Compute Using” Commsid.  I also changed the way I was reading arguments in and the way I returned the modality prediction.  Here is my calculation below.

                  I really appreciate your help.  I will update the community note in the next couple of working days to help others.

                  If I get crazy maybe I will try a youtube video for this TabPy integration as I found it hard to solve.

                  This could be useful for people using R as well since it would have the same setup challenge.

                  I hope you have a good weekend!

                  Cheers,

                    Eric

                   

                  PythonCalculated Field:

                  SCRIPT_INT(

                  "import pickle

                  import numpy as np

                  import pandas as pd

                  pickle_in = open('C:\pp\BTResupply\code\main\simple.27.pickle','rb')

                  model = pickle.load(pickle_in)

                  X_raw= np.array([ _arg1[0], _arg2[0], _arg3[0], _arg4[0], _arg5[0], _arg6[0], _arg7[0], _arg8[0], _arg9[0] ])

                   

                  #Delete the COMMSID column because we don't need it for the model.

                  X=np.delete(X_raw, [0])

                   

                  #Now predict

                  prediction_set = model.predict(X.reshape(1,-1))

                  return prediction_set.tolist()[0]",

                   

                  attr([Commsid]), attr([Validphone]),[P2_MODALITYCODE],attr([Num Of Prev Ords]),attr([Num Of Prev Comms]),

                  attr([Days Since Last Comms]),attr([Birthyearmasked]),attr([Contactteam]),attr([Patient Comms Hour])

                  )