4 Replies Latest reply on Feb 27, 2020 11:30 PM by Nathan Mannheimer

    Filtering Data and NaN values in column

    Brandon Thomson

      Hey all,

       

      I'm trying to incorporate forecasting (using the fbprophet package) into a Tableau dashboard, but am coming across several errors. To do this, I mostly followed the presentation by Nathan Mannheimer (https://www.youtube.com/watch?v=nRtOMTnBz_Yhttp://)

       

      For context, the dataset I have features aggregated marketing data across several different data sources (6) and clients (28). As you can imagine, each data source has different min/max dates, and there are gaps in data when a client is not marketing on these channels for those dates.

       

      Here is the script I'm running to return forecasted data (measure name = "Forecast"), based on a period to forecast parameter, and a dynamic KPI selector:

       

      [Forecast]

      Screen Shot 2019-09-11 at 3.26.16 PM.png

      Here is the date formula I'm using to populate [Dates] (I copied what Nathan did in his presentation):

       

      [Dates]

      Screen Shot 2019-09-11 at 3.26.48 PM.png

       

      I can only get this to run on account/channel combinations if every row has represented data. In this case, Paid Search across all clients is the only way to yield a result. See the table and chart below:

      Screen Shot 2019-09-11 at 3.18.34 PM.png

      forecast_example.png

      If I select a different channel, or filter by client, I receive the following error:

      Screen Shot 2019-09-11 at 3.10.26 PM.png

       

      And furthermore, if I try to drop rows that feature null values (df.dropna(subset=['ds'])), then I receive this error:

      Screen Shot 2019-09-11 at 3.21.06 PM.png

       

      So - it looks like the script is wanting to pull all data from the dataset, and if I filter the data in Tableau, it is receiving an error because it's keeping the dataframe the same size and filling in Null values. If I try to drop those null values, then the list result differs from what Tableau expects, and it wont accept. Is there anything I can change in my calculations that will account for the level of detail I need? Ideally, I would like to create this where I can filter by any dimension in my dataset, and this script will run.

       

      Unfortunately, I am unable to share a workbook since it contains sensitive information.

       

      Let me know,

      Brandon

        • 1. Re: Filtering Data and NaN values in column
          Nasim Afzali Chali

          Hi Brandon,

          I am facing the same error in Tableau.Did you find any solution for this?

          If so, please let me know what I need to change in my code.

          Thanks,

          Nasim

          • 2. Re: Filtering Data and NaN values in column
            Nathan Mannheimer

            Hi Brandon,

             

            I think the easiest solution here is to replace the missing values with some value like a moving average of the previous values or a 0. The error you are hitting at the end is due to the number of results being lower than that passed, so simply dropping the nulls won't work.

             

            This is something you could do with calculations in Tableau, or with code in Python.

             

            Sincerely,

            Nathan

            • 3. Re: Filtering Data and NaN values in column
              Yanki Okdemir

              Hello Nathan,

               

              I am also having same issues when I use "show missing values", therefore we probably follow the method of adding future dates in backend.

               

              On the other hand i have an another question. In your presentation, you showcased 2 different method of using Python code with Tableau. I was mostly using the second option which is called Deploy option.

               

              But for FBprophet code, first i could not make it to work with that dataframe code of pandas. Since it says _arg1 is not defined if I write df code like

               

              df = pd.Dataframe({'ds': _arg1, 'y' : _arg2})

               

              So I executed this code below and it worked. Therefore i had to ask whether my approach is the right one.

               

              import tabpy_client

              import pandas as pd

              connection = tabpy_client.Client('http://localhost:9004/')

              def tableaufields(_arg1, _arg2, _arg3):

                  from pandas import DataFrame

                  from fbprophet import Prophet

                  from fbprophet.plot import add_changepoints_to_plot

                  from fbprophet.diagnostics import cross_validation

                  from fbprophet.diagnostics import performance_metrics

                  from fbprophet.plot import plot_cross_validation_metric

                  df = pd.DataFrame({'ds': _arg1, 'y': _arg2})  

                  period = _arg3[0]-1 

                  m = Prophet()

                  df = df[:-period]

                  m.fit(df) 

                  future = m.make_future_dataframe(periods=period)  

                  forecast = m.predict(future)

                  return forecast['yhat'].tolist()

              connection.deploy('tableaufields',tableaufields, override=True)

               

              Thank you in advance and your Tableau conference powerpoint is really helpfu, thank you especially for that.

               

              Regards

              Yanki

              • 4. Re: Filtering Data and NaN values in column
                Nathan Mannheimer

                Hi Yanki,

                 

                Can you send me a picture of what your Tableau calculation looks like for your first set of code? And you'll also need to make sure the table calculation is computed to send all the data in one call.

                 

                Sincerely,

                Nathan