2 Replies Latest reply on Feb 13, 2019 12:06 PM by Nathan Phipps

    Creating Box and Whisker Plot with Weighted Measure

    Nathan Phipps

      I am using Tableau 2018.3.

       

      My goal is to conduct pricing analysis on profitability invoices that are stored on an MS SQL server. My data has one record for each order, and customers may order more than 1 unit in each order. I want to show the distribution of Price using a box and whisker plot and have it weighted by Quantity. I was able to figure out how to use quantity as a weight when creating a histogram, but I cannot seem to figure out how to do the same for my box and whisker plots.

       

      For instance, say that Bob submits an order for 10 units at $50, and Alice submits an order for 1 unit at $25. The default box and whisker plot in Tableau would make a plot out of these 2 observations (i.e., 2 records), but the box and whisker plot I want to create would count this as 11 observations (i.e., one observation for each unit).

       

      I have tackled this previously in Python by taking my existing pandas dataframe and creating a second dataframe that has a new record (a new row) for each unit in the quantity field. (See the code below.) Then I created the box and whisker plot using this second dataframe. But I have no idea how to implement this in Tableau.

       

      In the attached packaged workbook, the “Box and Whisker_GOOD” sheet shows what I want my final box and whisker plot to look like. (Note that there are 453 marks for this viz because I manually created a second dataset with 1 record per unit to see what it should look like.)

       

      The “Box and Whisker_BAD (weights needed)” is made using my actual dataset (note that there are only 10 marks), but the records are not weighted by quantity.

       

      How do I get my BAD viz to look like my GOOD viz without processing the data before I bring it into Tableau? Any ideas?

       

       

       

      Python code:

       

      # Create new dataframe that reflects desired weights

      def create_weighted_df(df, desired_field, weight_field):

          weighted_data = []

         

          for index, row in df.iterrows():

              weighted_row = row[desired_field]

              for i in range(1, (int(row[weight_field]) +1)):

                  weighted_data.append(weighted_row)

                 

          weighted_df = pd.DataFrame(weighted_data, columns=[desired_field])

         

          return weighted_df