1 Reply Latest reply on Oct 31, 2016 12:06 PM by Patrick Van Der Hyde

    Aggregating a new dataset around an existing connection

    Nathan Anderson


      I'm analyzing some data for my company, and I would like to make a regression plot with an employee's overall performance score as the response variable against their average training grade. The caveat is that the overall performance score is a qualitative variable, which I would like to scale numerically. I would also like to include parameter calculations so that managers and executives can adjust the scale as they see fit.


      Unfortunately, I cannot share any data, but the data set looks something like the table below. Note that the Overall Training Score Average is the same on every row for a given name; I merged it into the data set in R, since it is a static assessment.


      Each entry is a performance review for the employee. What I would like to do is aggregate on name, regress the training average against the performance score, and add a parameter that lets the user set the performance score values with a parameter calculation (say they want a 'low' to be a -1, an 'average' to be a 2, and a 'high' to be a 3).


      So basically "Name 1" would be one data point to plot: 90 for the training average (rows) and (-1 + -1 + 3)/3 = 0.333 for the performance score (columns). The plot would then get some sort of trend line.


      Name      Overall Training Score Average    Performance score
      Name 1    90                                Low
      Name 1    90                                Low
      Name 1    90                                High
      Name 2                                      Average
      Name 2                                      Average
      Name 2                                      Average
      Name 3    92                                Average
      Name 3    92                                High
      Name 3    92                                Average
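
The aggregation described above can be sketched outside Tableau as follows. This is only an illustration in plain Python, not a Tableau calculation: the `scale` dictionary stands in for the user-adjustable parameters, the function name `aggregate_by_name` is made up for this example, and `None` marks the training average that is blank for "Name 2" in the sample table.

```python
from collections import defaultdict
from statistics import mean

# Rows mirror the sample table; None marks the blank training average.
REVIEWS = [
    ("Name 1", 90, "Low"),
    ("Name 1", 90, "Low"),
    ("Name 1", 90, "High"),
    ("Name 2", None, "Average"),
    ("Name 2", None, "Average"),
    ("Name 2", None, "Average"),
    ("Name 3", 92, "Average"),
    ("Name 3", 92, "High"),
    ("Name 3", 92, "Average"),
]


def aggregate_by_name(reviews, scale):
    """Return {name: (training_average, mean_scaled_performance)}.

    `scale` maps each qualitative rating to a number and plays the role
    of the parameters a manager would adjust.
    """
    scaled_scores = defaultdict(list)
    training = {}
    for name, training_avg, rating in reviews:
        scaled_scores[name].append(scale[rating])
        training[name] = training_avg  # identical on every row for a name
    return {
        name: (training[name], mean(values))
        for name, values in scaled_scores.items()
    }


# The scale suggested in the post: low = -1, average = 2, high = 3.
scale = {"Low": -1, "Average": 2, "High": 3}
points = aggregate_by_name(REVIEWS, scale)
for name, (training_avg, performance) in points.items():
    print(name, training_avg, round(performance, 3))
```

Changing the values in `scale` and re-running the aggregation has the same effect as moving the parameter controls: each employee stays a single point, and only its performance coordinate shifts.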
        • 1. Re: Aggregating a new dataset around an existing connection
          Patrick Van Der Hyde

          Hello Nathan,


          It sounds like you are on the right path to a great visualization. Trend lines are typically drawn over an X axis of time. Is there a time component to this? Otherwise, it sounds like a scatterplot with performance scores on one axis and average training score on the other would be the best view to use. The calculation to turn the performance score into a measure could be done quickly as you have suggested, and the resulting output would just be the sum of that field.


          If there is a date or datetime field that isn't presented here, please share how it is used and what the trend line would indicate (changes in performance over time?).
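
For the scatterplot case with no time component, a linear trend line is an ordinary least-squares fit through the per-employee points. The sketch below is a plain Python illustration and not what Tableau does internally; the function name `fit_trend_line` is made up for this example. It assumes the training average is on the x axis and the scaled performance score is the response, and it uses only the two employees from the sample table that have a training average.

```python
def fit_trend_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept.

    `points` is a list of (x, y) pairs, one per employee.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# (training average, mean scaled performance) for Name 1 and Name 3,
# using the low = -1, average = 2, high = 3 scale from the question.
employee_points = [(90, 1 / 3), (92, 7 / 3)]
slope, intercept = fit_trend_line(employee_points)
print(f"performance = {slope:.3f} * training_average + {intercept:.3f}")
```

With only two complete points the line passes through both exactly, so a meaningful trend needs more employees than the sample shows.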