7 Replies Latest reply on Jun 2, 2016 2:08 PM by Jonas Beullens

    Tableau and R

    Jonas Beullens

      I want to create a new variable based on 3 other variables.

      I got an R script that works in R, and I adapted it to work with R in Tableau.

       

      SCRIPT_REAL('

      data = data.frame(SalesOrderNumber=.arg1, UnitPrice=.arg2, Unit_Freight_Cost=.arg3);

       

      library('dplyr');

       

       

      a=0;

      for(x in data$SalesOrderNumber){

        a= a+1

        data$OrderPrice[a] <- 0

        for(i in which(data$SalesOrderNumber==x)) {

          data$OrderPrice[a] <- data$OrderPrice[a] + data$UnitPrice[i]

        }

      };

       

       

      b=0;

      for (x in data$SalesOrderNumber) {

        b= b+1

        data$TotFreightPrice[b] <- length(which(data$SalesOrderNumber==x)) * data$Unit_Freight_Cost[b]

        data$percentage[b] <- data$UnitPrice[b] / data$OrderPrice[b]

        data$NewUnitFreight <- data$percentage * data$TotFreightPrice

      };

      data$NewUnitFreight; ',

      AVG([SalesOrderNumber]),AVG([UnitPrice]),AVG([Unit_Freight_Cost]) )

       

      The above script doesn't work, first of all because the SalesOrderNumber is a dimension and so an aggregate function does not make sense, but it does also not work without it.

      Can anyone help me how I can create a new variable through R?

        • 1. Re: Tableau and R
          Alexander Mou

          Try this:

           

          1.Duplicate the salesordernumber. Convert it to Number if it is string type.

          2.Drag it to Measures shelf.

          3.Use this new measure in your script.

           

          See if this works for you.

           

          On Tue, May 31, 2016 at 2:42 PM, Jonas Beullens <

          • 2. Re: Tableau and R
            Jonas Beullens

            True trial and error, I found out that MIN() does the trick!

            • 3. Re: Tableau and R
              Jonathan Drummey

              [Jonas had emailed me on this]

               

              R scripts are run as table calculations and therefore they are processed *after* aggregation and all values used in the R scripts must be constant (like coming from a parameter or an entered number or string value) or aggregate.

               

              It's pretty clear in the script that you want to treat SalesOrderNumber as a dimension, and for the R script to run from Tableau it must be a measure.

               

              Therefore you need to do two, maybe four things:

               

              1) Have SalesOrderNumber as a dimension in the view.

              2) Use MIN(SalesOrderNumber), MAX(SalesOrderNumber), or AVG(SalesOrderNumber)  as the measure in the R script--you've already chosen AVG(SalesOrderNumber).

              3) Most likely have the Compute Using for the R script along the SalesOrderNumber.

              4) Is the SalesOrderNumber unique at the record level or is there a finer grain that might include something like an OrderItemNumber (i.e. there are multiple order items per order)? If so then that would need to be a dimension in the view as well (to make the AVG(UnitPrice) and AVG(Unit_Freight_Cost) come out right) and the Compute Using including that.

               

              Also, the script is including the dplyr library, I'm not seeing any functions that would require it? The reason I mention that is because every time the view is refreshed the dplyr library will be loaded and if you don't need it then that's unnecessary overhead.

               

              I'm not that good at reading R scripts, it seems like you are trying to parcel out the freight costs based on the unit price and order price. There might be other ways of doing this in Tableau using Tableau's native calculations that don't require R scripts that would be a lot faster. Can you describe the logic in more detail? Also, what is your data source (text file, Excel, SQL Server, etc.)?

               

              Jonathan

              • 4. Re: Tableau and R
                Jonas Beullens

                Thank you for your well appreciated help.

                I am still left with one issue. What I am trying to do here is to split the freight costs relatively among the unit price of each item in the order.

                Now I want to make a new variable with the new freight cost and use this one in tableau. This code worked but there is still a problem.

                 

                My problem is that my new output data$NewUnitFreight is an aggregate function and what I wanted was to subtract this new variable from another one and this gets an error saying that I cannot use this aggregate variable.

                Is there any way to do it? Basically I want to create a new variable with the values from data$NewUnitFreight.

                 

                 

                 

                SCRIPT_REAL('

                data = data.frame(SalesOrderNumber=.arg1, UnitPrice=.arg2, Unit_Freight_Cost=.arg3, OrderQty=.arg4);

                 

                 

                 

                 

                 

                 

                a=0;

                for(x in data$SalesOrderNumber){

                  a= a+1

                  data$OrderPrice[a] <- 0

                  for(i in which(data$SalesOrderNumber==x)) {

                    data$OrderPrice[a] <- data$OrderPrice[a] + (data$UnitPrice[i] * data$OrderQty[i])

                  }

                };

                 

                 

                c=0

                for(t in data$SalesOrderNumber){

                  c= c+1

                  data$TotFreightPrice[c] <- 0

                  for(j in which(data$SalesOrderNumber==t)) {

                    data$TotFreightPrice[c] <- data$TotFreightPrice[c] + (data$Unit_Freight_Cost[j] * data$OrderQty[j])

                  }

                };

                 

                 

                 

                 

                b=0;

                for (x in data$SalesOrderNumber) {

                  b = b+1

                  data$percentage[b] <- (data$UnitPrice[b]) / data$OrderPrice[b]

                 

                };

                 

                 

                data$NewUnitFreight <- (data$percentage * data$TotFreightPrice);

                data$NewUnitFreight

                ',

                MIN([Sales Order Number]),AVG([Unit Price]),AVG([Unit Freight Cost]), AVG([Order Qty]) )

                 

                THANKS!!

                • 5. Re: Tableau and R
                  Jonathan Drummey

                  My suggestion is to move your logic inside Tableau. As you’ve found the output of R expression is a table calculation aggregate and therefore any further use of that calculation requires using an aggregate or table calculation. It can be done, but to do so requires an understanding of table calculations, the level of detail of the view, the desired level of detail of the viz, and other factors. If you “push” down the calculations in Tableau (potentially using Level of Detail expressions) then you can end up with the desired results in a much easier fashion.

                   

                  If you can post some sample data with an example of the logic I can take a look to give you some direction.

                   

                  Jonathan

                  • 6. Re: Tableau and R
                    Jonathan Drummey

                    Hi Jonas,

                     

                    I had some time at lunch today and quickly did a mockup of some data and some calcs to demonstrate what I mean. I'm strongly suspecting that there's something I missed and/or some additional factor (such as additional dimension(s) in your Tableau view) that are making the results of the NewUnitFreight calc in the mockup data I created not be different from the original Unit_Freight_Cost.

                     

                    I used the Superstore Data to get a record-level Unit Price, Order Quantity, and Unit_Freight_Cost. Each record is uniquely identified by a Row ID and there are 1 or more Row IDs for a given Order ID. (I left this hierarchy in place because from looking at the R script I'm suspecting there might be multiple records per SalesOrderNumber and an extra dimension in the view that is breaking those down).

                     

                    Then I built record-level calculations based on your above logic to calculate an Row Price (Unit Price * Order Quantity), a Row Freight Price (Unit_Freight_Cost * Order Quantity), a Row % (Unit Price/Row Price), and a New Unit Freight (Row % * Row Freight Price).

                     

                    Here's the workout view:

                     

                    Screen Shot 2016-06-02 at 1.06.15 PM.png

                     

                    Note that the NewUnitFreight has the same value as the Unit_Freight_Price. I followed the logic in your R scripts as closely as I could but as I'd noted either I'm missing something in the logic or there's an additional factor in your data or Tableau view. If you'd like to use this as an example then please let me know where the difference is.

                     

                    Now for why I went through this. Since these are record-level calculations they can then be aggregated as regular aggregate measures (SUM/AVG/MEDIAN/etc.) and/or used as dimensions (such as for histograms), neither of which is possible with measures from R scripts. There are ways to get the effect of using R scripts as measures or dimensions by nesting them inside other calculations and as I'd noted in my prior post that gets more complicated. Also, since these are record level calculations the computations are performed in the data source and (unless there are hidden complexities in your data) will likely be faster than doing them as R scripts.

                     

                    Jonathan

                    • 7. Re: Tableau and R
                      Jonas Beullens

                      Dear Jonathan,

                       

                      Thank you for your well appreciated help!!!

                      I found out yesterday night how to do it in native Tableau (in a very similar way as you did it).

                      I was too stubborn to try to do it in R before, since I believe that the combination of both could potentially be very powerful. (I'm confident in R and just started with Tableau about a week ago.) Either way, my problem is solved!

                       

                      I insist on thanking you again for the effort you have put in helping me out !

                      You're explanations are very clear and elaborated and very helpful for the Tableau community.

                       

                      Jonas