2 Replies Latest reply on Aug 30, 2018 8:05 AM by Jonathan Drummey

    Aggregate R calculations correctly

    Daniel Wegmann

      Hallo together,


      I have a simple example of R calculations in Tableau:


      There are 2 dimensions Dimension1 and Dimension2 as a hierarchy.


      Additional I have one measure called measure 1. Now I want to calculate the quantile value over all same values grouped in Dimension1 so that the result is:


      Dimension1     Calculated_Field (90% quantile)

      A                      5

      A                      5

      A                      5

      B                      6

      B                      6


      I am using this Script:



      quantil = quantile(.arg1, 0.9)



      SUM([measure 1]))


      If I show both dimensions and do the table calculation over both dimensions i get


      Dimension1     Dimension2     Calculated_Field

      A                      X                    5

      A                      Y                    5


      and so one...


      If I am using the following script




      dimension1 = .arg2

      dimension1list= unique(dimension1)

      quantil = NULL


      for(i in 1:length(dimension1list)) {


        selection = which(entype == dimension1list[i])

        quantil[auswahl] = quantile(.arg1[selection], 0.9)




      quantil', SUM([measure1]), ATTR([dimension1]))


      I get


      Dimension2     Calculated_Field

      X                      5

      Y                      5


      and so one..


      But I want the result as above showing only the first dimension....


      Can anyone help me?


      Thanks, Daniel

        • 1. Re: Aggregate R calculations correctly
          Patrick Van Der Hyde

          Hello Daniel,


          I have moved this to Tableau External Services where other questions pertaining to using R with Tableau are often addressed by other users of Tableau with R and members of the Tableau development team. 



          • 2. Re: Aggregate R calculations correctly
            Jonathan Drummey

            Hi Daniel,


            Tableau's R integration is not the easiest feature to use, so I understand having some questions when trying to use R inside Tableau. I've got few points to try to help you out.


            First of all, I regularly see R (or SQL, or Python, or ...) users come to Tableau and assume that Tableau can't do X and therefore they are trying to do X using what they know inside Tableau and struggling a bit when Tableau actually has feature X built in. In this case Tableau has PERCENTILE() and WINDOW_PERCENTILE() functions along with percentile reference distributions, one of those might be able to get the results that you want without needing R.


            Secondly for those of us used to working with tools that are always at a record level (even as we are transforming data) like R, Python, SQL, Excel, etc. there's a shift in mental models because Tableau by default is always aggregating our data. What I mean by this is that when we put one or more dimensions in a view that Tableau is only going to show as many marks as there are combinations of dimension values, and then the measures are aggregated for each dimension. So the first desired result table described with Dimension1 and Calculated_Field (90% quantile) isn't actually possible as-is in Tableau because there are only 2 values of dimension1 so Tableau's default will be A, 5 and B, 6. In order to render a table with 6 rows as described we'd need another dimension or a tech of turning off aggregated measures adding a table calculation to render a table.


            Now Tableau does this aggregation by default as part of the drag & drop visual analytics experience. The next level of that is with Tableau's table calculations: they let us do complex aggregates of aggregates via point & click. So what might take us several lines of R code or nested queries in SQL can be done in a few seconds in Tableau without ever touching the keyboard. This is fantastically useful in many circumstances, and it's a different way of working. In particular in tools like R we have the data, then write a line of code that manipulates & transforms the data, and then we can use that transformed data to write another line of code. Whereas in Tableau we start with data, and can point & click and transform the data, and a lot of the time that can be enough, and sometimes then we want to do something like run some R code and run into a challenge.


            The challenge comes from the fact that Tableau's external services integrations for R, Python, and Matlab aren't running on our data, they are running on top of the set of transformations that we've configured. R, Python, and Matlab are all run as SCRIPT functions that computed as table calculations which means they are effectively aggregations of aggregations. To fully understand (and control) what they are doing we need to have an understanding of Tableau's aggregation behavior and how table calculations interact with the viz level of detail (vizLOD) that is based on the dimensions in the view along with how table calculation compute using settings can change the results.


            Now because Tableau does so much via point & click we need to see how you've configured Tableau to understand the results you're getting so we can help you. Just sharing the R code isn't enough, we need to see the different views that you've set up with their compute usings of the calculated field, the dimensions & measures that are in those views, and how they have transformed the underlying data. So to really help you we'd need to see a packaged workbook, for more information on why see Packaged workbooks and flows: when, why, how.


            And there's one final point. When we're used to the [data] - [transform] - [repeat] cycle when we get stuck we'll very naturally tend to ask questions that are focused on what we need to do for that next transformation step. In the case of Tableau where Tableau can do so much abstraction so quickly it's helpful to have a sense of what your ultimate goal is because your goal may be available at a click or require an alternative route due to Tableau's varying capabilities for different features. For example if you're looking to use that 90th percentile in order to build a cohort to use in further calculations that might be do-able the R script or a WINDOW_PERCENTILE table calculation or we might need to use a FIXED level of detail expression with the PERCENTILE function. In this case when I see that original desired table result with repeating values it makes me suspect that the ultimate goal is not just to make that table, so more information about what you are trying to do would be helpful as well.