2 Replies Latest reply on Oct 21, 2017 6:58 PM by Jonathan Drummey

    Integrating R (part 1): Understanding better Window Sum and R integrates or takes values from Tableau

    Mike Lane

      Hello everyone,

       

      Please see that attached dashboard.  My question is very similar to the question asked hereI know I only used Tableau for 1.5 years, but more and more I find myself needing to combine Tableau with either R or Python.  If you know someone who is an expert in that integration or even an expert in Window functions, could you please reach out to them and show them this question.

           In this now I am trying to do an average, and trying to help one of my good friend in Tableau.   Can anyone help?  I would like to walk you through the steps I have taken so far over the last month so you understand the problem better.

       

      Example 1.png

      Attempt 0) For some reason, I cannot get the Windows function to calculate the window average correctly. How does one properly do window sums of window sums?  In fact, related to this my friend found that the Grand Total function will not take in filters or levels (drill down/up).  I am partially wondering if Window function has issues taking in filters, levels, etc. as well, but I could not figure it out.

      So, I tried to do it in R by using mean(), lapply() and sum(), and even just the sum() function in Attempt 1, McalcProblem, and Attempt 2 worksheets.

       

      Example 2.png

      Attempt 1) The first time I got an output, put it was still outputing multiple sum instead of the grand total.  Because it then outputed a vector instead a singular value this brought me the question, how can I better debug and monitor for a particular worksheet what is being sent from Tableau to R? Since, I could not figure out the answer, I copy and pasted the data and then tested it directly in R.  Eventually what I learned, I used to make Mcalc as seen in the Mcalc worksheet.

      Attempt 1) Mcalc was the Rcode to do everything even the average.  Unfortunately, when I put it in Tableau it did not integrate properly?  So I tried the paste() function and filtered over something besides profit.  I got a reasonable answer, but still Tableau would not take a singular output of a sum.  Unfortunately as I mentioned before, if I remove paste() function and make the script real the output will be blank from this worksheet. I am not sure what "  " actually means.  Is the output in from R affect by Nulls or get error values due to them?  Can anyone help debug this sheet to get a singular value? 

       

       

      Example 3.png

      Attempt 2)  So I have tried using both R and Window Sum's together.  In this attempt I finally got things to calculate correctly. \o/ \o/ .  The only problem is if I change the filter from non-null of "Sales$Previous...." to "Profit$Previous...." things break down an I am in the problem that I started with.  How does one properly set up indexes in Tableau for Table Calcs especially when the index is (self-referential) over the same measure/dimension as what your trying to calculate?

       

       

      Questions:

      1) How does one properly do window sum's of window sums? In more detail for example, how does one properly set up indexes in Tableau for Table Calcs especially when the index is (self-referential) over the same measure/dimension as what your trying to calculate.

      2)  How can I better debug and monitor for a particular worksheet what is being sent from Tableau to R?  I have read its unaggregated data and a bit about how Rserve() works, but I cannot seem to find in R that variables from Tableau loaded into R.

      3)  What does the " " output means in Tableau coming from R?  Is it due to nulls?  Can only help me particular debug this sheet in Tableau?

        • 1. Re: Integrating R (part 1): Understanding better Window Sum and R integrates or takes values from Tableau

          Hello Mike Lane,

           

          Thanks for this very detailed thread.

          I *might* have someone with the right skills, but I am also raising this thread to gather more experts.

           

          Jim Wahl, is it something that you could be interested in?

          Thanks for your time

           

          ----------

          Lénaïc RIÉDINGER, Global Community Engineer Tableau

          Tableau Community Forums | Knowledge Base

          If you see a Helpful or Correct response, please mark it thanks to the buttons below the targeted post!

          • 2. Re: Integrating R (part 1): Understanding better Window Sum and R integrates or takes values from Tableau
            Jonathan Drummey

            Hi Mike,

             

            You are asking multiple questions at once and I don't have time to answer all of them, I think I can answer a lot of what is going on by going through the Attempt 0 view.

             

            Fundamentally Tableau's core drag-and-drop experience works great for taking raw data and performing a first level of aggregation on it, then we can point & click get additional aggregations through quick table calculations & things like reference lines, grand totals, etc. What you are running into is that when we start trying to nest table calculations or use Tableau's R or Python integrations then we need to have more awareness of how Tableau works in order to get accurate results.

             

            With the help of (many) others I've organized this awareness into a set of "master concepts" and I'm going to go through a few of them in relation to your questions:

             

            One concept is that Tableau has four levels of calculation: record level (computed for each record, these can be used as dimensions or measures), regular aggregate (like SUM, MIN, MAX, etc. that are measures, also LOD expressions are a form of regular aggregate that in the case of FIXED LOD expressions can be used as dimensions), table calculations (which can be nested and are measures), and the post-aggregate functions like grand totals, reference lines, etc. that are computed by Tableau but not something we can nest inside other calculations. There is also an order of operations in these four levels (record level are computed before aggregate and so on).

             

            So the Sign Profit calculation is a record-level calculation that is used as a dimension filter (filtering out any records with 0 profit before aggregation) and in the case of the moving sum of avg profit and the window average measures both have Sign Profit as a dimension on the Marks Card.

             

            The next concept that explains why you are seeing 2 marks on the window average and not the desired 1 is because the vizLOD (viz level of detail) is Month and Sign Profit, i.e. the two dimensions that are present for this marks card. The vizLOD is made up of dimensions on Rows, Columns, Pages, and the marks card. Tableau generates a mark for each & every distinct combination of values of those two dimensions. Awareness of the vizLOD (which can be different when there are multiple marks cards in a single worksheet) is critical to know how many marks you'll need and for table calculations what dimensions are used for addressing and partitioning.

             

            The third concept is around order of operations. Tableau computes calculations from the innermost expression to the outermost using the PEMDAS order of operations that we learned in grade school *with* the order of operations of the calculations having precedence. So all record level calculations are computed before regular aggregates, regular aggregates before table calculations, and so on.

             

            So in the Attempt 0 view there are 3 different measures on Rows:

             

            - SUM(Profit) that is aggregated to the level of Month & Sub-Category.

            - The moving sum of AVG(Profit) quick table calculation. The effective formula is WINDOW_SUM(AVG([Profit]),-2,0) and the compute using is set to both Month & Sign Profit. I'll break down this calculation: The AVG(Profit) is calculated the level of Month & Sign Profit.This is not the average of the profit per month & sub category, this is the average of the *record-level* profit values and since Sign Profit is a dimension on the vizLOD the average is computed for all of the records with negative profit and then the average is computed again for all the records with the positive profit. Then these results are aggregated by the moving sum that is summing those profits over the current month & two months prior.

            - The window avg measure has the formula WINDOW_SUM(AVG([Profit])), the compute using is again on Month & Sign Profit, and Month & Sign Profit are setting the vizLOD. So again the AVG(Profit) is the average of the record-level profit values for each Month & Sign Profit, then the WINDOW_SUM() is being computed across both of those dimensions and there are two marks for each month because of the Sign Profit measures.

             

            So now we have an explanation of what Tableau is doing in the view, the next question is how to get you towards your goal?

             

            The fourth concept I'll describe relates to this, and this is around working with calculations in Tableau, and it's that a way we can help to be assured of the results that we want is to write down the desired goal in plain language to help us get to the point of breaking down the goal into specific dimensions & measures that fit within how Tableau works.

             

            In the first screenshot you describe that you're expecting the average for January 2017 to be $7140/24 = 297.4. While that is describing an exact value and that can be really helpful sometimes, in this case it's not because it's not specifying what the goal is. I can see where the numerator of $7140 is coming from, that's the total SUM(Profit) for January 2017. I'm not sure where the denominator of 24 is coming from. I'm thinking that the 24 might have come from the number of months, however given the vizLOD of Month & Sub-Category for the month of January 2017 there are 17 sub-categories, so I'm guessing that the desired number would actually be $7140/17 = $420.

             

            Restating this in plain language it appears that the goal is to display two aggregations on the view:

            - The sum of profit per sub-category and month as stacked bars. This is the SUM(Profit) measure with the vizLOD of sub-category & month.

            - The average per month of the sum of profit per sub-category and month. This is the measure you want built, so we have an aggregation (the average for each month) of an aggregation (the sum for each sub-category and month). In this case we can use a WINDOW_AVG(SUM(Profit)) table calculation that will compute along Sub-Category and therefore partition (restart) for each month.

             

            In this view I've built the jtd WAvg Profit calculation that does that and added it to the view and made sure that Sub-Category was on detail on the Marks card this is the bottom view:

            Screen Shot 2017-10-21 at 9.24.11 PM.png

             

            So we can see from the tooltip that the average is the expected $420, however there are 17 marks stacked up there. This is because the vizLOD is Sub-category & Month which is necessary to get the regular aggregate sum profit per sub-category & month measure to come out with the right values that we can then compute the window average over.

             

            This leads to the last of the master concepts I'll talk about here, and that is that sometimes our desired displayLOD is at a different grain than the vizLOD. In this case the vizLOD needs to have Sub-Category as a dimension to get the calculation to work right yet the desired *display* effectively only has the LOD of Month. There are a couple of ways we can deal with this by using table calculation filters or IF statements inside the table calculation to only return a single mark for each value of our desire displayLOD, in this case we want the latter.

             

            The jtd WAvg Profit Optimized calculation has the formula IF FIRST()=0 THEN WINDOW_AVG(SUM([Profit])) END. With a compute using on the Sub-Category dimension this will return a single non-null value for each Month:

             

            Screen Shot 2017-10-21 at 9.31.11 PM.png

             

            And duplicating that view and moving some pills around we can show a single mark for each:

            Screen Shot 2017-10-21 at 9.34.30 PM.png

             

            If this is not what you wanted, please let me know and we can figure it out.

             

            To review so far...when working with calculations in Tableau it's extremely helpful to have an awareness of key concepts of how Tableau works. The concepts I described here are:

             

            - the four levels of calculation

            - the vizLOD

            - order of operations

            - specifying calculations in plain language

            - displayLOD vs. vizLOD

             

             

            Now there's another way to the desired result (assuming my guess was right) and that's using an LOD expression. {INCLUDE [Sub-Category] : SUM(Profit])} tells Tableau that no matter what the vizLOD is to include the Sub-Category in the level of detail of computing the SUM(Profit). In this view I've used that measure with the AVG() aggregation and then we don't need Sub-Category in the level of detail to get the desired result:

             

            Screen Shot 2017-10-21 at 9.42.46 PM.png

             

            I've attached a v10.3 workbook with all of this.

             

            Finally, a note about using Tableau's R & Python integrations...if all you're doing is summing, averaging, etc. over the marks then you can probably do everything you want with record level calculations, regular aggregates, table calculations, and/or level of detail expressions (plus maybe computed Sets). Where R and Python go beyond what we can easily do in Tableau is use cases where we need to perform recursive functions over the data. Given that R & Python work in Tableau as table calculations, my general suggestion to users is to improve one's skills at table calculations first before trying to apply R or Python because they add the additional complexities of understanding how Tableau is communicating with those applications that are totally dependent on factors like the vizLOD, the table calculation compute using setting, etc.

             

            In looking at a few of the R calcs in the workbook you provided I didn't see any of those, so it's likely that with a better understanding of how Tableau works you can save R for what it's good at.

             

            For more information here are some links:

             

            A set of links on understanding table calculations: Want to Learn Table Calculations? Here’s How! | Drawing with Numbers

            A set of links on Level of Detail expressions: Level of Detail (LOD) Expressions | Drawing with Numbers

            A primer on Tableau's R integration, this includes links and a lot of detail: Tableau and R Integration

             

            Let me know if you have any questions!

             

            Jonathan

            3 of 3 people found this helpful