5 Replies Latest reply on Dec 24, 2013 5:20 AM by Jonathan Drummey

    R integration: basic calculations & graphs (Part2)

    Rollie Parrish

      So, with help from Jim Wahl & Jonathan Drummey, I'm successfully sending values to R and receiving the expected results of a basic calculation (see the earlier thread, "R integration: Basic calculations").

       

      Now, the problem I've run into is creating a line graph from the results. In the example shown below, the LOS Average is calculated by Tableau and the Geometric Mean LOS (GMLOS) is calculated via R. A bar graph works fine for GMLOS, but a line graph does not: the values are recalculated when the mark type is changed from bar to line.

       

      Any suggestions?

       

      12-21-2013 10-34-15 AM.jpg

       

      12-21-2013 10-27-18 AM.jpg

        • 1. Re: R integration: basic calculations & graphs (Part2)
          Rollie Parrish

          OK, there appears to be something different about how Tableau sends and receives the data when working with bar graphs vs. line graphs.

           

          In the attached workbook, a simple debugging function has been added to the calculated fields that use R. It logs the data coming in from Tableau and going back to Tableau to a logfile in the R working directory. The output shows that very different values are being sent and received for bar graphs vs. line graphs.
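For readers who want to see the general shape of such a debugging wrapper, here is a minimal sketch in Python rather than R (the actual R function from the workbook is not shown in this thread). The names `logged`, `log_path`, and `rserve_debug.log` are hypothetical; only the log labels are taken from the output quoted in this post.

```python
import statistics
import time


def logged(label, func, log_path="rserve_debug.log"):
    """Wrap func so that each call appends its input and output to log_path.

    Hypothetical helper: the name, signature, and default file name are
    assumptions for illustration, not the function used in the workbook.
    """
    def wrapper(values):
        result = func(values)
        with open(log_path, "a") as log:
            log.write("######## START ####\n")
            log.write(f"######## {label}\n")
            log.write(time.ctime() + "\n")
            log.write("incoming from Tableau\n")
            log.write(repr(list(values)) + "\n")
            log.write("back to Tableau\n")
            log.write(repr(result) + "\n")
            log.write("######## END ####\n")
        return result
    return wrapper


# Wrap an ordinary calculation; logging is switched on per function.
arithmetic_los = logged("Arithmetic LOS", statistics.mean)
```

Calling `arithmetic_los([1, 2, 3, 10])` would return the mean and append one START/END block to the logfile, which is the property that makes it possible to compare what arrives for a bar graph with what arrives for a line graph.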

           

          In this example, the values for LOS are being sent to R to calculate the mean for each Group.

           

          These are the expected results for one point on a bar graph:

          [1] "######## START ####"

          [1] "######## Arithmetic LOS"

          [1] "Sun Dec 22 09:05:11 2013"

          [1] "incoming from Tableau"

          [1]  1  2  3 10

          [1] "back to Tableau"

          num 4

          [1] "######## END ####"
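As a sanity check on the log above, the arithmetic mean of 1, 2, 3, 10 is indeed 4. The sketch below (Python, not R) also computes the geometric mean of the same values using the standard identity exp(mean(log(x))). The function names are for illustration only, and the thread does not show the exact formula used in the workbook.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # exp of the mean of the logs; requires every value to be > 0
    return math.exp(sum(math.log(v) for v in values) / len(values))


los = [1, 2, 3, 10]
print(arithmetic_mean(los))   # 4.0
print(geometric_mean(los))    # roughly 2.78 (the fourth root of 60)
```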


          These are the results for the same point using a line graph:

          [1] "######## START ####"

          [1] "######## Arithmetic LOS"

          [1] "Sun Dec 22 09:07:52 2013"

          [1] "incoming from Tableau"

          [1]  1.000000e+00  2.000000e+00  3.000000e+00  1.000000e+01 9.654043e-321

          [6] 9.654043e-321 9.654043e-321 9.654043e-321 9.654043e-321 9.654043e-321

          [11] 9.654043e-321 9.654043e-321 9.654043e-321 9.654043e-321 9.654043e-321

          [16] 9.654043e-321

          [1] "back to Tableau"

          num 1

          [1] "######## END ####"
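One observation about this second log: the returned value of 1 is consistent with the arithmetic. The 12 extra values are so close to zero that they add nothing to the sum, but they still count toward the length, so the mean becomes 16 / 16. A quick check in Python (not R), using the values copied from the log:

```python
real = [1.0, 2.0, 3.0, 10.0]
padding = [9.654043e-321] * 12   # the 12 extra values from the log
incoming = real + padding

print(len(incoming))                  # 16
print(sum(incoming) / len(incoming))  # 1.0
print(sum(real) / len(real))          # 4.0
```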


          Any ideas what's going on here?

          12-22-2013 9-36-52 AM.jpg

          • 2. Re: R integration: basic calculations & graphs (Part2)
            Jonathan Drummey

            Hi Rollie,

             

            I love your R logging script, that's brilliant! There's supposed to be some way to get Rserve to log to the console, but I haven't been able to get that working in my setup, so being able to turn it on & off per function is really helpful!

             

            The problem you're experiencing is due to what Joe Mako calls "Mark Type Filling" - we don't know the Tableau term for it - where for Line, Area, & Polygon mark types, when there are 2 or more dimensions in the view and a table calc, Tableau will complete the domain of the combinations of dimensions. That's why in the calls to R you're seeing 16 arguments instead of just 4 for each Group, and why the bar charts work and the lines don't. In the attached I set up a walkthrough of the problem. I don't have a good idea of how to make Tableau do the "right thing" (i.e. not do the mark type filling). I do have an idea of how to get R to do the right thing with the extra data that Tableau is sending: theoretically you could add a couple of arguments to explain the padding and current index, so that the padded data can be filtered out in R, but that would be a lot of extra work in R. However, I do know how to get Tableau to compute the arithmetic mean LOS and geometric mean LOS without using R, so I set those up in the attached as well, in the "All in Tableau crosstab," "Bar," and "Line" charts.
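To make the padding concrete, here is a rough sketch in Python (not Tableau or R) of what completing the domain looks like, and of the idea of filtering the padded data out again. Everything in it is an assumption for illustration: the IDs are made up, only the Group "A" values (1, 2, 3, 10) come from the log and the other groups are invented, `complete_domain` and `mean_of_real` are hypothetical helpers, and padding is modeled as `None` rather than the tiny numbers seen in the log.

```python
from itertools import product

# Hypothetical sparse data: 16 IDs, 4 per Group. Only the "A" values
# are taken from the log; "B" to "D" are invented.
los_by_group = {
    "A": [1, 2, 3, 10],
    "B": [2, 4, 4, 6],
    "C": [1, 1, 5, 9],
    "D": [3, 3, 3, 7],
}
records = {}
next_id = 1
for group, values in los_by_group.items():
    for value in values:
        records[(next_id, group)] = value
        next_id += 1


def complete_domain(records):
    """Return a dense dict with an entry for every (ID, Group) combination."""
    ids = sorted({i for i, _ in records})
    groups = sorted({g for _, g in records})
    return {(i, g): records.get((i, g)) for i, g in product(ids, groups)}


def mean_of_real(values):
    """Average only the entries that are not padding."""
    real = [v for v in values if v is not None]
    return sum(real) / len(real)


dense = complete_domain(records)
group_a = [v for (i, g), v in dense.items() if g == "A"]

print(len(records), len(dense))   # 16 64
print(len(group_a))               # 16: four real values plus twelve padded
print(mean_of_real(group_a))      # 4.0
```

In this toy model the padding is easy to recognize. In the real case R would need extra arguments to tell padded values apart from real ones, which is the extra work described above.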

             

            I'm going to send this one in to Tableau tech support, it's an interesting (and painful) side-effect of mark type filling that is new to me.

             

            Jonathan

            • 3. Re: R integration: basic calculations & graphs (Part2)
              Rollie Parrish

              Thanks Jonathan. I probably will end up using the Tableau calculations for GMLOS.

               

              This exercise has been more about learning how Tableau works with R. I wanted to start with something a little more basic than the predictive model examples that are currently online.

              • 4. Re: R integration: basic calculations & graphs (Part2)
                Jonathan Drummey

                Hi Rollie,

                 

                I figured out (remembered, actually) a solution for the Mark Type Filling, I'll type it up and get it to you tomorrow.

                 

                Jonathan

                • 5. Re: Re: R integration: basic calculations & graphs (Part2)
                  Jonathan Drummey

                  All it took was some pre-Christmas dinner with family to get my brain working again and remember Joe's solution for unwanted densification. To take care of that and finish getting the lines to draw in your workbook, there are two major steps, seen in Line Fixed part 1 and Line fixed part 2 in the attached workbook:

                   

                  1) We can turn off the Mark Type Filling by only having aggregate measures on Columns and Rows. Where we need a discrete (blue) pill to generate headers, we can use a discrete aggregate like MIN(), MAX(), or ATTR(). So I put Group on the Level of Detail and MIN(Group) on Columns. (For performance, when I'm sure about the level of detail in a view I'll use MIN() or MAX() because ATTR() is somewhat slower.)

                   

                  However, this leaves 4 individual marks that are not connected into a line:

                   

                  Screen Shot 2013-12-24 at 8.04.24 AM.PNG.png

                   

                  There are 2 related issues with getting the marks to draw a line:

                   

                  a) We have 4 plotted marks and 12 Nulls. Ordinarily we can filter out those Nulls and Tableau will connect the 4 marks, for example by putting a copy of the Geometric LOS on the Filters Shelf and filtering for non-Null values, but that won't work without some other changes, because...

                   

                  b) There are two dimensions (ID and Group) on the Level of Detail Shelf. Tableau can generally connect marks into a line across one dimension, but not when there are multiple dimensions on the same Shelf - that creates panes/partitions. The Path Shelf can be an option to connect marks, but it also won't cross the multiple dimension boundary (as far as I know).

                   

                  The solution is to create a Tableau Combined Field of ID & Group, put that on the Level of Detail, and apply the Geometric LOS filter from a). We could instead generate a sort of the Combined Field, but using the Geometric LOS filter is faster. Tableau Combined Fields can still address & partition on the individual dimensions in the Combined Field, while being treated as one dimension for the line:

                   

                  Screen Shot 2013-12-24 at 8.04.50 AM.PNG.png

                   

                  Rollie, I don't know your volumes. If you have tens of thousands of records or more (ideally hundreds of thousands of records), I'd be curious as to which is faster: the Tableau & R calcs or the Tableau-only calcs. My hunch is that with a decently fast data source, such as a Tableau data extract or a tuned database, the Tableau-only solution would be faster, because less data has to be pushed across the wire from the data source to Tableau and from Tableau into R & back again.

                   

                  Also, I personally think this is more complicated than it needs to be. Even though we have a workable solution to get a line, it requires knowing about multiple undocumented features of the software.