7 Replies Latest reply on Dec 17, 2013 6:17 PM by Jonathan Drummey

    Can I pass a measure as a vector to R?

    Gabriel Griggs

      I have a measure with 68 data points.

       

      Is it possible for me to pass this as a vector to R or do I have to pass this as an aggregate?

        • 1. Re: Can I pass a measure as a vector to R?
          Mark Ewing

          I would like to know this too.  I want to do custom aggregation using R.

           

          I have one table with in and out times for a building.  I have another table that has each minute of each day.  I want to count the number of people who are in the building at each minute of the day.  To do this though, I need to pass the first table as a vector (after filtering to a specific day or days) and get the count for each minute of they day.

          • 2. Re: Can I pass a measure as a vector to R?
            Andrew Ball

            Well, the answer is - sort of. It all depends what you mean.

            If you have 68 rows in your source data, and you want to pass them all to R, just untick "Aggregate Measures" in Tableau. If you have no other dimensions slicing your data, this will then pass a vector with 68 elements to R.

             

            Tableau pass through all the points within the current partition as a vector, so if you add other dimensions to a view, but want to pass through all the points at once, you need to add all dimension fields into the Addressing part of the table calculation that contains your R code.

            1 of 1 people found this helpful
            • 3. Re: Can I pass a measure as a vector to R?
              Jonathan Drummey

              Andrew and I were just going through this the other day, I wanted to clarify one bit of what he wrote: R is run as a table calculation, so you always have to use the R functions with a aggregates measure like SUM(something) or even RUNNING_SUM(SUM(something)). So if you want to pass a vector to R with your row-level data, you have two options:

               

              1) disaggregate the measures using Analysis->Aggregate Measures->Off. This doesn't actually cause the measures to stop their computations, instead it tells Tableau to return every row in the data without applying the aggregation by dimensions on the view. So this can get the desired results, but can cause problems for views that we want to have R work on the non-aggregated data but display the data with some level of aggregation.  The second solution deals with this situation.

              2) Add a dimension such as a unique Row ID or Data Point ID to the view, and set the Compute Using (addressing) of the R function to be along that dimension. If it takes multiple dimensions to increase the level of detail of the view so that it's returning row-level data, then you'd need to use all of those dimensions in the addressing via an Advanced... Compute Using. If we need to then perform additional aggregations on that data, we can do so with table calculations with the appropriate addressing that take into account the increased level of detail in the view.

               

              Jonathan

              • 4. Re: Can I pass a measure as a vector to R?
                Gabriel Griggs

                Thank you, Andrew and Jonathan.

                 

                I think that I understand this a little better. The data structures of SQL and Tableau are still something that I am getting used to. I am coming from a very heavy MATLAB / C++ background, so everything is about vectors and matrices where I come from and I am still trying to get a handle on joins / unions / etc.

                • 5. Re: Can I pass a measure as a vector to R?
                  Mark Ewing

                  Jonathan,

                   

                  I've used your tip to get the right numbers calculated, now I'm having an issue plotting it.

                   

                  What I'm trying to do is count the number of people in a building at any given minute.  In my data I have a time in with an [Hour In] and [Minute In] plus the same thing for time out.  These are measures and are numbers, not datetimes.

                   

                  My calculated field is:

                  SCRIPT_INT("hi <- .arg1;ho <- .arg2;mi <- .arg3;mo <- .arg4;ti <- hi*60+mi;tou <- ho*60+mo;

                  out <- rep(NA,length(hi));for(i in 1:length(hi)){out[i] <- sum(ti <= ti[i] & tou >= ti[i])};out",

                  AVG([Hour In]),AVG([Hour Out]),AVG([Minute In]),AVG([Minute Out]))

                   

                  When I make a tabluar view with [Date In] (a date dimension) then the value it returns is correct where only one person scanned in that minute - if multiple people scanned in the same minute, it's short by the number of people who scanned in during a single minute.  This error adds up and my maximum is much lower than it should be.

                   

                  When I make a tabular view in Tableau with Hour In, Minute In (using my date dimension [Date In], choosing the ate part of hour, then another with the date part of minute) and Row Number (this was generated by SQL when pulling the data in) then my counts are correct but they are duplicated within a time, so, 8:04 has two people scanning in, they both correctly show 19 people in the building.

                   

                  What I want to do is plot the data over time, so, hour on the x axis and the y axis is the number of people in the building.  To get the correct values, I have to include Row Number, but I can't make the view I want if Row Number is included.  When I go to set the 'Compute Using' I'm using Table (down), not the Row Number- when I use the Row Number, I get the count of people who scanned in that particular minute.

                   

                  Is there a way around this?  Or am I approaching this wrong?  Thanks for the help you've provided thus far.

                  • 6. Re: Can I pass a measure as a vector to R?
                    Andrew Ball

                    Hi Mark,

                     

                    The thing you have here is that you need to do this calculation for each individual.

                    If you have the people as the Partition, then each person will be passed individually to R, so you will need another table calculation to bring them together.

                    If you have the people on Addressing, then they all go through as one vector, so you will need to use R to target each person in turn.

                     

                    Personally, I'd scrap R and go pure Tableau:

                    • 3 datasets - #1 is the datasource with just the time in (can actually use a two row datasource if you use Include missing values), with #2 and #3 being the same.
                    • Call the date in #1 "Date of Interest" or something
                    • Create a date in #2 using the hour and minutes in, call it "Date of Interest", rename the datasource to "People In"
                    • Create a date in #3 using the hour and minutes out, call it "Date of Interest", rename the datasource to "People Out"
                    • Use #1 as the primary, and create a calculated field of "Movement", and use SUM([People In].[Number of Records]) - SUM([People Out].[Number of Records])
                    • Put "Date of Interest" from #1 on the Columns and your "Movement" field on the rows. And there you go, a record of the changes during that time period.
                    • If you want the amount in the building at any one moment, then just use a running_sum([Movement]) calculation.
                    • Of course, you may need to make changes if not everyone swipes In/Out, but those should be easy to spot.

                     

                    If you haven't used this sort of technique before, it is called "Scaffolding", and is very useful for twisting your data in Tableau.

                    • 7. Re: Can I pass a measure as a vector to R?
                      Jonathan Drummey

                      Hi Mark,

                       

                      This kind of analysis requires hacking things a bit, one option is what Andrew described. I put together a list of posts relating to various techniques here: Utilization, Queues, Census, Throughput for Blocks, Floors, Units, Rooms, Offices, etc. - What do you do?.

                       

                      In any case, I'm pretty sure from your description that what you're running into is the not-so-uncommon case where you need to have the view at a finer level of detail to get the calculations to work - i.e. needing Row Number in the view - while wanting a coarser level of detail for the visual display. I believe what you could do is modify the formula to IF FIRST()==0 THEN SCRIPT_INT(your code) END and use the Compute Using setting that gets you accurate results. The IF FIRST()==0 would only return the accurate result for the first person scanning in for each partition (the date/time dimension) and Null for everyone else. Then I usually Ctrl+drag a copy of that pill onto the Filters Shelf and set it to filter for non-Null values, so Tableau is only drawing as many marks as needed.

                       

                      However, with Row Number in the view, you'll end up with a whole lot of rows returned from the data source to Tableau (one for every Row), and at some point as N gets larger that could slow things down. Whereas a solution like Andrew's that uses data blends is down to 2-ish rows per granularity of the date/time dimension, and that could be a lot fewer rows with much higher performance.

                       

                      Jonathan