6 Replies Latest reply on Oct 7, 2016 8:57 AM by James Hereford

    Wilcoxon Rank Sum Test via R Integration

    James Hereford



      Bora Beran -  I've read your blog posts and reviewed your sample workbook on t-tests via R.  You will be the ideal person to answer this question.  I appreciate any time you can give.


      I am trying to implement a wilcoxon rank sum test via R.  I can implement this successfully if I am comparing two separate fields/measures as you do in your t-test example file.  e.g. if I wanted to compare [Field 1] to [Field 2].  Where I'm running into trouble however is that I need to run a test on the same measure, across a given dimension. 


      I have attached a sample workbook with real data to illustrate the issue I'm having.  There are a lot of variables in the workbook but we can focus in on two in particular: "Raw Bias X (g)" and "Raw Bias Wilcox Test".  As can be seen in the table, AVG(Raw Bias X (g)) is shown aggregated across a few dimensions.  One of the dimensions being aggregated across is "Rotation Profile ID", which can take on the value of 'default' or 'sphere'.  I am trying to test for differences in mean bias between the 'default' and 'sphere' sub-populations.


      I can pass the AVG(bias) into R without an issue.  It is a numeric array of length 24 that I've checked with write.csv.  However since "rotation profile ID" is a dimension, I can't seem to properly aggregate it to get a similar length 24 array.  This array should be equivalent to what is seen if you use 'view data' from within the table visualization.  It will have 12 instances of 'default' followed by 12 instances of 'sphere'.  I have attempted ATTR, but this returns * since there are two values.  I have also attempted converting the dimension to a numeric measure and then taking AVG, this failed and appeared to aggregate across an incorrect dimension. 


      I've spent hours reviewing forum posts and articles online and cannot find clear guidance on this topic.  Any suggestions you can provide would be greatly appreciated.  I'd love to figure out how to call the dimension directly and aggregate it properly, but failing that if I can just get the unique values of "rotation profile id" I could always reassemble the array in R. 


      Thank you for the help,


        • 1. Re: Wilcoxon Rank Sum Test via R Integration
          Bora Beran

          Hi James,

          This should work.

          Screen Shot 2016-10-06 at 3.00.42 AM.png





          1 of 1 people found this helpful
          • 2. Re: Wilcoxon Rank Sum Test via R Integration
            James Hereford

            Hi Bora,


            Thank you very much for the prompt reply.  I've spent the past couple of hours trying to implement and unfortunately I'm still having issues.  Using the implementation as you describe I get a "s[[1]] subscript out of bounds" error.  This is actually a bit strange since looking at CSV outputs at different stages in the process would seem to imply that the split is working appropriately.  i.e. in the CSV output s[[1]] and s[[2]] have the proper content and form.


            I noticed in your screenshot that you have an index() pill in the filter - what is the function of this pill and could it explain the difference?  I'm sorry to trouble you but could you upload the solution in my attached workbook?  Ideally I'd like to have the value of this test displayed in a column of the table.


            Thanks again for the assistance.




            • 3. Re: Wilcoxon Rank Sum Test via R Integration
              Bora Beran

              Hi James,

              I am attaching the workbook.


              I think the issue could be with addressing/partitioning settings or level of detail of the sheet.


              You need to pass all the data that is needed for the test from Tableau to R which means data in the view should have the same level of detail as the test inputs require. To do this, I turned aggregations off from the analysis menu.


              I made Rotation profile ID in the view an ATTR.


              This should make the error disappear.


              This of course doesn't mean you have to turn off aggregations. I just didn't know what to use as a dimension for this table from your dataset. If you have a dimension let's call it CaseID, you can set the addressing settings of the table calc to compute using : CaseID. If you have multiple dimensions but if you want the test to run over the data in the entire table put all the dimensions in compute using (this means checking all the boxes in 10.0). If you want to run the test for different cohorts, then you can put the dimension that defines the cohort to partitioning (this means uncheck the box for it in 10.0). E.g. if you want to run the test independently for each experiment, then you can put experiment  to partitioning.


              R will return a single value and Tableau will repeat the value for each row. Typically one would want to show the value only once.


              INDEX() filter keeps the first item and hides the rest so view only shows one value. so it is mostly cosmetics.


              I hope this helps.


              Thank you,



              1 of 1 people found this helpful
              • 4. Re: Wilcoxon Rank Sum Test via R Integration
                James Hereford

                Thanks a ton Bora.  This is really helpful.  I have the visualization working now but still get some outputs that I don't understand in terms of how things are being aggregated, particularly when I place the Wilcox Test calculated field into a 'Measure Value' field. 


                My last question for you - short of me importing my data to R and manually checking all the calculations is there a way for me to see what exact data is being passed into R for each different aggregation?  Right now I can't trust the value that is output by the test because I don't have a clear understanding of what exact data the tests are being run on...


                Thanks again,


                • 5. Re: Wilcoxon Rank Sum Test via R Integration
                  Bora Beran

                  There are a few options


                  1. in RStudio, start Rserve in the following way




                  then insert print statements in your SCRIPT_ functions.  e.g. you can do a print(s) or print(.arg1) in your script. This will print the values in RStudio window


                  2. Run Rserve in debug mode. The folder containing Rserve.exe also has Rserve_d.exe. Navigate to that folder and run Rserve_d.exe which will take over the console and write any traffic between Tableau and Rserve. The script in, data in, data out...


                  3. Insert write.csv in portions of the script where you want to see the state of the data.


                  I often use option 1 myself.


                  ~ Bora

                  3 of 3 people found this helpful
                  • 6. Re: Wilcoxon Rank Sum Test via R Integration
                    James Hereford

                    Thanks Bora.  I knew about 2 and 3 but it seemed difficult to get the data from each pass.  #1 sounds like it will work well.