2 Replies Latest reply on Sep 7, 2016 4:57 PM by Mary Solbrig

    Chi-Square P-Value Calculation with R

    Viva Gore

      Hi all,

       

      I've been trying to figure out how to use the Tableau/R integration to run Chi-Square tests. I've downloaded Bora Beran's workbook from this blog post.

      I've also read a previous Tableau Community discussion in which Bora explains each line of his code.

       

      So this is the calculated field, P_Value_ChiSquare:

       

      SCRIPT_REAL(

      'mm <- data.frame(commodity = .arg1);

      d <- split(mm,rep(1:.arg3[1],each=.arg2[1]/.arg3[1]));

      zt <-do.call(cbind, d);chisq.test(zt)$p.value'

      ,SUM([Number of Records]), SIZE(),[Priorities])

       

      [Priorities] is calculated by WINDOW_COUNT(ATTR([Order Priority]))

       

      I'm connected to Rserve. In the workbook example, this calculated field outputs a p-value of .4205. If I then replace P_Value_ChiSquare with itself (so that the calculation is run again through my connection to Rserve), I get an error:

       

      Error in chisq.test(zt) : 'x' must at least have 2 elements

       

      Any ideas as to why this isn't working for me? I've made no changes to the calculated fields.

      I discovered that it wasn't working for me in this workbook after trying to apply the process to my own data/workbooks.

        • 2. Re: Chi-Square P-Value Calculation with R
          Mary Solbrig

          Suspicion: only 1 element is being passed to R at a time, meaning the test is calculated on a single value instead of the full vector of values. For instance, if you ran "chisq.test(c(1))" in R, you would get the same error message.
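
To see that failure mode in isolation, here is a minimal sketch that runs in a plain R session (no Tableau or Rserve required). It captures the error raised for a one-element input and compares it with a two-element input. The variable names and the counts 10 and 20 are made up for illustration.

```r
# Capture the error message chisq.test() raises for a single-element vector.
single_msg <- tryCatch(
  {
    chisq.test(c(1))
    NA_character_            # only reached if no error is raised
  },
  error = function(e) conditionMessage(e)
)
print(single_msg)

# With two or more counts, the goodness-of-fit test runs and returns a p-value.
two_p <- chisq.test(c(10, 20))$p.value
print(two_p)
```

If the message printed for the one-element case matches the error Tableau reports, that supports the idea that the script is receiving one value per call.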

           

          To verify whether this is the case:

          1. Start Rserve using run.Rserve() (this runs Rserve inside the current R session, so any print statements will appear in the RStudio console instead of being swallowed)

          2. Replace the code with the following:

           

          SCRIPT_REAL(

          '

          print("arg1 is ")

          print(.arg1)

          print("which has size ")

          print(.arg2)

          mm <- data.frame(commodity = .arg1);

          d <- split(mm,rep(1:.arg3[1],each=.arg2[1]/.arg3[1]));

          zt <-do.call(cbind, d);

          print("Taking chisq.test of ")

          print(zt)

          chisq.test(zt)$p.value'

          ,SUM([Number of Records]), SIZE(),[Priorities])


          That will give you a sense of what exactly is being passed to R.
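
For comparison, the R part of the calculation can also be tried outside Tableau. The sketch below wraps the script body in a hypothetical helper, run_script, and calls it twice: once with a whole partition of made-up counts (2 groups of 3), and once with a single value, which is what R would receive if the table calculation were evaluated one mark at a time. The assumption that SIZE() and [Priorities] both equal 1 in that second case is mine and is not confirmed by the workbook.

```r
# Hypothetical helper mirroring the body of the SCRIPT_REAL call.
# arg1: counts per mark, arg2: partition size (SIZE()),
# arg3: number of groups ([Priorities]).
run_script <- function(arg1, arg2, arg3) {
  mm <- data.frame(commodity = arg1)
  # Split the counts into arg3[1] consecutive groups of equal length.
  d  <- split(mm, rep(1:arg3[1], each = arg2[1] / arg3[1]))
  # Bind the groups side by side to form the contingency table.
  zt <- do.call(cbind, d)
  chisq.test(zt)$p.value
}

# Case 1: the whole partition arrives at once (made-up counts).
counts <- c(20, 30, 25, 22, 28, 27)
p_ok <- run_script(counts, rep(6, 6), rep(2, 6))
print(p_ok)

# Case 2: a single value per call (assumed SIZE() = 1, [Priorities] = 1).
p_bad <- tryCatch(
  run_script(20, 1, 1),
  error = function(e) conditionMessage(e)
)
print(p_bad)
```

If the print output from Rserve looks like Case 2, the next thing to check would be how the table calculation is addressed and partitioned in the view.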

           

          Let me know what that says and I might be able to help further. A sample workbook, even just with some dummy data, would also be really helpful.