1 Reply Latest reply on Nov 11, 2014 11:09 AM by Mary Solbrig

    Chi square reading first two lines of table - R Integration

    Rachel Factor

      Hi,

       

      I am trying to get a p-value from a chi-square test comparing two groups of schools. I am looking at demographic makeup, where each cell gives the number of students of one race in one school. My data looks something like this:

       

      SchoolName    RaceIndicator    Number_of_Students
      SchoolA       Hispanic         100
      SchoolA       AfricanAmer      30
      SchoolA       White            200
      SchoolB       Hispanic         200
      SchoolB       AfricanAmer      25
      SchoolB       White            400

       

       

      I have used tips from other Tableau community threads on chi-square testing, but they are set up to count frequencies from individual records, whereas my data is already aggregated and would need to be weighted by the number of students.

       

      Since I can produce the tables I want in Tableau, I thought an easy workaround would be to pass the first two rows of data to R.

       

      This is my script:

       

      SCRIPT_REAL("
      a <- .arg1
      b <- .arg2

      data <- rbind(a,b)

      chisq.test(data)$p.value
      "
      ,
      INDEX()=1, INDEX()=2)

       

      I also tried creating a "RowNumber" calculated field and using it for .arg1 and .arg2. Neither approach works; both consistently return "0.317".

       

      This is what my dashboard looks like currently:

      [Attached image: dashboard.pvalue.png]

      I know from manually putting this data into R that the p-value is wrong.

       

      Please let me know where I erred!

        • 1. Re: Chi square reading first two lines of table - R Integration
          Mary Solbrig

          I am confused about why you are using "INDEX()=1" and "INDEX()=2" as your arguments. Each of those is a comparison, so the variables a and b in the code would be vectors of true/false values rather than student counts.
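
To illustrate the point about true/false values, here is a minimal R sketch. It assumes that a comparison such as INDEX()=1 reaches R as a logical vector, and it uses `idx <- 1:3` as a hypothetical stand-in for Tableau's INDEX() over a three-row partition; the actual partitioning in the workbook may differ.

```r
# Hypothetical stand-in for Tableau's INDEX() over a three-row partition.
idx <- 1:3

# What the original arguments evaluate to: comparisons, not student counts.
a <- idx == 1
b <- idx == 2

# Same rbind() call as in the script from the question.
data <- rbind(a, b)
print(data)

# The matrix holds logical flags, so no student counts ever reach chisq.test().
print(is.logical(data))
```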

           

          From the question, my guess is that you would like a to be the vector of Number_of_Students for SchoolA and b the corresponding vector for SchoolB, so that with the example data a = c(100, 30, 200) and b = c(200, 25, 400). Is this correct?

           

          If so, then I would suggest creating new fields that return the counts for school A and school B, and passing those to the script instead. I've attached a workbook that uses your example data to return a p-value of 0.005762.
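
The p-value quoted above can be checked in plain R, outside Tableau. The sketch below is not the attached workbook; it only assumes the two count vectors from the example data, with the row and column labels added for readability.

```r
# Counts per category (Hispanic, AfricanAmer, White) from the example table.
a <- c(100, 30, 200)   # SchoolA
b <- c(200, 25, 400)   # SchoolB

# 2 x 3 contingency table: one row per school, one column per category.
counts <- rbind(SchoolA = a, SchoolB = b)
colnames(counts) <- c("Hispanic", "AfricanAmer", "White")

# Pearson chi-square test of independence (2 degrees of freedom here).
result <- chisq.test(counts)
print(result$p.value)   # about 0.005762
```

Inside Tableau, the same two vectors would have to arrive as .arg1 and .arg2, which is what the separate count fields are for.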

           

          If this isn't correct, could you provide an example of the code you are running in R for comparison?
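
For such a comparison, one possible standalone R check starts from the long layout shown in the question and lets `xtabs()` weight each row by its student count. The data frame name `schools` is hypothetical; the column names and values are taken from the example table.

```r
# Example data in the long layout from the question: one row per school/category pair.
schools <- data.frame(
  SchoolName = rep(c("SchoolA", "SchoolB"), each = 3),
  RaceIndicator = rep(c("Hispanic", "AfricanAmer", "White"), times = 2),
  Number_of_Students = c(100, 30, 200, 200, 25, 400)
)

# xtabs() sums the left-hand side within each cell, so the counts act as weights.
counts <- xtabs(Number_of_Students ~ SchoolName + RaceIndicator, data = schools)
print(counts)

# Chi-square test on the resulting 2 x 3 table.
p_value <- chisq.test(counts)$p.value
print(p_value)
```

This avoids expanding the data to one row per student, which is what the frequency-counting approaches expect.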
