1 Reply Latest reply on Nov 11, 2014 11:09 AM by Mary Solbrig

# Chi square reading first two lines of table - R Integration

Hi,

I am trying to get a p-value from a chi-square test between two groups of schools. I am looking at demographic makeup, where each row gives the number of students of one race at one school. My data looks something like this:

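| SchoolName | RaceIndicator | Number_of_Students |
|------------|---------------|--------------------|
| SchoolA    | Hispanic      | 100                |
| SchoolA    | AfricanAmer   | 30                 |
| SchoolA    | White         | 200                |
| SchoolB    | Hispanic      | 200                |
| SchoolB    | AfricanAmer   | 25                 |
| SchoolB    | White         | 400                |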

I have used tips from other Tableau community threads on the topic of Chi-square testing but they are set up to count frequencies, whereas my data would need to be weighted.
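
For reference, here is a minimal R sketch of what "weighted" means for data in this shape. It assumes the table has already been loaded into a data frame (the name `schools` is hypothetical) and uses `xtabs()` to sum the pre-aggregated `Number_of_Students` column into a contingency table instead of counting rows:

```r
# Example data from this post, one row per school/race combination.
# The data frame name `schools` is an assumption made for illustration.
schools <- data.frame(
  SchoolName         = c("SchoolA", "SchoolA", "SchoolA",
                         "SchoolB", "SchoolB", "SchoolB"),
  RaceIndicator      = c("Hispanic", "AfricanAmer", "White",
                         "Hispanic", "AfricanAmer", "White"),
  Number_of_Students = c(100, 30, 200, 200, 25, 400)
)

# Putting the count column on the left-hand side of the formula makes
# xtabs() sum those counts per cell rather than tally rows.
counts <- xtabs(Number_of_Students ~ SchoolName + RaceIndicator,
                data = schools)
print(counts)
```

The result is a 2 x 3 table (schools by race) that can be handed to `chisq.test()` directly.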

Since I can produce the tables I want in Tableau, I thought I could do an easy work-around by just outputting the first two rows of data into R.

This is my script:

SCRIPT_REAL("
a <- .arg1
b <- .arg2

data <- rbind(a,b)

chisq.test(data)$p.value
"
,
INDEX()=1, INDEX()=2)

I also tried creating a "RowNumber" calculated field and passing that in as .arg1 and .arg2. Neither approach works: both consistently return 0.317.

This is what my dashboard looks like currently:

I know from manually putting this data into R that the p-value is wrong.

Please let me know where I erred!

###### 1. Re: Chi square reading first two lines of table - R Integration

I am confused about why you are using "INDEX()=1" and "INDEX()=2" as your arguments. Each of these is a boolean expression, so the variables a and b in the code are vectors of true/false values rather than student counts.
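
To make that concrete, here is a small R sketch of what the script would receive. It assumes, purely for illustration, that the table calculation is evaluated over a partition of six rows; the vector `idx` is a hypothetical stand-in for the values of INDEX():

```r
# Hypothetical stand-in for INDEX() over a six-row partition.
idx <- 1:6

# What Tableau would pass as .arg1 and .arg2 under that assumption:
a <- idx == 1   # TRUE only for the first row
b <- idx == 2   # TRUE only for the second row

data <- rbind(a, b)
print(data)

# The matrix holds TRUE/FALSE flags, not numbers of students.
print(is.logical(data))
```

Whatever chisq.test() does with such a matrix, the student counts never reach R, so the p-value cannot reflect them.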

From the question, my guess is that you would like a to be the vector of Number_of_Students values for School A and b the vector for School B, so that with the example data a = c(100, 30, 200) and b = c(200, 25, 400). Is this correct?

If so, then I would suggest creating new calculated fields that return the counts for School A and School B, and passing those to the script instead. I've attached a workbook that uses your example data to return a p-value of 0.005762.
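
As a sanity check outside Tableau, the same test can be run directly in R. This is only a sketch using the example numbers from the question; the row and column labels are added for readability and are not required:

```r
# Example counts from the question, one vector per school
# (order: Hispanic, AfricanAmer, White).
a <- c(100, 30, 200)   # SchoolA
b <- c(200, 25, 400)   # SchoolB

# Stack the two vectors into a 2 x 3 contingency table.
counts <- rbind(SchoolA = a, SchoolB = b)
colnames(counts) <- c("Hispanic", "AfricanAmer", "White")

# Pearson chi-square test of independence between school and race.
result  <- chisq.test(counts)
p_value <- result$p.value
print(signif(p_value, 4))
```

With these inputs the printed value should agree with the p-value quoted above.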

If this isn't correct, could you provide an example of the code you are running in R for comparison?
