2 Replies Latest reply on Feb 16, 2012 11:43 AM by Jonathan Drummey

    Creating a Table Showing Intersection

      Help! I have 50 yes/no variables, representing the answers to 50 membership questions (are you a member of a, b, c, d, etc.)

       

      I want to create a 50x50 table that shows the intersection of these answers (which ultimately I would likely turn into a heatmap but for now am just trying to get the table). Instead of getting the intersection, I'm getting a table that tries to calculate all of the possible permutations (yes/no for each, embedded).

       

      To give a 2 x 2 example, I want:

       

                     Group1          Group2

      Group 1     20                    4

      Group 2     7                    35

       

      The above would show that there were 20 members of Group 1 of which 7 who were members of Group 1 were also members of Group 2,  and there were 35 members of group 2 of which 4 were also members of group 1.

       

      What I am getting is

       

                                             Group1               Group2

                                              Yes  No               Yes  No

      Group 1  yes Group2 Yes

                          Group2 No

                     no Group2 Yes

                          Group2 No

      Group 2  yes Group1 Yes

                          Group1 No

                          Group2 Yes

                          Group2 No

       

      You can imagine what a mess this becomes as it tries to embed 50 variables (it crashes each time but also gets impossible to read with even a few variables, well before you get to 50).

       

      Any ideas or help would be appreciated.

       

      Susan

        • 1. Re: Creating a Table Showing Intersection
          Jonathan Drummey

          Hi Susan,

           

          I've been working on this and I believe the first problem that you're having is that each answer is its own dimension (blue pill). Tableau creates a new set of headers for each dimension on the Rows or Columns shelves, so that's why you're getting the nests. What you'll need to do is reshape your data so there is a row for every combination of answers. See this KB article for details on reshaping:

           

          http://kb.tableausoftware.com/articles/knowledgebase/preparing-excel-files-analysis

           

          The second problem is the calculation of how many of X were in Y. I'm part of the way there with some table calculations, I just haven't quite figured out how to get them to appropriately aggregate. I have to do survey analysis a few times per year and knowing how to do this kind of analysis would be helpful for me, so it's a fun problem!

           

          Jonathan

          • 2. Re: Creating a Table Showing Intersection
            Jonathan Drummey

            Hi Susan, this took me a while longer than I'd expected because I was overthinking the problem. I started with some messy table calculations, simplified those, then simplified those further, and ultimately realized I didn't need them at all. It's been a good learning experience for me, and I took the time to write up the details so I don't forget and have a post to refer other people to.

             

            My goal was to create a view that for a given question X, show how many people answered yes (1) to that question who had also answered yes to question Y. I wanted the view to work with an arbitrary number of survey dimensions and support division by other categorical information. I also wanted the view to support arbitrary text for question labels (i.e. not require any specific numbering or ordering) and allow the possibility of alternative calculations of what constitutes a "yes" (for example, instead of yes = 1 we might be using a Likert scale and want to compare how many people had answered Very Good or Excellent to other values).

             

            Here's the sample data layout I used:                

            qmatrix1.png

             

            For the sample data above, the results for the count of people who answered yes to question X who'd also answered yes to question Y are:

             

            New Picture.png

             

            Starting with the first row, where 2 people answered yes to question 1, none of them answered yes to Question 2, those same 2 answered yes to Question 3, and 1 of those 2 answered yes to Question 4.

             

             

            And for the %, something like this:

             

            New Picture (1).png

             

            If I were do this in Excel, I'd be using some complex combination of calculations and separate worksheets, and the whole contraption would likely break as soon as we wanted to add another question or dimension. In Tableau, this can be done with one calculation. However, in order to do this in Tableau, we need to reshape the data so that the all the questions become members of the same dimension, instead of different dimensions, so Tableau can be able to "see" all the different combinations for each person at once. This is known as taking a "wide" data set and making it a "tall" one. However, for analyzing survey data, a "tall and wide" data set is usually the most useful, so that's what we'll do here.

             

             

            Tableau offers some useful suggestions for reshaping data and provides and Add-in for Excel:

             

            http://kb.tableausoftware.com/articles/knowledgebase/preparing-excel-files-analysis

            http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel

             

             

            Using that on the sample data, here's what we end up with:

            New Picture (2).png

             

             

            We've left all the original Person ID/Question 1/Question 2/etc. values here, so calculations can be performed on them. Instead of there being one row for each person, there are now 4, one for each of the 4 questions. The Question & Answer columns preserve the original question/answer combinations.

             

             

            Now, it might seem like we could load this up in Tableau, put "Question" on the Row shelf, and "Question" again on the Columns shelf, with the SUM(Answer) on Text shelf, and start seeing our results. However, if we do so what what we get is this:

             

            New Picture (3).png

             

            This is something took me a little while to get my head wrapped around when I started using Tableau – as much as Tableau lets us drag and drop in delight, Tableau doesn't just "make up" data. The underlying data still only has one answer per particular question per person, and that's exactly what is being displayed in the view.

             

             

            We want to compare results for every question/person to every other question for that person. There are a couple ways to do that in Tableau, one with table calculations that can get somewhat difficult (but have higher performance when there are hundreds of questions and thousands of people, see this thread for details: http://community.tableau.com/message/137606), another way is to do a self-join to create a cross product on the data where every person/question combination is created.

             

             

            To start that do that, I started Tableau, opened up the Excel workbook, and after picking the page clicked on the Custom SQL radio button:

             

            New Picture (4).png

             

             

            And edited the Custom SQL to look like this:

             

            SELECT [Reshaped$].[Answer] AS [Answer],

              [Reshaped$].[Favorite Fruit] AS [Favorite Fruit],

              [Reshaped$].[Person ID] AS [Person ID],

              [Reshaped$].[Question] AS [Question],

              [Reshaped$].[Question 1] AS [Question 1],

              [Reshaped$].[Question 2] AS [Question 2],

              [Reshaped$].[Question 3] AS [Question 3],

              [Reshaped$].[Question 4] AS [Question 4],

              [R2].[Question] AS [Question Orig],

              [R2].[Answer] AS [Answer Orig]

            FROM [Reshaped$] INNER JOIN [Reshaped$] AS [R2]

            ON [Reshaped$].[Person ID] = [R2].[Person ID]

             

            I added two additional columns, one called Question Orig and the other Answer Orig. This now gives us every combination of person and question, and instead of the 12 rows in the original Reshaped data, there are 48. You can see the data in the Reshaped Self-Join Crosstab view in the attached workbook.

             

            Now in Tableau, we can set up the view:

            1. Put Question Orig on the Rows shelf. This will be our Y for the "how many people answered X who had answered yes to Y" question.

            2. Put Question on the Columns shelf. This will be the X

            3. And create the calculation:

             

            SUM(IF [Answer] = 1 AND [Answer Orig] = 1 THEN 1 ELSE 0 END)

             

            Drag this onto the Text shelf and you'll see this, which matches the expected results above:

             

            New Picture (5).png

             

            The calculation works by checking the values of Answer and Answer Orig  for each Question/Question Orig combination. If someone answered Yes to the given Question, and also Yes to the given Question Orig, then that will count towards the total. In the sample data, there are 3 answers for each cell in the table, one for each person. If we were using a Likert scale, it would be simple to change that statement to use some other criteria, for example to compare all Excellent and Very Good answers to each other.

             

             

            To make this a little more legible and separate the original answers from the comparison, I created another calculated field:

             

            IF ATTR([Question Orig]) = ATTR([Question]) THEN "Original Answer" ELSE "# Who Also Answered Original" END

             

            And put that on the Color shelf:

            New Picture (6).png

             

            That way we can see what the original answers were that all other marks are in comparison to.

             

             

            The last step is to do the % of total calculation. Using the built-in quick table calculations won't work because the denominator is constantly changing with each new row. Therefore, the calculation looks like this:

             

             

            ZN(SUM(IF [Answer] = 0 THEN 0 ELSE [Answer Orig] END)/SUM([Answer Orig]))

             

             

            The ZN() function ensures that Tableau will write a 0% instead of leaving blank columns. The results look like this:

             

            New Picture (7).png

             

            Finally, these calculations will still all work when we use other categorical information to subgroup the data. For example, The Favorite Fruit dimension can be dragged onto the Rows or Columns shelf and the results will look like so:

            New Picture (8).png

             

            Example workbook is attached. Let me know if this works for you.

             

            Cheers,

             

            Jonathan