12 Replies Latest reply on Feb 18, 2020 9:08 AM by Jonathan Drummey

    Survey data with Multi-Select question

    Navneet Sohal

      Hello Everyone,

       

      I have a large survey data which has got many many multi-select questions.

      Do i need to pivot each multi-select question?

      I want that each multi-select choice should get correlated to other questions.

      Attaching excel file for reference.

      If you have better advice for me with regard to data re-structuring and data visualization, I will be happy to implement that.

       

      I tried to pivot only 3 multi select questions in the below manner and Tableau-Prep gave me a system error(Query text limit exceeded).

       

       

      Then I tried duplicating the survey flow and then joining the pivot. This worked as i desired; but it generated huge data(number of rows).

      In the above flow i could not add any more multi select question as again I got a system error.

        • 1. Re: Survey data with Multi-Select question
          Joshua Milligan

          Hi Navneet,

           

          I just built out a flow that restructured survey data for analysis and I used a lot of pivots for multi-select questions.  The key is to have a unique identifier for an individual response so you can do a distinct count later on to know how many responses for that answer you had.  This is important because the granularity of your data will change from one record per respondent to one record per unique combination of responses.  In my case, I had a timestamp down to millisecond which worked great.  In your sample data, you have a ResponseId.

           

          I also notice in your sample data a grouping of "heard", "used", and "consider", so I'm thinking three pivots, each like this:

           

          There will be some extra records (the question title, NULLs) that I will eventually get rid of, but I'll save them for now because it makes it easier to identify each pivot.

          In the end, I'll just exclude all the NULL and "Consider", "Heard", "Used" fields:

           

           

           

          Then, in Tableau, I'll make sure not to use Number of Records for anything as it is meaningless.  Instead, I'll use COUNTD(ResponseId) to count the responses:

           

           

           

          I've attached a packaged flow for you to examine details.

           

          Hope it helps!

          Joshua

          • 2. Re: Survey data with Multi-Select question
            Joshua Milligan

            And in looking back at your original post, I see that you did indeed go down the same path but hit a "query text exceeded..."  I ran into similar (not the same error) issues when I built out the flow with my data (your sample worked fine - but it was small and not very complex).  In the end I had to create an output after each / every other Pivot and then bring that back in as an input for the next.  It was a bit tedious but gave me a great data set in the end.

             

            Best Regards,

            Joshua

            • 3. Re: Survey data with Multi-Select question
              Jonathan Drummey

              There's usually more than one way to go about this, here's my contribution. All the Columns to Rows pivots are replaced by a single pivot whose results are joined on a table that maps questions to groupings as well as text, then we can use that as is in a Tableau view:

               

               

              Screen Shot 2020-02-12 at 12.47.36 PM.png

               

               

              Screen Shot 2020-02-12 at 12.47.09 PM.png

               

              Here are some comparisons to what Joshua built:

               

              - This method should avoid the "query text exceeded" errors in large flows because there's a single pivot vs. multiple pivots.

              - If we look at the data there are 8 respondents * 3 groupings * 5 answers in each group so that's 120 records. The source that Joshua created ends up with 189 rows due to the nested pivots, the one I created ends up with 120 records, which is a 1/3 reduction in the number of records.

              - Also with this method since the grain of the data is the same as the grain for analysis (one record per respondent & answer combination) then the # of Responses measure can be COUNT([Answer Value]) which will be significantly faster than a COUNTD() in large data sets, and a % of total can be COUNT([Answer Value])/SUM([Number of Records]).

              - The method that Joshua uses results in a separate dimension for each grouping whereas what I'm proposing has a single dimension. So the view Joshua shared where the the Used and Consider are put on opposing Shelves (Rows and Columns) would take some extra calculations in Prep or Desktop to build out.

               

              Jonathan

              1 of 1 people found this helpful
              • 4. Re: Survey data with Multi-Select question
                Joshua Milligan

                Jonathan,

                 

                I really like this approach!

                 

                I was considering whether the NULL values should be kept in your version and originally thought it would be best to exclude them in order to

                1. reduce the size of the resulting data set further
                2. (of higher importance) make filter actions more intuitive.  For example, if I were to drill down to individual responses from your view above, I'd have to ensure I'd additionally filtered out NULL answer values in the detail view as each intersection of Grouping and Answer Text includes all responses.

                 

                However, it then occurred to me that if one of the values was never selected by anyone across the entire survey, then the very existence of that possible question/answer would not be known from the resulting data.  That alone makes me very much reconsider the idea of excluding the NULLs.  (though the full domain does exist in the data source your joining in, so there might be possibilities to retain it)

                 

                Are there any other considerations?  Am I missing anything?

                 

                Best Regards,

                Joshua

                • 5. Re: Survey data with Multi-Select question
                  Jonathan Drummey

                  Hi Joshua,

                   

                  Great question! I think keeping the Nulls is necessary, here's me working out my thinking on this:

                   

                  a) My starting point is "what is the fundamental unit of analysis"? For survey data that's typically the combination of respondent and their response to each question (to start, more on that below). The reason why I ask this is that when we can get the grain of the data (i.e. the dimension(s) that define a record) to match the unit of analysis then our Tableau visualizations are most often easier to build & maintain. We shorthand this to "Tableau likes tall data".

                   

                  b) Therefore instead of respondents on rows and questions on columns we're going to want a record for every combination of respondent/question/response. In this data there are 8 responses * 18 questions so that would theoretically be 144 records.

                   

                  c) There's a caveat to this where some questions are really respondent-level demographic information like age, gender, etc. that we might leave out of the pivot that generates the record for each respondent/question/response. In this case there are 3 demographic questions so that's 8 * (18-3) = 120 records.

                   

                  d) In analyzing survey data "no answer"/"did not answer"/"chose not to answer" are vital components to analysis and depending on how the survey was administered that ends up being encoded as a value, or, in this case, null.

                   

                  e) Multiple answer questions are a complicating factor...I think it helps to consider the two categories of multiple answer questions:

                  1) The first is open-ended where the user is arbitrarily entering a set of answers e.g. "List your favorite ice cream flavor(s):" Our natural tendency is to think of multiple answer questions as a single respondent/question combination per a) above, but really there can be multiple (0 to N) responses to each question. Therefore per b) above instead of one row per respondent/question we're going to need multiple rows per respondent/question, i.e. one row per respondent/question/response, and per d) above if they didn't respond at all that also has analytical meaning that we'd want to keep.

                  2) The second category is the use case in this thread where there are five listed companies to choose from, thus in effect creating five separate questions where per b) we'd want a row per respondent/question/response and per d) we'd want to keep the nulls.

                   

                  In my own mental model I tend to think of multiple answer questions as more of a "grouping" of questions so for me there's a hierarchy of respondent->[optional multiple answer question->question or sub-question of multiple answer question->response. Hat tip to Steve Wexler, without all his survey data posts at http://datarevelations.com I wouldn't be able to formulate that.

                   

                  Therefore for this hierarchical mental model it might be [respondent]->"list your favorite ice cream flavor(s)"-> (with an implied) "do you like vanilla?" "Do you like chocolate" "Do you like mango sorbet" etc. -> response. Or in this use case it's [respondent] -> Heard -> (with an explicit) "Heard of Amazon" "Heard of Google" etc. -> response. With regards to this data set I think it added a potential bit of confusion because the answers had the company names & nulls instead of something like yes/no or 1/0 that would make the question/answer differentiation more visible. Also in the output of the flow I built there's [ResponseID] -> [Multiple Answer Grouping] -> [Answer Short Text] -> [Answer Value] that are matching that hierarchy, if I redid it now I'd probably use some different naming to make that hierarchy more clear.

                   

                  g) Last step...there are a few ways we might analyze the data:

                  1) One business question might be "What did respondent X answer?" and in that case we might want to see user X chose Amazon, Google, IBM. Whether we filter out the nulls or not doesn't matter for answering this question.

                  2) However another business question going back to d) above might be "What did respondent X answer for each of the five possible answers?" and in that case we most likely want to see all five possible with some sort of marking or indicator for the ones they chose. If we filter out the nulls in advance then we can't be sure we would be able to create that display, even with Show Empty Rows/Columns. Whereas if we leave the nulls in (i.e. have the data at the grain of respondent/question/response) then we can always answer that question.

                  3) And then, finally, we get to the aggregate responses where we really want to have a row or column or color for every possible answer and as you pointed out even if nobody gave a possible answer that we still want to show that. And given that someone might filter the data by gender, age, etc. then the data needs be padded out for the domain because Show Empty Rows/Columns and other data densification techniques only work in limited circumstances. A little plug - in my TC19 talk on data densification & padding https://tc19.tableau.com/learn/sessions/zen-master-you-did-what-your-data-deep-dive-imputing-and-densifying-data  this was one of my first examples.

                   

                  Thanks again for asking this question, this thread was really helpful for me to articulate my reasoning!

                   

                  Jonathan

                  • 6. Re: Survey data with Multi-Select question
                    Navneet Sohal

                    Hello Jonathan Drummey and Joshua Milligan

                    I am great full to both of you to address my query in such a detailed manner.

                    The explanation that you provided was also of great help.

                    Jonathan Drummey will surely look at your TC19 video.

                     

                    Jonathan Drummey correct me if I am wrong; by looking at your tflx and twbx files, with your approach we will not be able able to figure out; how many of the respondents who have heard of Microsoft have also Used(Microsoft/Amazon/IBM....) and will Consider(Microsoft/Amazon/IBM....).

                     

                    After creating numerous flows I could only find that to answer my above question we need to prepare separate pivot for each multi select question and then join them. This has a limitation that with my survey data(~500 responses) in Tableau prep I could only add upto 4 pivots. When I add the 5th pivot(question) the processing takes 2 hrs and at the end I get an error message.

                     

                    I dont know why, but now I am not able to attach attach files here. How can i do that? I want upload my sample twbx and tflx files.

                    I see that my files are attached to my original post.

                    • 7. Re: Survey data with Multi-Select question
                      Jonathan Drummey

                      Hi,

                       

                      I’m thinking the cross question analysis is do-able using a variation on what I built (in fact I’d built a version of it before going with the data in the “tall” format), but just to be sure it works for you I have a few questions:

                       

                      1) How many multiple response questions do you want to compare e.g. is it just a few or is all of them? (I’m guessing all of them based on your comment about wanting to pivot for each).

                       

                      2) How many do you want to compare at a time in a single view? For example are you going to want to compare 2 at a time (Heard to Consider, Heard to Used, etc.), 3 at a time, any arbitrary set at time?

                       

                      3) In that comparison are you going to want a static view showing all the comparisons like the highlight table that Joshua built or something dynamic where a user could click on Heard/Microsoft and see what that cohort has Considered/Used (this is known as proportional brushing), or both?

                       

                      Jonathan

                      • 8. Re: Survey data with Multi-Select question
                        Navneet Sohal

                        Hi,

                         

                        After reading your questions I again sat with my survey data to closely analyze it.

                        Realized that I was over thinking and over doing things.

                        Comparing all the multi-select questions with each other will not be a good idea.

                         

                        There are in total 10 multi-select questions in my survey (each approximately have 10 selections). And 30 single select questions.

                         

                        The are Two sets of multi-select questions for which I want to do proportional brushing (thanks for that word).

                         

                        First set have three multi-select questions {Heard-of / Used / Consider in future} questions.

                        Second set have two multi-select questions.

                        Even if both the sets do not compare with each other it will be fine.

                         

                        I am attaching a new sample data excel file to give you a better overview of my survey data (14022020_Sample Data).

                         

                        I hope I was able to put my thoughts properly.

                         

                        Appreciate your support.

                         

                        THANKS !!

                        • 9. Re: Survey data with Multi-Select question
                          Navneet Sohal

                          While trying to edit my original post to add new sample data, all the replies to my original post were getting deleted. May be I am doing something wrong here.

                          Hence adding a google drive shareable link

                          14022020_Sample Data.xlsx - Google Drive

                          Hope that is allowed here.

                          • 10. Re: Survey data with Multi-Select question
                            Jonathan Drummey

                            Hi,

                             

                            I’ll have a chance to look at this later today, I’ve got a couple of ideas.

                             

                            Jonathan

                            • 12. Re: Survey data with Multi-Select question
                              Jonathan Drummey

                              Here's a next version using the Google Sheets data. The changes to the Prep flow were adding to the question grouping list for the new questions and renaming some fields.

                               

                              In Tableau Desktop I did the following:

                               

                              - duplicated the prior bar chart worksheet three times, once for origin & two target sheets

                              - created a Grouping + Answer calculated field to combine the question grouping & answer text. (this could have been done in Prep)

                              - added the Grouping + Answer field to Detail on the origin worksheet

                              - created a Grouping + Answer Set based on the Grouping + Answer field. This is the set that is going to be the target of a Set Action.

                              - created a Respondent has Grouping + Answer Set using a conditional calculation with the formula MAX([Grouping + Answer Set] AND NOT ISNULL([Answer Value])). This identifies at the respondent level whether the respondent answered affirmatively to any of the questions

                              - On the target worksheet added Respondent has Grouping + Answer Set to Color

                              - Created a % of total table calculation for the label on the first target worksheet with a compute using on the set.

                              - created a level of detail expression & calculation to get the % of selected for the 2nd target sheet.

                              - put the Respondent has Grouping + Answer Set on Filters on the 2nd target sheet

                              - Added an ad hoc group for the question grouping and added that as a scoped filter to the origin & target worksheets

                              - added a dashboard with the three worksheets

                              - added a set action that targets the Grouping + Answer Set.

                               

                              Here's the dashboard with Consider/IBM selected. The selection updates the Grouping + Answer Set which is then used within the Respondent has Grouping + Answer Set which then creates the stacked bars on the 1st target sheet and filters the 2nd target sheet.

                               

                              Screen Shot 2020-02-18 at 11.53.14 AM.png

                               

                              v2019.4 flow & workbook are attached.

                               

                              If you really want a highlight table view that does the cross compare I've got some thoughts on that how to build that whilst minimizing the pivots in Prep.