6 Replies Latest reply on Jan 26, 2017 1:21 PM by Jonathan Drummey

    Trouble segmenting survey data by demographics (workbook attached)

    Katie Poznanski-Ring

      Hello All,


      I'm trying to build a highlight table for my survey data that segments a question's results based on demographic fields, like the example picture below.

      Sheet 51.png


      The question is a Likert-style rating for a handful of items, with ratings being "1=Definitely Influences", 2="Somewhat Influences", and "3=No Influence". I want to show this highlight table only for the "1-Definitely" responses. I'd like for the Grand Total column to be all the "Definitely" responses, and for that number to be segmented into the demos. I'm having trouble doing this; I think it requires a Level of Detail calculation that I don't yet know how to write.


      I am using version 10.1.3. Thank you for your help!



        • 1. Re: Trouble segmenting survey data by demographics (workbook attached)
          Jonathan Drummey

          Hi Katie,


          A few clarifying questions so I understand what you are trying to do:


          1) The % of Total on COUNTD(RespID) is currently set up with Table (Across) as the compute using. That means that is addressing along the DemID and Dem Text Response dimensions and partitioning on the Question Items and Q17 Filter (which is effectively the Text Response/Scale field). Usually in Likert survey analysis the addressing would be on the Q17 Filter (or Text Response/Scale field) such as in the "How to segment correctly?" view, so is the present % of Total calculation what you really want? In other words, should the % of Total be the % of total "Definitely" responses for each combination of Question/Dem Text Response/Dem ID?

          2)  Do you want the Grand Total to be the raw number of responses or the % of "Definitely" responses?


          I made some guesses and prepared a view that assumes that your answer to #1 is that you do want the % of total to be the % of total "Definitely" responses and the Grand Total to be the overall %, if this isn't correct let me know!


          Screen Shot 2017-01-25 at 10.30.58 AM.png


          This view requires no table calculations or Level of Detail expressions because it uses an alternative route to generate the % of total. The quickest method to a % of total is the Quick Table Calculation that you used, however as you found out it can start getting tricky dealing with filtering and grand totals. How the table calculation works is that it's taking the marks in the view that are aggregated with COUNTD(RespID) (where the Q17 Filter that is the Text Response/Scale field is in the viz level of detail) and then performing a further aggregation on them to get the total, and then the table calculation filter gets rid of the un-needed Q17 Filter values.


          To review (I'm partially writing this out because I'm trying to come up with a better explanation for this kind of problem in the trainings that I do). So the vizLOD is at the level of detail with DemID, Dem Text Response, Question Items, *and* the Q17 Filter, the first level of calculation is to aggregate COUNTD(RespID) to the vizLOD, the second level of calculation is the % of Total quick table calc, and then there's the Hide Q17 table calculation filter to deal with the increased vizLOD.


          While we could come up with something to deal with all of that, there's a simpler alternative where instead of using finer-grained vizLOD, an aggregate and a table calculation and then filtering those results we use a coarser vizLOD, a record-level calculation and an aggregation, and we embed the filter inside the record-level field.


          Where it starts is that all we want to do is count the "top box" responses of Definitely. So we can return those in a record level calculation:


          IF [Number Response] = 1 THEN [RespID] END


          There's an implicit "ELSE Null" in this formula so any response that isn't 1 aka "Definitely" will be assigned a value of Null. So this calculation is only returning the RespIDs who answered Definitely and only those will be counted in the view when we add the COUNTD() aggregation. It is a record-level field so it would be materialized in the data extract and be faster to count. If we'd wrapped this in a COUNTD() then Tableau wouldn't be able to materialize it.


          Then the Top Box % field is:


          COUNTD([Top Box Count])/COUNTD([RespID])


          In the view the Q17 Filter dimension is no longer in the vizLOD, so the vizLOD is just Question Items, DemID, and Dem Text Response. In the detail cells the Top Box Count and Top Box % fields are aggregated to the vizLOD and return the % that answered Definitely for each Question & Demographic. In the row Grand Total the demographic dimensions are removed from the grand total LOD aka gtLOD and the aggregation is computed across all demographics for each question:


          Screen Shot 2017-01-25 at 10.50.51 AM.png


          I'm always trying to answer the following "Where was my (or someone else's) mental model such that they made a decision that led them down a path that didn't go well?" In this case that decision point was in assuming that the Q17 Filter (the actual response to the question) was necessary on the vizLOD to get your results. The feedback that Tableau was giving about this being the problem is really blunt, it's the frustration that the extra gyrations of table calculations and table calculation filters are necessary and grand totals don't work. So your intuition about using a Level of Detail expression was coming from the right place because you were recognizing working at different levels of detail in the view was the core issue, unfortunately Tableau wasn't giving you the direction you needed to get to your goal. (This is all assuming that I was right in my assumptions above in preparing this view).


          FYI: for working with survey data in Tableau I heartily recommend reading Steve Wexler's amazing set of posts and presentations at Visualizing Survey Data – Data Revelations, he's also got a survey training that goes step by step through the materials and he's a great teacher!



          1 of 1 people found this helpful
          • 2. Re: Trouble segmenting survey data by demographics (workbook attached)
            Katie Poznanski-Ring



            Many, many thanks for your quick and wonderfully detailed reply!


            Your assumptions were right-on. That is exactly the view I was going for!

            1) I set the % of Total on COUNTD(RespID) to compute across because that was the only way I was able to get the correct raw numbers that I first tested in Excel (i.e. I knew by looking at my raw Excel file that there were 92 “Definitely” responses for the “scheduled time of events” item, for example). I have to confess I’m really new at this, so sometimes when all else fails, I resort to clicking around to see what happens; in this case, I got a kernel of truth to what I was going for with the raw numbers, then had trouble manipulating the view to focus the segmentation only on the % “Definitely” responses. So to answer your question, yes, you assumed correctly: the % of Total should be the % of Total “Definitely” responses.

            2) Also yes here: I want the Grand Total to be the % of “Definitely” responses in order to show the demographic breakout for those responding “Definitely”.


            Your explanations were so helpful – there’s a lot of “behind the viz” processes going on that are hard to fully understand as a newbie, so thank you for going through how table calcs work and the aggregation details! I was particularly second-guessing myself because since my data is in a tall format, I’m afraid of over-counting the results and thought that getting down to the record-level would’ve produced over-inflated results. I can’t believe I didn’t think of this as a “top box” problem, given my market research background, but this approach makes so much sense!


            I’m glad to know my head was in the right place. I’ve been trying to learn LOD calcs – the most common ones and how to recognize appropriate use cases – and it’s been one of the tougher things to learn as a beginner, despite my knowledge of Excel formulas.


            The Tableau Community is such an invaluable resource for the learning process, and I greatly appreciate folks like you who take the time to help get to solutions! I share your sentiment about Steve – his posts and conference presentations have been essential to my learning how to analyze survey data in Tableau!


            Thank you again!


            • 3. Re: Trouble segmenting survey data by demographics (workbook attached)
              Katie Poznanski-Ring

              Jonathan Drummey,


              I hate to bug you about this again, but I think I spoke too soon about the problem being totally resolved. When I drag CNTD(Top Box Count) to the Label card to get the raw numbers, I see that, for the First Year sub-demographic of the overall Class Level demographic, 19/92 should be 21% instead of 76%. In both circumstances, whether we have CNTD(Top Box Count) or AGG(Top Box %) on the Label, the Grand Total column is working perfectly as a count or percent. But it looks like the segments aren't working properly. Am I missing something?


                 Raw Numbers

              raw numbers.png


                 Original Solution with % of Total

              percent of total.png

              • 4. Re: Trouble segmenting survey data by demographics (workbook attached)
                Jonathan Drummey

                Hi Katie,


                I'm glad this has been helpful so far!


                Reading through your messages I'm confused because the answer you are asking for is different than answer I'd built.


                If we use the 92 denominator that then the answer of 21% (19/92) is essentially stating "For the Scheduled Time of Arrival question 19 respondents answered Definitely for the Class Level First Year out of *all* 92 respondents answering Definitely". In that case the Grand Total will always be 100% because it's a calculation "(numerator) there are 92 respondents who answered Definitely across all demographics out of all 92 respondents who answered Definitely". Here's the "version 1" view showing this:


                Screen Shot 2017-01-26 at 1.11.06 PM.png


                In other words, this view is only counting respondents who answered definitely and skipping all other responses, so that's why we can filter for just the Definitely responses and ignore all the rest as in the view above.


                Whereas what I'm used to (and what I'd built in my original workbook) with the use of top box scores and demographic dimensions is the 68% (19/28) that is essentially saying "For the Scheduled Time of Arrival question 21 respondents in Class Level First Year responded answered with Definitely out of 28 total respondents." And then the Grand Total is "92 respondents answered definitely out of 116 total respondents."


                Screen Shot 2017-01-26 at 12.52.53 PM.png


                With that clarification do either of these work for you, or are there other subtleties to the question that I'm missing?



                • 5. Re: Trouble segmenting survey data by demographics (workbook attached)
                  Katie Poznanski-Ring

                  OH, ok. I see now how these are two different questions. So if I'm correctly understanding you, what I was proposing (Version 1) looks more at the concentration of "Definitely" respondents within each demographic category (e.g. comparing First Years to other Class Levels; comparing Homeowners to other Current Residence types). What you are proposing (Version 2) is more like normalizing across all demographic categories, e.g. First Years to all other demographics. Version 2 allows us to make broader comparisons across categories, while Version 1 allows us to make within-demographic comparisons. Great! Version 2 answers the question. Thanks so much, Jonathan!

                  • 6. Re: Trouble segmenting survey data by demographics (workbook attached)
                    Jonathan Drummey

                    You're welcome!


                    FYI version 1 is slightly different than what you describe because with a Table (Across) compute using for the denominator it's counting across *all* demographic categories, not just Class Level/Current Residentce.


                    Here's an example: With the Table (Across) compute using  there are 41 respondents for "Unable to participate due to working off campus" question. If you change the compute using on the % of total calculation to just Dem Text Response so it partitions on DemID and Question Items then that changes the results for the "Current Residence" DemID because there are only 40 respondents who answered Definitely in that DemID. So to more accurately match what you describe for version 1 of "e.g. comparing First Years to other Class Levels" you'd need to use the more specific Compute Using.