There are a variety of ways to go about this using custom views or queries in your data, Custom SQL, table calcs, or Conditional Filters, or Sets. I created a Set solution based on a Conditional Filter in the attached 8.1 workbook.
The basic calculation is a row-level calc called "Yes on 1 Boolean" with the following formula: LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y", you can see this is accurate in the "data" worksheet. I then used this formula in the Yes on 1 Set, which is based on the Subj # dimension and has a Condition by Formula with: MIN(IF LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y" THEN 1 END) == 1
What this does is apply the calc to each row and if it's True then return 1, else Null. Conditional filters require an aggregate, so we check whether the minimum value is 1 (at the Subj # dimension level), if so then return True, and Tableau will then return all those Subj #.
How this works is that Tableau computes the Conditional Filter as a subquery, so it's independent of other filters in the view. You can then create a view with the Set on the Filters Shelf to only return those Subj # matching the set criteria, along with filtering for specific questions if you'd like. You could skip the Set and just use the Conditional Filter, the advantages of having a Set are:
- reuse, since you can quickly use the same Set in multiple views, as well as duplicate the Set and edit it for slightly different conditions
- have access to the IN/OUT of the Set & combined Sets functionality in 8.0 and above
- easier interpretation of views, it's a lot faster & easier to understand a specific set name than to see a filter on Subj # and then have to open up the Filter dialog to understand the settings.
inter question analysis.twbx 47.4 KB
Thank you very much for your post. Although your set formula returns TRUE for Q1, it returns FALSE for the remaining Questions. I may have not done well to explain what I am trying to do and for that I apologize as your post was very informative. Basically I am trying to see, of only those that have had an ankle injury, what elements of Q2 do they answer Y to as well, then count those responses in the sheet. Since the raw data has them both in the same column I am having trouble pulling Q1 out and filtering across Q2 Results. I would then need to count the trues as they pertain to the Q. (i.e. 2 out of 3 ankle injuries were treated with medication. Since Q2 is a (check all that apply) I can view this by count[ID] however, when I pull out Q1 let say in a set or a calculation, the current view disappears. TO apply this to your workbook, I think I am looking for something like the following. The end result would be a dashboard with Q2 results visualized with a Q1 filter (multiple body parts).
Subject # Question Answer
Yes on 1 Boolean
(Have had an ankle injury)
Q1. Have you had an ankle injury? Y 1 Q2. Was it managed with medication? Y TRUE 1 Q3. Was it managed with rehab? Y TRUE 1 Q4. Was it managed with Surgery? N FALSE 2 Q1. Have you had an ankle injury? Y TRUE 2 Q2. Was it managed with medication? Y TRUE 2 Q3. Was it managed with rehab? Y TRUE 2 Q4. Was it managed with Surgery? Y TRUE 3 Q1. Have you had an ankle injury? Y TRUE 3 Q2. Was it managed with medication? N FALSE 3 Q3. Was it managed with rehab? Y TRUE 3 Q4. Was it managed with Surgery? Y TRUE 4 Q1. Have you had an ankle injury? N FALSE 4 Q2. Was it managed with medication? N FALSE 4 Q3. Was it managed with rehab? N FALSE 4 Q4. Was it managed with Surgery? N FALSE
Since both of these questions belong to the same dimension, can you not pull out Q1 and have it act as a Y/N filterr? Logically this seems like the easiest way. Thank you for effort on this. I am new to Tableau so this is really helpful for me to understand. I apologize if I led you astray with my initial post.
The workbook that I posted is doing what you are asking for, only the technique is a little indirect and I didn't write up a full explanation, so I'll try to do so here. Also, you changed the data tables for each, for this time around I revised to use your second set of data. In the future, please post a text file or Excel workbook with the data, that makes it easier for us to use the data directly.
First of all, to clear up confusion, the Yes on 1 Boolean is not the filter to use, it's only an intermediate calculation I used to show how I got to the Yes on 1 Set that is the filter to use.. I'll break down the steps more to help you follow the logic so you can duplicate it in your data.
The data is organized such that there is a row for each Subject/Question combination makes a unique row in the data source.
At one level, the goal is to count the # of Yes answers - specifically, the # of distinct Subjects who answered Yes - for each Question 2 answer all Subject who answered Yes on Question 1. Another way of writing this is that we want to filter for the Subjects who answered Yes on 1, so we can count their answers for Question 2 (and others). For this data set, that means we want to return Subject #s 1-3, and filter out Subject # 4.
The combination of the above two paragraphs means that we're going to have to look across rows in the data source. In Tableau, when we want to dynamically look across rows (in the data, in the query that Tableau issues to the data source, and/or in the view) to have the results of one row affect another, we have several options:
- A custom query, view, or Custom SQL. For example, you could do something like a self-join to add an extra column to the data with the Question 1 answer for each Subject.
- Do a self-data blend where the secondary source is filtered for Question 1.
- Table calculations where you have all the Subjects & Questions in the view, then have a calculation that returns the Subjects who answered Yes on Question 1 to all other Subjects and a filter based on that. I set up examples of these three options in a separate thread at http://community.tableau.com/thread/137236.
- Using a Top or Conditional Filter or a Set based on a Top or Conditional Filter, which is what I used and will explain in more detail.
The dimensions in the view determine the level of detail or granularity of the view, they are added to the GROUP BY clause of the SQL that Tableau issues to the data source. So when we have a view with Subject and Question in it, like this:
The query looks like this:
SELECT [Subject #], [Question] FROM [table] GROUP BY [Subject #], [Question]
When we add a General or Wildcard filter on a discrete (blue pill) dimension, or a range filter on a continuous (green pill) dimension, those criteria are added as a WHERE clause. So the SQL for that looks like this:
SELECT [Subject #], [Question] FROM [table] GROUP BY [Subject #], [Question] WHERE [Question] == "Q1. Have you had an ankle injury?"
And here's the view that doesn't show what we want:
The critical point here is that we don't actually want to filter on the Question, we want to filter on the Subject who had a specific answer to a specific Question. This is a common mistake users of Tableau (both newbieand long-time) make, the interface makes it so easy for us to directly manipulate the data that it's also too easy for us to get involved in the data not take the step back and look at what level of detail (granularity) that we need.
A way we can filter for all Subjects who answered Yes on Question 1 is via a Conditional Filter. Unlike Regular or Wildcard Filters, Top and Conditional filters generate a separate query (a subquery, actually) to the data source using the criteria of the filter that is at the level of detail of the dimension used for the filter. The results of that subquery are then inner joined to result set from the standard query leading to what is displayed in the view. (This is a bit simplified, there are other filters and operations that can take place).
The Yes on 1 Boolean calculation with the formula LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y" returns True for each Q1 that has a Y answer. This calculation is a row-level calculation, that means it's computed for each row in the data source. If we use that as a filter, we'll get the same results as above for filtering for Q1 = Yes.
This formula is then wrapped inside an aggregate calculation, I set up a Yes on 1 Aggregate with the formula:
MIN(IF LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y" THEN 1 END)==1
This aggregates the row-level calculation across whatever dimension(s) are being used. When Subject # is the only dimension in the view, this returns True for Subject #s 1-3 and Null for Subject #4:
The query that Tableau issues is something like:
SELECT [Subject #], MIN(IF LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y" THEN 1 END)==1 AS [Yes on 1 Aggreagate] FROM [table] GROUP BY [Subject #]
Now I'm going to go on a slight tangent, to help you understand working at different levels of detail. If we also bring Question into this view as a dimension, this will only return True for Question 1, which won't help:
The reason why is that we've increased the level of detail of the view, so the query that Tableau issues is:
SELECT [Subject #], [Question], MIN(IF LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y" THEN 1 END)==1 AS [Yes on 1 Aggreagate] FROM [table] GROUP BY [Subject #], [Question]
Because the Subject & Question together make a unique row in the data, the calculation is working at a row level and doesn't return the desired results.
Ok, back to the regular thread. As I said earlier, Top and Conditional filters are computed as a subquery at the level of detail of the dimension. So we can set up a Conditional Filter on Subject # using the Yes on 1 Aggregate calculation. Here's the filter:
And here's the view with just Subject as a dimension, we can see that only Subjects 1-3 are being returned:
Now, let's do the next step and add Question to the view:
This is still working as desired. Tableau is applying the conditional filter at the Subject # level of detail, then joining those results to the view with the Subject # and Question pills. Here's what Tableau is doing under the hood:
SELECT [Subject #], [Question] FROM [table] GROUP BY [Subject #], [Question] INNER JOIN (SELECT [Subject #] FROM [table] GROUP BY [Subject #] WHERE (MIN(IF LEFT([Question],2)=="Q1" And [Answer Y/N] == "Y" THEN 1 END)==1) == True)) AS [conditional filter] ON [table].[Subject #] == [conditional filter].[Subject #]
If we bring in the Yes on 1 Boolean or Yes on 1 Aggregate into this view with Subject # and Question, those will both look odd because they are being separately computed at the level of detail of the view, which is at individual row level.
However, we're still (correctly) only seeing Subjects 1-3, because the Conditional Filter on Subject is being applied as a separate subquery at the Subject level of detail, while the rest of the view is being computed at the level of detail of Subject #, Question, *and* Yes on 1 Boolean (that's a dimension as well). This is a subtle and powerful feature of Tableau, between Top & Conditional Filters, data blends, and table calculations, we can simultaneously work at different levels of detail in the same view.
In my original post, I'd taken that same formula for the Conditional filter and used it to create the Yes on 1 Set on Subject #:
One way to think of Sets is that they "materialize" a set of filter criteria so we can quickly re-use them in different worksheets. So we can use the Yes on 1 Set instead of the Subject # filter, as in this view:
If you're using a non-JET data source such as a Tableau data extract, then you can use the IN/OUT of a Set as a dimension and in calculated fields:
Also, with the non-JET data sources we can use COUNTD to get results, I set up a couple of calculated fields for counting the # of Yes answers a % of total:
Hopefully this step-by-step intro makes more sense!
PS: updated workbook is attached
inter question analysis v2.twbx 148.3 KB
WOW! I really appreciate you taking the time to explain this to me so thoroughly. It took me a full day to really understand your last post and apply it, however, it has really expanded my "rookie" knowledge base of Tableau as it applies to survey analysis. I just wanted to take the time to say Thanks.
I concur - thank you SO MUCH Jonathan Drummey for the detail on this. Unlike Kevin, I have yet to work through it and understand it but I am SO GRATEFUL that folks like you are so willing to take the time to explain things so thoroughly and participate so generously in this forum.
I am a relatively new Tableau user and use it mostly to analyze and visualize survey data. To that end, I have learned quite a bit yours and other posts in these forums as well as Steve Wexler's blog.
This particular topic is important to my work and I noticed it was written before Tableau 9. I wonder if the introduction of LOD calculations might change the way you do inter-question analysis?
In this case I can't see any significant gains by using LOD expressions. This post used computed Sets using conditions which return a boolean in/out value, the mechanism this kind of Set uses is the same as LOD expressions (subqueries at the level of detail of the Set/LOD that are joined back in to the main query). One way to conceive of LOD expressions (FIXED and EXCLUDE in particular) is that they can return multiple different values of any data type vs. a Set using a conditional expression can only return a boolean result.
Hope this helps!
This does help, thank you. I am intrigued by the concept of returning multiple different values, but first things first and I'm going to follow your guidelines for computed sets.