1 of 1 people found this helpful
Working with survey data makes a nice break from counting how many patients who have COPD have been readmitted in the last 6 months, here goes one attempt:
First, a little digression whose executive summary is "Please take the time to make a packaged workbook w/some sample data.": Without a packaged workbook nor sample data we don't know what your data looks like. Sometimes we can get a good clue based on the verbal description and screenshot, in your case the data hasn't been described in detail and the screenshot doesn't include any of the Shelves or the Marks Card so that makes the process more difficult. Based on my knowledge of survey data and your prior posts I can make an educated guess that your data has been structured so granularity is one record per respondent & question, with an additional dimension of the Segment, but that requires me to make a guess and many forum answerers don't have the background to make that guess, so they ignore the question and go on to answer questions that they can. The rate of answers to forum questions without packaged workbooks (or at least an attachment of data) is drastically lower than the rate of answers to forum questions that do have answers. So please, help us help you by taking the time to make a packaged workbook.
Fortunately when it comes to Likert scales and survey data we do have a sample data set that we can use, I'm using the data that Steve Wexler used in his visualizing survey data posts, in particular this one: Using Tableau to Visualize Survey Data — Part 3: Dealing with Multiple Memberships for Respondents » Data Revelations
[end of digression]
I took the raw data and pivoted the questions in v9 to make a tall data set, and renamed the pivot field names Question and the pivot field values Response. Then I filtered the data for 3 Generations (the segment) and a few of the answers. Here's a worksheet that duplicates the look of yours:
We could add a reference band for +/- 1 standard deviation but that has three problems: The first is that the SD would be computed for each Question (cell), not for each Generation within the cell. The second problem is that reference lines & bands are computed as an aggregation on the pill (e.g. AVG(Response)), not the underlying data (e.g. the Response for each ID). The third is that a single reference line/band has only one set of colors, whereas we might want to color each line/band based on the Mark color and not have to add a new line/band every time there's a segment.
One way to hack this would be to create a dual axis, on the second axis add the respondent ID to Level of Detail of the view, use table calculations or LOD expressions to compute the overall average to draw those Marks, create three additional calculations to return the Response per Generation, and then use those three calcs to make 3 reference bands with the right colors. But that's doing a lot of hard coding and making Tableau pull data for every ID from the data source which can impact performance.
A more elegant solution is to do our own calcs and then draw the bands on a secondary axis. Gantt marks are great for this, the continuous used on Rows/Columns for Gantt marks sets the start of the mark, then whatever is on the Size Shelf sets the length of the mark.
Here's the Gantt Start calc:
AVG([Response]) - (STDEV([Response]) * [# of SD])
It's taking the AVG(Response) and subtracting the STDEV(Response) * a # of SD factor that is a parameter (more on that below). We're using the AVG() aggregation at the same level of detail as the STDEV() so the results will be accurate, avoiding problem #2 with the reference lines/bands. In addition, this avoids problem #1 because the aggregations are computed at the level of detail of the view, which is Question & Generation.
Then the length (Size) of the Gantt bar is:
STDEV([Response]) * 2 * [# of SD]
Here's the start of the view, the only change was to make the Color for the Gantt marks slightly transparent.
And here's the view after synchronizing the axes:
The overlapping bars make the view harder to read, however clicking on a mark will highlight the entire bar:
If there are new generations added or removed there's no work to be done, the workbook will automatically adjust. Plus there's only standard aggregations used in the view, so the work Tableau and the data source have to do are much simpler.
Now for the bit on standard deviation and why I used a parameter. In the original post the goal was to show a 1 standard deviation range. Given a normal distribution, that only covers 68% of the data. A more common definition of outlier is to show 2SD (95%) or 3SD (99.7%), so I used the parameter to set that so you can see the difference. In practice I'd fix the number in the calcs and avoid the parameter, that way Tableau can do more caching and get more performance.
An alternative view would be to use a dashboard, I set up the original circle plot and then used a modified view with the Generations as separate rows and the Gantt marks. A Filter Action on Question connects the two:
Hope this helps!
[Edited 20150521 to fix the calcs per Steve Wexler's note below, change the screenshot to reflect that and post a revised packaged workbook]
First of all, THANK YOU! This is exactly what I needed, and I am so grateful to you for taking the time to post such a detailed and helpful response.
Second of all, THANK YOU for your absolutely appropriate digression about the lack of a sample workbook. You're completely right, and I was being lazy by not going to the effort to include one. I am properly chastised, and won't let it happen again. :-)
I'm gonna apply this to my workbook and let you know how it works. Again, appreciate your prompt and helpful response. You're a peach. :-)
I've created a sample workbook with my data, and tried your approach but I can't seem to get the GANTT bars right -- they're clearly not in the right place, and don't have any width at all. Would you mind taking a look and seeing if you can figure out what I'm doing wrong? I'd be grateful. Thank you.
Sample Survey Data.twbx 62.3 KB
Hi Susan, I'll get a chance to check tomorrow morning.
Thanks so much Jonathan.
On Wed, May 20, 2015 at 2:56 PM, Jonathan Drummey <
Ha! Never mind! I figured it out! (Didn't have Gantt Size on the Size pill, it was on the Label pill - DUH.) Thanks so much Jonathan!
I took a different approach where my single STDDEV matches your two levels of STDDEV.
I placed all the responses on the viz and then used the undocumented RANDOM() function to jitter the responses. I used a reference line to show average response and a reference band to show STDDEV.
I tend to like to see the marks to see how things cluster and if there are lots of outliers or not. One could also make the marks very subtle and replace the reference line with a big circle using a dual axis chart.
Any thoughts on why my -1, 1 STDDEV is equal to your -2, 2?
Jittering is a great way to deal with the limited number of discrete values, I like it! As I'm looking at yours, nother way could be to use a vertical bar chart on one axis and the ref line on the other.
Thanks for checking out what I'd posted, I'd built the function with one idea in mind (the parameter being the range of SD) and described it with another (the parameter being the +/- SD). I've fixed the workbook and edited my post to reflect that.
Jonathan Drummey I wanted to show you what I ended up with (attached). This allows me to eyeball the extent to which the average ratings on these items are (or aren't) half a standard deviation apart from other ratings. The bar is set to 1 standard deviation, giving me .5 SD to either side of the mean. Very helpful in my situation, where I need to see the difference relative not to one other mean, but 2 or 3 others. What do you think? I'm grateful for your help and Steve Wexler's in getting me to this!
Sample Survey Data.twbx 54.8 KB