On "The Challenge" sheet, if you simply remove "Series" and "Rank" from the Row, don't you get what you are looking for?
Yes, indeed I do, sometimes accident is the mother of great design!
I'd only just put that "The Challenge" sheet together when building the explanation using fake data. So it wasn't an angle I had explored, really. It's nice how the measure values / measure names accomplish what I'm searching for.
Let me try to build the Sankey graphs from there & see how it goes.
So I've accomplished what I was looking for.
In the updated .twbx, attached, I've done my best to walk thru the steps in a "Solution" story-board. You'll find on the last page the ability to choose at runtime which two "series" should be use for the comparison via parameters.
This is a beautiful example of Joe's emphasis on the need for understand 'The Four Pill Types'. I had been treating [Rank] as a dimension because, in the traditional definition, it is one! But Tableau doesn't care about the traditional definition of Dimension & Measure. In tableau: any field can be either a measure or a dimension (or both!). And you need to use the definition that suits your fancy!
My "Solution" story might be a bit difficult to follow, but at the bottom of the Empty the ROWS & COLUMNS story point, I've made the curious observation:
- after all this effort to "widen" the data, Tableau is still storing it in a TALL format
- so, shouldn't there be an easier way to get there, without going thru measure names & back ?
Something to Keep you Up at Night:
... so a horse walks into a bar & the bartender asks ...
TALL_Sankey_Reshaping.twbx 364.5 KB
I will try to write a full post regarding the method. While waiting, here is a briefing of the approach.
My starting point is the work of Jeff that you can find on the following link : Data + Science
It describes the way to build a spaghetti graph looking like a sankey. It is based on a dataset containing all the required data in order to rank the lines of the graph.
My first idea was to build the same dynamically not using any additional hard coded data such as ID, ranking or anything else. Just Step 1 and Step 2 data. The challenge is to define the position of each line based on a dynamic ranking (RANK_UNIQUE) and Table calculation.
From this point, if one can draw all the lines of the flow, I thought it may be possible to define only the TOP and the BOTTOM line of each flow. And if one can join the points altogether, then this would allow to draw a polygone.
So I took the model from Jeff and added the following data in excel :
- - Way : MIN
- - Path : from 0 (T=-6) to 48 (T=6)
And doubled the lines with
- - Way : MAX
- - Path : from 97 (T=-6) to 49 (T=6)
In Tableau, I can now generate the MIN and the MAX lines for Position 1 and 2
- Max Position 1 : RUNNING_SUM(Size) Calculated with Step 1, Step 2 (Advanced table calculation option in the formula window)
- Min Position 1 : RUNNING_SUM(Size) - (Size) Calculated with Step 2, Step 1 (Advanced table calculation option in the formula window)
Size being whatever I use to define the size of the flow
You can repeat the operation for Position 2 (Min and Max)
Using Jeff's method, you can draw a Min Curve (with Min Position 1 and Min Position 2) and a Max Curve (with Max Position 1 and Max Position 2).
The polygone will be drawn the following way :
WHEN 'Min' THEN Curve Min
WHEN 'Max' THEN Curve Max
When building the graph, you just have to turn the graph type into polygones and place the path pill on 'Path'.
Hope this helps !
That does help, Oliver. Thanks!
I had seen each of those things when I took apart the workbook, but it's a lot easier when somebody explains it to you. Otherwise, it's more work to deduce how the pieces fit together. Avoiding the Google search for words in French that mean 'path' in English also helps!
This is really very nice! I love how beautiful the lines are, how fast it renders, and how it's no longer necessary to build in the ID values.
Please let us know when a full post is written, I'm certain you'll make the Best of Web from Tableau =)
Thx a lot. I'm working on it... Will post the link tomorrow !
Here it is : a full description of the approach...
Feel free to give your feedbacks and improvement ideas !
Excellent work Olivier!
I'm just now getting around to taking a look at what you've done. It is quite nice, but I'm having trouble making the bridge from your work to my own.
Your example blog posts describe how to build a pair of TOP N parameters, and the data format that you're working with is also bit different from mine.
- Your data comes in the wide format | Date | ID | Step 1 | Step 2 | Step 3 | Size |
- My data comes in the tall format | League | Series | Team ID | Team Name | Rank | Score |
So, in other words, I have a single Rank column which gives the rank for each team in every Series of every League.
As seen in the comments above, I could get Jeff's original Sankey model to work correctly on this data by using Measure Names and Measure Values. But it would be really nice to use your polygon approach for this data, as well.
Is there a way that you could help me to understand the best way to get it done ?
I'm attaching the raw data here, and also the original workbook which explains what I want to do.
Red Headed Step Data