11 Replies Latest reply on Dec 19, 2014 1:07 PM by Keith Helfrich

# Why the Long Sankey ?

Dear Tableau Community,

Similar to the way we found it possible (with some creative savvy) to find the INTERSECTION of multi-select quick filter values, I'm back again with a new brain teaser that revolves around the competitive sport of Bocce Ball.  I'm hoping you can help!

Admittedly, the title of this post should be How the Long Sankey ?  But that name wasn't nearly as catchy as asking "why"

Please, see the attatched .twbx where I've story-boarded the challenge.  In short, I'm searching for help with how to build Jeffrey Shaffer's Sankey diagrams from a LONG format data source.

Any & all ideas are welcome.

Extra points if you can fill in the punchline to the joke that starts: A horse walks into a bar & the bartender asks, "why the long Sankey?"

Thanks!

• ###### 1. Re: Why the Long Sankey ?

On "The Challenge" sheet, if you simply remove "Series" and "Rank" from the Row, don't you get what you are looking for?

• ###### 2. Re: Why the Long Sankey ?

Hi Jeffrey,

Yes, indeed I do, sometimes accident is the mother of great design!

I'd only just put that "The Challenge" sheet together when building the explanation using fake data.  So it wasn't an angle I had explored, really.  It's nice how the measure values / measure names accomplish what I'm searching for.

Let me try to build the Sankey graphs from there & see how it goes.

Thanks!

• ###### 3. Re: Why the Long Sankey ?

So I've accomplished what I was looking for.

In the updated .twbx, attached, I've done my best to walk thru the steps in a "Solution" story-board.  You'll find on the last page the ability to choose at runtime which two "series" should be use for the comparison via parameters.

An Observation:

---

This is a beautiful example of Joe's emphasis on the need for understand 'The Four Pill Types'. I had been treating [Rank] as a dimension because, in the traditional definition, it is one! But Tableau doesn't care about the traditional definition of Dimension & Measure. In tableau: any field can be either a measure or a dimension (or both!). And you need to use the definition that suits your fancy!

---

A Question:

---

My "Solution" story might be a bit difficult to follow, but at the bottom of the Empty the ROWS & COLUMNS story point, I've made the curious observation:

- after all this effort to "widen" the data, Tableau is still storing it in a TALL format

- so, shouldn't there be an easier way to get there, without going thru measure names & back ?

---

Something to Keep you Up at Night:

---

... so a horse walks into a bar & the bartender asks ...

---

THANKS!

• ###### 4. Re: Why the Long Sankey ?

Here is an exemple of a polygone Sankey Digram based on Jeff's work : https://public.tableausoftware.com/shared/NQM7HZX4S?:display_count=no

Using polygones makes it much more efficient and responsive.

• ###### 5. Re: Why the Long Sankey ?

Thank you, Oliver.  This is quite attractive and also very responsive.

Are you perhaps able to point to a blog post where the polygons approach is described in detail ?

Thanks!

• ###### 6. Re: Why the Long Sankey ?

Hi Keith,

I will try to write a full post regarding the method. While waiting, here is a briefing of the approach.

My starting point is the work of Jeff that you can find on the following link : Data + Science

It describes the way to build a spaghetti graph looking like a sankey. It is based on a dataset containing all the required data in order to rank the lines of the graph.

My first idea was to build the same dynamically not using any additional hard coded data such as ID, ranking or anything else. Just Step 1 and Step 2 data. The challenge is to define the position of each line based on a dynamic ranking (RANK_UNIQUE) and Table calculation.

From this point, if one can draw all the lines of the flow, I thought it may be possible to define only the TOP and the BOTTOM line of each flow. And if one can join the points altogether, then this would allow to draw a polygone.

So I took the model from Jeff and added the following data in excel :

• - Way : MIN
• - Path : from 0 (T=-6) to 48 (T=6)

And doubled the lines with

• - Way : MAX
• - Path : from 97 (T=-6) to 49 (T=6)

In Tableau, I can now generate the MIN and the MAX lines for Position 1 and 2

• Max Position 1 : RUNNING_SUM(Size) Calculated with Step 1, Step 2 (Advanced table calculation option in the formula window)
• Min Position 1 : RUNNING_SUM(Size) - (Size) Calculated with Step 2, Step 1 (Advanced table calculation option in the formula window)

Size being whatever I use to define the size of the flow

You can repeat the operation for Position 2 (Min and Max)

Using Jeff's method, you can draw a Min Curve (with Min Position 1 and Min Position 2) and a Max Curve (with Max Position 1 and Max Position 2).

The polygone will be drawn the following way :

CASE ATTR(way)

WHEN 'Min' THEN Curve Min

WHEN 'Max' THEN Curve Max

END

When building the graph, you just have to turn the graph type into polygones and place the path pill on 'Path'.

Hope this helps !

• ###### 7. Re: Why the Long Sankey ?

That does help, Oliver.  Thanks!

I had seen each of those things when I took apart the workbook, but it's a lot easier when somebody explains it to you. Otherwise, it's more work to deduce how the pieces fit together.  Avoiding the Google search for words in French that mean 'path' in English also helps!

This is really very nice!  I love how beautiful the lines are, how fast it renders, and how it's no longer necessary to build in the ID values.

Please let us know when a full post is written, I'm certain you'll make the Best of Web from Tableau =)

Thanks!

• ###### 8. Re: Why the Long Sankey ?

Thx a lot. I'm working on it... Will post the link tomorrow !

Cheers

• ###### 9. Re: Why the Long Sankey ?

Here it is : a full description of the approach...

Feel free to give your feedbacks and improvement ideas !

Cheers

OC

• ###### 10. Re: Why the Long Sankey ?

Excellent work Olivier!

--Shawn

• ###### 11. Re: Why the Long Sankey ?

Hi Oliver,

I'm just now getting around to taking a look at what you've done.  It is quite nice, but I'm having trouble making the bridge from your work to my own.

Your example blog posts describe how to build a pair of TOP N parameters, and the data format that you're working with is also bit different from mine.

• Your data comes in the wide format | Date | ID | Step 1 | Step 2 | Step 3 | Size |
• My data comes in the tall format | League | Series | Team ID | Team Name | Rank | Score |

So, in other words, I have a single Rank column which gives the rank for each team in every Series of every League.

As seen in the comments above, I could get Jeff's original Sankey model to work correctly on this data by using Measure Names and Measure Values.  But it would be really nice to use your polygon approach for this data, as well.

Is there a way that you could help me to understand the best way to get it done ?

I'm attaching the raw data here, and also the original workbook which explains what I want to do.

Thanks!