6 Replies Latest reply on Feb 23, 2017 7:35 AM by Steve Morton

# Sankey Diagrams

Thanks to our meeting I went down the rabbit hole that is Sankey diagrams or graphs. I have to say it is a challenge to produce these graphs in any format let alone Tableau. I missed the name of the woman that showed us her version. But I can say I have both regretted it and appreciated it over the past day. There are some good discussions both in the Tableau user community and in some other excellent blogs.

Here is my first result. I'm still playing with details like labeling and colors but it is an actual result. And it is pretty cool.

The key learning over the last 2 days is that data prep is a primary requirement. I jumped in and built a Sankey diagram with an intermediate point. So each path has a starting location, an intermediate location, and a final location, positions 1, 2 and 3. Data needs to be generated to produce the curve of the path between these points, and to produce the ribbon effect.

First I needed to lay out the vertical positions I want. This was done in Excel using a running percentage total after sorting the data set on each category.

Producing the curve between the points is pretty straight forward. Tableau's primary limitation though is that the width of a line cannot be explicitly defined based on data content. If I want a line path that is 42% of the width of the plot area I cannot make it that explicit. The resulting view can be useful but is not the true Sankey view.

To get around the line width limitations, the standard response is to create a line for each percentage, essentially 100 different lines that follow their own path. My sample data set had 40 distinct paths through 2 transitions. The basic data set is pretty straight forward, a path defined by three categories, and the could of units passing through that path.

The first data prep step is to create the curve using the Sigmoid function. What this means in data prep terms is that each record has to be repeated with a newly created distinct value of T. My example uses T values from -6 to 6 incrementing every .25 for a row multiplier factor of 49. My first attempt was to cut/paste 49 times with a new T value. That got very boring fast so I decided to create a separate data set with the 49 T values and right join in tableau to create those records. On a new tab I created the 49 row data set for T values and joined to it.

Next I needed to spread the rows vertically to produce the width in the ribbons. My data at this point (after join) is one data row repeated 49 times to create a path. I now need to replicate the set of records defining a path to be as many paths as needed to fill the width of the ribbon. I defined the width in the data set using excel formulas (PctStart and PctEnd below). This is the width of each ribbon if the Y axis runs from 0 to 100.

Then because I hate cut/paste I used the same right join method to create the added rows. Tableau doesn't allow joins like "ON pct1 WHERE pct1 BETWEEN lowlimit and highlimit" and I couldn't figure out a join that actually worked. So I ended up adding all 100 percentages (100 rows) as new rows. My 40 row data set has now become 196,000 rows.

The join values are a constant of 1.

With the data prep complete I had to build out the calculated fields. First I built the Sigmoid function modified T value.

1/(1+EXP(1)^-[T])

Then I built the positions for each path. What I needed was the planned position from the original 40 paths. I can't use the psoitions explicitly, bu the offset from the start. I used the percentage value obtained by the linking to the percent dataset plus offsets for positions 2 and 3. In this way each produced path is identical in curve and shape, but simply offset from the original.

The curve calculation Is a bit trickier but is straight forward once you understand the direction changes.

IF [Pos 2]>[Pos 1] then

MIN([Pos 1],[Pos 2])+([Sigmoid Function]*ABS([Pos 2]-[Pos 1]))

ELSE MAX([Pos 1],[Pos 2])-([Sigmoid Function]*ABS([Pos 2]-[Pos 1]))

END

This is repeated for position 2 to 3.

Finally the piece I tripped on was to be sure to display the dimension T as well as the sum(T). Here is what one of the transitions looks like in design.

When I get a chance I will try to mask the data and package this for public consumption. For now I hope this is enough to get people started trying this on their own. It is a good viz, but tricky to set up and requires a lot of data prep. I have uses for it, but some real cost/benefit questions.

• ###### 1. Re: Sankey Diagrams

I forgot to mention the filter that brings the data back in line with the ribbon width. I created a Boolean calculated field to eliminate the rows where the display percentage was out of range.

[Display Pct]-[PctStart]>0 and [PctEnd]-[Display Pct]>=0

This brough the 196,000 rows down to 4851 actually used in the result. Also many of my ribbon ranges were too small to produce any result and were lost in the join logic.

SAS has some great data prep functionality for this. However my goal here was minimal data prep and maximum Tableau use. I still have only 40 lines in my excel, and a few calculated columns that can be produced with PARTITION BY statements pretty easily in a query. This method seems to be a better utility engine where different variables can be selected in adhoc queries, dropped into the same Excel and sent to the view.

• ###### 2. Re: Sankey Diagrams

Chris:

Good one. Just wondering have you checked out Jeffrey Shaffer blog on Sankey Data + Science

..kk

• ###### 3. Re: Sankey Diagrams

I did. It is the one I linked to in my opening. It is by far the best start. But I still muddied my way through the data prep. He also does all his data prep outside Tableau to simplify the Tableau portion. I opted to try to do as much in calculated fields within Tableau as possible. Performance may be slower but the use as a utility is greater.

• ###### 4. Re: Sankey Diagrams

So did I when building this Polygone sankey

The only data I use in this Sankey made of Dynamic Polygones is Step 1, Step 2 and Step 3 dimension + 1 indicator to add a bit of complexity + 1 dimension for filtering (Date). I blend it with Jeff's model et voila !

https://public.tableausoftware.com/shared/QD98S3PQK?:display_count=no

OC

• ###### 5. Re: Sankey Diagrams

Thats awesome and thank you. This chart will have added benefit in Marketing also.

..kk

• ###### 6. Re: Sankey Diagrams

Hi,

If this visual was added to a dashboard that contained other Tableau visuals and filters, would this Sankey be able to dynamically be re-drawn with only the filtered data?

Thank you