1 2 3 Previous Next 113 Replies Latest reply on Feb 13, 2019 7:29 PM by Selvi velraj

Sankey diagram made of dynamically generated polygons

As I was working on building a Sankey Diagram with Tableau, I found the excellent articles of Jeffrey Shaffer from Data + Science developing a methodology to build a Sankey in Tableau. His two articles (first & second) were my starting points.

At this stage, I wanted to add some interactivity on my diagram and minimize the data preparation... I am no Excel expert. I just wanted to be able to build my Viz using the model provided by Jeff and a dataset with Step1, Step 2 and Step 3 & 1 indicator (size) and dates to perform filtering... and I didn't want to use SQL or macros...

I also have an ID field but I do not want to use it...

I finally succeeded using Jeff's method but faced an issue regarding the performance of the Viz: showing several thousand calculated lines + colors in the same Viz takes some resources... So, I wanted to simplify my Viz. Considering that I could easily get all the lines describing each flow on the Diagram, I should probably be able to define the Upper Line and the Lower Line of each flow. And if I could do that, I could probably link these lines in order to build... a polygon. I would then have far less lines to calculate and show on my screen. So here is the challenge...

This Viz is a dashboard made of 3 bar graphs to populate nodes and 2 sankey to show the flows.

I will just explain how I built the first part of the Sankey and let you deduce the following one.

Step 0: Read articles from Jeff

http://www.dataplusscience.com/SankeyinTableau82.html

Step 1: Preparing the model in Excel

In the Excel sheet with the model data, we will double the whole rows and add 2 columns:

- A path column: order in which we will link the points

- A MIN/MAX column: will Tag the curve whether it is the min line or the max line.

It is important to note that the order of the path should be ascending when Min and descending when Max

Once it is done, just add the "Link" column in the dataset.

Step 2: connect to the data (with Tableau 8.2)

I personally work on a Mac... connect to the excel file containing the two sheets (data and model). Just let Tableau automatically build the inner join... and go Tableau!

Step 3: Build dynamic aggregated steps

We will first build the dynamic aggregated steps using parameters and sets. Basically, we want to show Top N Step 1 sorted by the SUM of Size:

- Create a "Top N Step 1" parameter: it should be an Integer with a min and a max

- Create a "Top Step 1" set: right-click on "Step 1" pill / create a set and sort it using Top / By Field / Top / Top N Step 1 (parameter) / Size (indicator) / Sum

- Create an "Agg Step 1" dimension: this should be

IF [Top Step 1]* THEN [Step 1] ELSE 'x Other' END

I use the x in order to easily sort the flows alphabetically.

*the set

Repeat the operation with Step 2 in order to build a new dimension called "Agg Step 2".

These operations will enable us to choose the number of "nodes" to display in the bar charts and also to generate the positions of the flows in the Sankey.

Step 4: Build the "Sigmoid" calculated field

Following Jeff's instructions: 1/(1+EXP(1)^-[T])

Step 5: Build the "Flow Size" calculated field

Here it gets interesting: we want to build flows that will start in a certain order and end divided into several lines and in another order. We will use a bit of simple table calculation...

The "Flow Size" definition will be: SUM([Size])/TOTAL(SUM([Size]))

We then obtain the relative size of the flow compared to the total sum of the indicator (in percentage). This will enable us to build a stable display of the Viz.

Click on "Default Table Calculation" on the top right of the Calculated Field window as shown below:

In "Compute using" choose "Advanced". Here you can choose how you want to address the partitioning. Choose "Agg Step 1" and "Agg Step 2" in this order. Choose also to sort the results by Fields using the Sum of Size as follows:

We will use this technique for almost all calculated fields.

Step 6: Build Position calculated fields

Here we want to define the starting points and the ending points of our Max Curves and our Min Curves. Let's start with Max Position 1.

Using Table Calculation, we define it using this formula: RUNNING_SUM([Flow Size]) computed along Agg Step 1, Agg Step 2. Again this will be defined in the advanced option of the Table Calculation.

It defines the Max starting point of each Curve

The Min Position 1 will easily be defined as the Max Position 1 minus the Size of the curve i.e.: RUNNING_SUM([Flow Size])-[Flow Size] computed along Agg Step 1, Agg Step 2.

Max Position 2 and Min Position 2 are defined using the same formulas. The only difference is that we will compute it along Agg Step 2 then Agg Step 1.

The advanced options of the Table Calculation enable to define the order in which the curves will start and in which they will end.

Step 7: Build the curves calculated fields

We will build a Max curve and a Min Curve based on Jeff's work.

Curve Max: [Max Position 1]+(([Max Position 2]-[Max Position 1])*ATTR([Sigmoid]))

Curve Min: [Min Position 1]+(([Min Position 2]-[Min Position 1])*ATTR([Sigmoid]))

We can now build our polygon curves "Curve Polygon":

CASE ATTR([Min or Max])  // remember the data we added to the model !

WHEN 'Min' THEN [Curve Min]

WHEN 'Max' THEN [Curve Max]

END

When building the Viz, the "path" will allow us to join the points to show the polygons.

Step 8: Build the Sankey

Choose the "polygon" graph type and place the following pills in the marks:

- Agg Step 1

- Agg Step 2

- Min or Max

- Path (as a path)

as shown below:

Then place the "T" pill in columns and show it as a dimension (right click / dimension) and finally place the "Curve Polygon" in rows.

A bit of cosmetics:

- change transparency to 50%

- clean the "T" axis to a fixed range of -6 to 6 and hide the axis

- clean the "Curve Polygon" axis to a fixed range of 0 to 1, reverse and hide the axis...

And voilà!

Step 9: Build the bar charts

Create another calculated field that we will call "Step Size" and defined by:

SUM([Size])/TOTAL(SUM([Size]))

On this one, we will not specify table calculations.

On a new sheet, place "Step Size" into rows and "Agg Step 1" in the marks (colors or details or whatever...)

I personally customize using borders:

Hide axis and repeat the operation for Step 2.

Now you can build the Sankey in a Dashboard!

And after a bit of cosmetics and customizing, here is the final result:

It is fully interactive and responsive!

I also built a long version that can be found on this link

Sorry that it is in French as well as the Tableau file and data... but this is my native language!

Conclusion:

1 - Little data preparation

2 - No SQL or macros required

3 - Very responsive since just showing polygons

Feel free to improve the method or email me for any question!

Olivier Catherin

• 1. Re: Sankey diagram made of dynamically generated polygons

Very nice Olivier! Love the step-by-step approach

• 2. Re: Sankey diagram made of dynamically generated polygons

Looking forward to trying this!

Kirstin

• 3. Re: Sankey diagram made of dynamically generated polygons

Fantastic Olivier, this looks incredible!
I was just looking at how to create a flow-model diagram to highlight the flow of
resources and this will be perfect for enabling me to do so!

Keep up the great work!

• 4. Re: Sankey diagram made of dynamically generated polygons

Hi Chris,

I just published another post regarding decision trees. The method could help to go further in such kind of viz !

Decision trees, flow diagrams, sankeys in Tableau... here is a solution !!!

Enjoy !

Cheers

Olivier

5 of 5 people found this helpful
• 5. Re: Sankey diagram made of dynamically generated polygons

Olivier CATHERIN your diagrams are beautiful! Thanks for sharing with us how to create them.

I must admit though, that I have not been able to re-create them as I had hoped to.

I tried to visualize a group senders sending payments to a group of receivers; pretty much an ideal use case for Sankey diagrams, I thought. Only even a bit simpler than your example.
Unfortunately, instead of the beautiful sigmoid curves you arrived at, I got this:

I suspect it is a matter of how both the senders (step 1) and the receivers (step 2) are grouped and ordered.
I can't figure out where I am going wrong, though. (I think I did follow the instructions correctly.)

For any help on where I went wrong, I'd be immensely grateful...

(PS: Eventually I hope to be able to turn an itemized list of payments into a Sankey diagram (first tab of attached excel doc), not only a list of totals for each sender-receiver pair (second tab of excel doc) like I used this time. But first things first.)

@

3 of 3 people found this helpful
• 6. Re: Sankey diagram made of dynamically generated polygons

Hi Thorsten,

The issue in your Sankey is because of the sorting on the Sum of Payment. Uncheck the sorting in the table calc parameters and this will work properly. (The TOP parameter already defines the filter for the view.)

See attached corrected workbook.

Cheers

2 of 2 people found this helpful
• 7. Re: Sankey diagram made of dynamically generated polygons

Thank you SO much, Olivier CATHERIN!

• 8. Re: Sankey diagram made of dynamically generated polygons

Hi all,

I brought a bit more of complexity to achieve this new viz based on superstore. Tell me what you think !

5 of 5 people found this helpful
• 9. Re: Sankey diagram made of dynamically generated polygons

Thank you very much for the post! How do I get to highlight the entire path on hovering the mouse?

• 10. Re: Sankey diagram made of dynamically generated polygons

Creating a highlight hover action produced this behavior. See my attached example.

5 of 5 people found this helpful
• 11. Re: Sankey diagram made of dynamically generated polygons

Hi Oliver, really good stuff!  I've been working on applying this viz technique on quite a few datasources and to learn I've tried following both your steps and Jeff's.

I am also having an issue with the sort it seems.  After downloading Thorsten's original twbx above and removing the sort, I'm still unable to reproduce the Sankey viz as seen in your corrected repost.  Did you take any additional steps to fix this?  Any help or insight is greatly appreciated.

Thanks!

• 12. Re: Sankey diagram made of dynamically generated polygons

Thanks, very helpful. Following it up, highlighting by "ID" (in the above example) is possible? That is, can the Step1-Step2-Step3 path for any ID be highlighted?

• 13. Re: Sankey diagram made of dynamically generated polygons

Hi John,

I just have seen your post now. Do you still have issues with the viz ?

Cheers

O.

• 14. Re: Sankey diagram made of dynamically generated polygons

Hi John,

I ran into the same problem when I worked on Thorsten's workbook. I reset all of the sort functions in the measures but the viz didn't look like the one in the corrected workbook. I finally realized that I had to reset the sort functions in the Curve Polygon pill. Click the drop down menu in the pill, select Edit Table Calculation, and update the sort order for the Calculated Fields until the viz displays properly.

Scott

2 of 2 people found this helpful
1 2 3 Previous Next