As I was working on building a Sankey Diagram with Tableau, I found the excellent articles of Jeffrey Shaffer from Data + Science developing a methodology to build a Sankey in Tableau. His two articles (first & second) were my starting points.
At this stage, I wanted to add some interactivity on my diagram and minimize the data preparation... I am no Excel expert. I just wanted to be able to build my Viz using the model provided by Jeff and a dataset with Step1, Step 2 and Step 3 & 1 indicator (size) and dates to perform filtering... and I didn't want to use SQL or macros...
I also have an ID field but I do not want to use it...
I finally succeeded using Jeff's method but faced an issue regarding the performance of the Viz: showing several thousand calculated lines + colors in the same Viz takes some resources... So, I wanted to simplify my Viz. Considering that I could easily get all the lines describing each flow on the Diagram, I should probably be able to define the Upper Line and the Lower Line of each flow. And if I could do that, I could probably link these lines in order to build... a polygon. I would then have far less lines to calculate and show on my screen. So here is the challenge...
This Viz is a dashboard made of 3 bar graphs to populate nodes and 2 sankey to show the flows.
I will just explain how I built the first part of the Sankey and let you deduce the following one.
Step 0: Read articles from Jeff
Step 1: Preparing the model in Excel
Please first download the Excel model file from Data + Science
In the Excel sheet with the model data, we will double the whole rows and add 2 columns:
- A path column: order in which we will link the points
- A MIN/MAX column: will Tag the curve whether it is the min line or the max line.
It is important to note that the order of the path should be ascending when Min and descending when Max
Once it is done, just add the "Link" column in the dataset.
Step 2: connect to the data (with Tableau 8.2)
I personally work on a Mac... connect to the excel file containing the two sheets (data and model). Just let Tableau automatically build the inner join... and go Tableau!
Step 3: Build dynamic aggregated steps
We will first build the dynamic aggregated steps using parameters and sets. Basically, we want to show Top N Step 1 sorted by the SUM of Size:
- Create a "Top N Step 1" parameter: it should be an Integer with a min and a max
- Create a "Top Step 1" set: right-click on "Step 1" pill / create a set and sort it using Top / By Field / Top / Top N Step 1 (parameter) / Size (indicator) / Sum
- Create an "Agg Step 1" dimension: this should be
IF [Top Step 1]* THEN [Step 1] ELSE 'x Other' END
I use the x in order to easily sort the flows alphabetically.
Repeat the operation with Step 2 in order to build a new dimension called "Agg Step 2".
These operations will enable us to choose the number of "nodes" to display in the bar charts and also to generate the positions of the flows in the Sankey.
Step 4: Build the "Sigmoid" calculated field
Following Jeff's instructions: 1/(1+EXP(1)^-[T])
Step 5: Build the "Flow Size" calculated field
Here it gets interesting: we want to build flows that will start in a certain order and end divided into several lines and in another order. We will use a bit of simple table calculation...
The "Flow Size" definition will be: SUM([Size])/TOTAL(SUM([Size]))
We then obtain the relative size of the flow compared to the total sum of the indicator (in percentage). This will enable us to build a stable display of the Viz.
Click on "Default Table Calculation" on the top right of the Calculated Field window as shown below:
In "Compute using" choose "Advanced". Here you can choose how you want to address the partitioning. Choose "Agg Step 1" and "Agg Step 2" in this order. Choose also to sort the results by Fields using the Sum of Size as follows:
We will use this technique for almost all calculated fields.
Step 6: Build Position calculated fields
Here we want to define the starting points and the ending points of our Max Curves and our Min Curves. Let's start with Max Position 1.
Using Table Calculation, we define it using this formula: RUNNING_SUM([Flow Size]) computed along Agg Step 1, Agg Step 2. Again this will be defined in the advanced option of the Table Calculation.
It defines the Max starting point of each Curve
The Min Position 1 will easily be defined as the Max Position 1 minus the Size of the curve i.e.: RUNNING_SUM([Flow Size])-[Flow Size] computed along Agg Step 1, Agg Step 2.
Max Position 2 and Min Position 2 are defined using the same formulas. The only difference is that we will compute it along Agg Step 2 then Agg Step 1.
The advanced options of the Table Calculation enable to define the order in which the curves will start and in which they will end.
Step 7: Build the curves calculated fields
We will build a Max curve and a Min Curve based on Jeff's work.
Curve Max: [Max Position 1]+(([Max Position 2]-[Max Position 1])*ATTR([Sigmoid]))
Curve Min: [Min Position 1]+(([Min Position 2]-[Min Position 1])*ATTR([Sigmoid]))
We can now build our polygon curves "Curve Polygon":
CASE ATTR([Min or Max]) // remember the data we added to the model !
WHEN 'Min' THEN [Curve Min]
WHEN 'Max' THEN [Curve Max]
When building the Viz, the "path" will allow us to join the points to show the polygons.
Step 8: Build the Sankey
Choose the "polygon" graph type and place the following pills in the marks:
- Agg Step 1
- Agg Step 2
- Min or Max
- Path (as a path)
as shown below:
Then place the "T" pill in columns and show it as a dimension (right click / dimension) and finally place the "Curve Polygon" in rows.
A bit of cosmetics:
- change transparency to 50%
- clean the "T" axis to a fixed range of -6 to 6 and hide the axis
- clean the "Curve Polygon" axis to a fixed range of 0 to 1, reverse and hide the axis...
Step 9: Build the bar charts
Create another calculated field that we will call "Step Size" and defined by:
On this one, we will not specify table calculations.
On a new sheet, place "Step Size" into rows and "Agg Step 1" in the marks (colors or details or whatever...)
I personally customize using borders:
Hide axis and repeat the operation for Step 2.
Now you can build the Sankey in a Dashboard!
And after a bit of cosmetics and customizing, here is the final result:
It is fully interactive and responsive!
I also built a long version that can be found on this link
Sorry that it is in French as well as the Tableau file and data... but this is my native language!
This method has 3 advantages:
1 - Little data preparation
2 - No SQL or macros required
3 - Very responsive since just showing polygons
Feel free to improve the method or email me for any question!