Thank you for posting this very helpful discussion.
I have tried to re-create the Sankey Chart following your second viz, but instead of doing 4 levels/nodes, I took 3.
I am not able to get the curves to be displayed.
Could you please take a look and guide me as to where I went wrong?
Thank you in advance.
Sankey Trail.twbx 1.1 MB
Olivier CATHERIN, thank you for sharing your fantastic Sankey diagram and showing us how to build it step by step. I followed your tutorial step by step, but I was not able to get a Sankey plot as beautiful as yours. I double-checked the advanced table calculation settings and the sorting settings, but I still was not able to fix it. Could you please take a look? Thank you.
Sankey.twbx 170.7 KB
I am really close to recreating this viz; however, when I add the second Sankey, it doesn't align with the step 2 chart. Below is a screenshot with the misalignment circled in red.
You can see that the colors should line up, but they don't.
Any ideas on where I messed up?
--- UPDATE ---
I'm not exactly sure if this was my only issue, but I studied some existing 3-step examples and noticed there are two sets of step 2 max and min calculations, each with a different agg step selected for the calculation. I tried fixing that and it still didn't work, but then I started the process over entirely and now it is all good. So I am assuming I needed that second max/min calc, but I also had something messed up somewhere else. Here is my updated Sankey:
I've figured out how to get dynamic steps working without creating a sheet for each step.
I wanted to show which bee hives were being used to create more hives, and which hives were growing faster than others over time, and a Sankey diagram seemed like a good solution.
The post by Olivier about how to make a polygon Sankey diagram got me most of the way there, but I could only show a single step (date) at a time, or at best I could group a set of steps (dates) together and show them as a single step. The workaround of creating a different sheet for each step wouldn't work for me, because I'd have to keep adding another sheet for each new data point, and during the spring there's a new data point every week. I needed a way to build a multistep Sankey diagram.
It took me almost a week of playing around to figure out what was needed to make that happen.
The first thing I had to do was get the steps (dates, in my case) into a sequential order that could be used to spread them out along the x-axis (Columns). I needed a set of T's for each step, with the sets adjacent to each other.
I did this by creating a new table with an order/rank for each step (date).
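If you'd rather build that order table outside Tableau, here is a minimal Python sketch of the idea. The function name, the sample dates, and the choice of a 1-based rank are my own assumptions for illustration, not part of the original workbook.

```python
from datetime import date

def build_step_order(step_dates):
    """Map each distinct step date to a 1-based sequential order."""
    distinct_sorted = sorted(set(step_dates))
    return {d: rank for rank, d in enumerate(distinct_sorted, start=1)}

# Hypothetical step dates, for illustration only.
moves = [date(2019, 4, 14), date(2019, 4, 7), date(2019, 4, 14), date(2019, 4, 21)]

# One row per distinct date: the date and its order.
step_order = build_step_order(moves)
for step_date, order in step_order.items():
    print(step_date.isoformat(), order)
```

The resulting date-to-order pairs are what would be joined back to the data so every row carries its step's order.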
Now I needed to use that number to spread out something over the x axis.
I created a new calculated field with the calculation 12 * [Order] + [T] - 6
I then replaced the existing pill T in Columns with "T with step order".
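To see why that formula puts the steps side by side, here is a small Python sketch of the same arithmetic. It assumes T runs from -6 to 6 within each step and that Order starts at 1; both are assumptions on my part, so adjust the constants if your T range differs.

```python
def t_with_step_order(order, t):
    """Mirror of the Tableau calculation 12 * [Order] + [T] - 6."""
    return 12 * order + t - 6

# Assumption: T runs from -6 to 6 within every step.
T_MIN, T_MAX = -6, 6

def step_range(order):
    """Return the (left, right) x-axis extent occupied by one step."""
    return (t_with_step_order(order, T_MIN), t_with_step_order(order, T_MAX))

ranges = [step_range(order) for order in (1, 2, 3)]
print(ranges)  # [(0, 12), (12, 24), (24, 36)]
```

Under those assumptions each step takes a 12-unit slot, and the right edge of one step is the left edge of the next, so the curves sit adjacent to each other with no gap or overlap.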
I also replaced the Detail pill that was Min or Max with Step Name. You can add the step name (date moved) to Color if you want.
I then edited the table calculations on Polygon Curves to include the step name (date moved) in all 5 nested calculations.
Make sure the min and max position 2 calculations have "end name" first and "start name" second. The min and max position 1 calculations should have "start name" first and "end name" second.
OMG this is lovely but OMG why can I not get it to work!
Can anyone point me in the right direction here? It seems like my Marks aren't quite right.
Here's an image to go with the attached workbook:
SankeyPolygonTesting123.twbx 18.5 KB
Olivier CATHERIN, thank you for the fantastic viz. When attempting to create a second Sankey, I am running into the following problem, and I would greatly appreciate any insight you or the community can provide. My second polygon curve "requires a field that is missing":
I can get it to display the proper polygons when I add a sum sort to the table calc:
This, however, inexplicably brings Agg Step A onto the Rows shelf (apparently this was the required missing field?):
I have checked, rechecked, and rechecked my calculated fields, table calcs, etc. I have sat down with your twbx open next to mine and compared every single element.
What am I missing?
My guess is that the default table calculation you defined in the curve formula uses Agg Step A in its computation. You would be better off resetting it to the default and configuring the table calculation directly from the pill in the view. It will save you time, since a single calculation can then be reused in each view.
Hope this helps!
I've built the Sankey diagram using the polygon methodology, but I'm having significant performance issues. My dataset is ~400,000 rows before densification, and the dashboard is too slow to be used after joining with the additional dataset.
I've tried to aggregate the data before bringing it into Tableau, but this methodology doesn't seem to work with pre-aggregated data. I've also tried another workaround (https://www.dataplusscience.com/SankeyinTableau82.html), but using this type of dataset seems to limit how dynamic the dashboard can be.
Are there any workarounds or alternate methodologies to improve the performance of a sankey diagram with a dataset of this size?
First, I would suggest using extracts. In my own experience, extracts with heavy calculations (distinct counts...) perform well enough on final datasets of more than 300 million rows.
Also make sure that the database is powerful enough and is properly indexed.
Otherwise, the methodology should work properly with prepared data. Think of the data as a set of dimension associations (flows) that you would like to represent in a Sankey. Whatever the size of the initial dataset, the total number of possible associations should not exceed a few thousand, even with several dimensions. You could use a custom SQL query to prepare the initial data, with a "group by" on all the required dimensions, the proper aggregation for the value you need, and the new field for the densification.
For example, with Superstore (10,000 rows), let's say you want to see <segments>, <category>, <sub-category> and <ship-mode> filtered by <years>: the pre-aggregated data will return 738 rows instead and work fine with the Sankey.
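As a rough illustration of that "group by" step, here is a minimal Python sketch that collapses detail rows into one row per combination of dimensions. The function name, field names, and sample rows are hypothetical, and it leaves out the extra densification field; in practice you would do the same thing in custom SQL or in your data prep tool.

```python
from collections import defaultdict

def pre_aggregate(rows, dimensions, measure):
    """Sum the measure for each distinct combination of dimension values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    # Rebuild one output row per flow (dimension combination).
    return [dict(zip(dimensions, key), **{measure: total})
            for key, total in totals.items()]

# Hypothetical detail rows, for illustration only.
rows = [
    {"segment": "Consumer", "category": "Furniture", "ship_mode": "First Class", "sales": 100.0},
    {"segment": "Consumer", "category": "Furniture", "ship_mode": "First Class", "sales": 50.0},
    {"segment": "Corporate", "category": "Technology", "ship_mode": "Standard Class", "sales": 75.0},
]

flows = pre_aggregate(rows, ["segment", "category", "ship_mode"], "sales")
for flow in flows:
    print(flow)
```

The number of output rows depends only on how many distinct dimension combinations exist, not on the size of the original dataset, which is why the Sankey stays responsive.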
Finally, make sure you follow Sankey best practices:
- Not too many steps: limit the diagram to 3 to 5 steps.
- Not too many flows: if a single required dimension has more than about ten aliases, the Sankey will be messy, so consider using another visualization.
I hope this helps.
Thank you for your posts regarding this topic, they have been very helpful.
I have managed to replicate the originals, but with my own data the chart splits incorrectly at Level 2, Position 2. There should be a maximum of 3 nodes (at Position 2), not 4.
I have attempted to correct this by sorting Position 2 by the Level 2 field, which works, but then two streams become tangled.
My Index seems to be correct.
Any suggestions would be appreciated.
Sorry, I am unable to upload the workbook as I am using email addresses to join the three data sources.
Many thanks in advance.
Thanks for sharing this Sankey chart build approach.
I have one more question. Could you please look into the following?
I want to track the flow of customers enrolling through different departments and into different promotions. For example, Customer 'A' tries to qualify for Promotion 1 through Department 1 but isn't eligible, so he comes back through Department 2 and enrolls in Promotion 2.
Department and Promotion would be my dimensions.
My flow should be as below:
Customer A --> Department 1 --> Promotion 1 --> Department 2 --> Promotion 2
Please let me know if this is feasible.
(I fixed my original problem above - Hooray!)
Anyhoo... has anyone tried to do the polygon Sankey using a Tableau data source with an Excel model blended in on 'Link'?
I have cross-checked everything but have a few funnies.
The chart is just a white chart as:
And then the CurvePolygon seems to calculate differently:
Hi Olivier, it's a wonderful chart that you have created, with well-documented steps. I tried to replicate it with my dataset (attached) but was not able to. Could you please help me create it, or tell me the steps to create it for the data below?
Dummy.xlsx 13.0 KB