Data format for Multi-Layered Sankey Diagrams

Hi all! I was hoping someone could explain how my data needs to be formatted for a multi-level Sankey with polygons. I've explored numerous YouTube videos and articles over the last couple of days, but most cover the viz creation in detail rather than the specifics of the data structure. After reviewing them I think I'm comfortable with at least the basics of the viz creation process for the diagram, but I'm still misunderstanding how my data needs to look in Excel.

This is the tutorial I've most recently been trying to follow: Sankey diagram made of dynamically generated polygons

As I understand it, the Multi Level Sankey Template from Ken Flerlage is intended to be a plug-and-play setup; I'm just missing some understanding of exactly what I need data-wise.

The first 3 observations in his template:

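| Link | ID | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Size |
|---|---|---|---|---|---|---|---|
| link | Thing 001 | D | H | N | Q | W | 72 |
| link | Thing 002 | A | F | I | O | X | 140 |
| link | Thing 003 | A | E | I | O | X | 232 |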

Questions -

ID: In his template each ID is unique. Am I then meant to duplicate the data so that each observation always appears twice (a), or do I duplicate each row a certain number of times depending on something such as the number of steps (b), so that you'd see Thing 001 5 times since there are 5 steps?

a)

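| Link | ID | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Size |
|---|---|---|---|---|---|---|---|
| link | Thing 001 | D | H | N | Q | W | 72 |
| link | Thing 001 | D | H | N | Q | W | 72 |
| link | Thing 002 | A | F | I | O | X | 140 |
| link | Thing 002 | A | F | I | O | X | 140 |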

b)

Steps: I think this one is straightforward. This is just the progression of each customer through what will be our nodes, right?

Size: This one I'm pretty lost on. My first thought was that this is the number of IDs that followed a given route through all nodes. That idea definitely doesn't work with the data in the template though, so I'm left scratching my head. Any instruction on where you get the Size value would be really appreciated!

As for his "model" tab in the template, the only question I have is on the Path header. The tutorial from Olivier that I linked mentions, "It is important to note that the order of the path should be ascending when Min and descending when Max". Ken's template doesn't follow this: once we shift from 'Min' to 'Max', the path order continues increasing from 49 instead of decreasing from 97 to 49 -

a)

b)

Which is correct here?

I'd truly appreciate any clarification on this and I'm happy to put together any example data that could help.

• 1. Re: Data format for Multi-Layered Sankey Diagrams

Hi Chris. Happy to help with this. If I understand your questions...then you should be using approach (a) -- One thing flows through all five phases. So, perhaps this is a sales order that must be 1) Ordered, 2) Filled, 3) Shipped, 4) Invoiced, and 5) Delivered. That one order goes through all five phases and will have just one record.

Steps are just the individual phases. Size is the measure being visualized. So, using the above example, we might just visualize the number of orders. So that row's Size would just be 1 (1 order), but maybe you want to visualize the quantity ordered via a sankey. In that case, Size might be 10 or 20 (or whatever quantity was ordered). I realize now that the term, Size, is terrible. I should think about changing that.
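To make that concrete, here is a minimal Python sketch of building rows in the template's layout. The order records, their step values, and the `to_template_rows` helper are all made up for illustration; only the column names (Link, ID, Step 1 to Step 5, Size) and the constant "link" value come from the template.

```python
# Hypothetical order records; the step values are invented statuses for each phase.
ORDERS = [
    {"id": "Order 001",
     "steps": ["Online", "Warehouse A", "Ground", "Net 30", "On time"],
     "quantity": 10},
    {"id": "Order 002",
     "steps": ["Phone", "Warehouse B", "Air", "Prepaid", "Late"],
     "quantity": 20},
]


def to_template_rows(orders, measure="count"):
    """Return one template-style row per order (no duplication).

    measure="count" gives every row Size 1 (one order);
    measure="quantity" uses the quantity ordered instead.
    """
    rows = []
    for order in orders:
        row = {"Link": "link", "ID": order["id"]}
        for i, status in enumerate(order["steps"], start=1):
            row[f"Step {i}"] = status
        row["Size"] = 1 if measure == "count" else order["quantity"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in to_template_rows(ORDERS, measure="quantity"):
        print(row)
```

Each order still appears exactly once; switching the measure only changes the number in the Size column.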

The Model stuff is all intended to do the data densification needed to draw the curves. I'd have to go back to determine why I varied from Olivier's approach, but understanding the details of how this works isn't really all that critical, I don't think.
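For anyone curious what "data densification" means in general, here is a generic Python sketch. It is not the template's actual Model tab: the 49 points, the t range of -6 to 6, the sigmoid curve, and the `start`/`end` fields are all assumptions chosen for illustration. The idea is simply that each data row gets joined to every row of a small model table, so one record becomes many points along a curve.

```python
import math


def sigmoid(t):
    """Standard logistic function, giving an S-shaped curve from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-t))


def build_model(points=49, t_min=-6.0, t_max=6.0):
    """Return (path, t) pairs with t evenly spaced from t_min to t_max."""
    step = (t_max - t_min) / (points - 1)
    return [(path, t_min + (path - 1) * step) for path in range(1, points + 1)]


def densify(links, model):
    """Cross-join every link with every model row and compute a curve point."""
    out = []
    for link in links:
        for path, t in model:
            # Interpolate between the start and end heights along the S-curve.
            y = link["start"] + (link["end"] - link["start"]) * sigmoid(t)
            out.append({"ID": link["id"], "Path": path, "t": t, "y": y})
    return out


if __name__ == "__main__":
    model = build_model()
    # Hypothetical link that rises from height 0 to height 1.
    curve = densify([{"id": "Thing 001", "start": 0.0, "end": 1.0}], model)
    print(len(curve), "points for one record")
```

One input record produces one output row per model row, which is what gives the drawing tool enough points to render a smooth curve.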

Hope that helps. If you'd like to share some data, I'd be happy to help you work backwards to find the best way to fit the data into the template.

- Ken

• 2. Re: Data format for Multi-Layered Sankey Diagrams

Ken Flerlage, thank you so much for the reply! Your comments on Size make sense to a degree, but I'd like a little more to make sure I'm grasping it right. I've laid out a little scenario below along with some sample data that I was hoping you wouldn't mind working through with me.

With this example let's imagine you and I own a business that sells guitar picks. We've contracted several famous musicians to work as influencers for us, and we would like to visualize how our costs changed for each influencer across time.

Here are the first two lines of the attached example.

As you mentioned above, Steps are our phases, so here let's imagine them as Quarters 1 - 4. So we paid John Doe $100 in Q1, $150 in Q2, and so on...

In regard to Size, maybe if you could give an example of what kind of values we'd use in the scenario I laid out, I'd be able to better understand.

Again, super grateful for your help here!

• 3. Re: Data format for Multi-Layered Sankey Diagrams

This is not really how this chart works. What you're showing would make much more sense as a simple line or area chart. Sankeys show flow through various "steps" or "phases". In this case, the value in each "Step" is the status within that phase. A great example is higher education. Each record might be a student and each "Step" is a year of college, with the value of each step being that student's major at that time. In this case, your data would look like this, with Size=1 since each record represents a single student.

| Link | ID | Step 1 | Step 2 | Step 3 | Step 4 | Size |
|---|---|---|---|---|---|---|
| link | John Doe | Engineering | Computer Science | Computer Science | Computer Science | 1 |
| link | Jane Doe | Art History | Art History | Art History | Art History | 1 |
| link | Billy Joel | Engineering | Computer Science | Computer Science | Computer Science | 1 |
| link | Freddy Mercury | Music | Music | Music Performance | Music Performance | 1 |
| link | Michael Jackson | Music | Music | Music Performance | Music Performance | 1 |

But, if you didn't care about being able to trace a specific student, you could aggregate this data.

| Link | ID | Step 1 | Step 2 | Step 3 | Step 4 | Size |
|---|---|---|---|---|---|---|
| link | Eng/CS/CS/CS | Engineering | Computer Science | Computer Science | Computer Science | 2 |
| link | Art History | Art History | Art History | Art History | Art History | 1 |
| link | Music/Music/MP/MP | Music | Music | Music Performance | Music Performance | 2 |

In this case, Size represents the total number of students who took that path.
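If your source data is at the student level, the aggregation can be scripted. Below is a minimal Python sketch using the five student rows from this example; the `aggregate_paths` helper is hypothetical, and the ID it builds (the steps joined with "/") is just one arbitrary way to label a path.

```python
from collections import Counter

# Per-student rows from the example: (ID, Step 1, Step 2, Step 3, Step 4).
STUDENTS = [
    ("John Doe", "Engineering", "Computer Science", "Computer Science", "Computer Science"),
    ("Jane Doe", "Art History", "Art History", "Art History", "Art History"),
    ("Billy Joel", "Engineering", "Computer Science", "Computer Science", "Computer Science"),
    ("Freddy Mercury", "Music", "Music", "Music Performance", "Music Performance"),
    ("Michael Jackson", "Music", "Music", "Music Performance", "Music Performance"),
]


def aggregate_paths(students):
    """Collapse per-student rows into one row per distinct path.

    Size becomes the number of students who took that path.
    """
    # Drop the ID and count identical step sequences (first-seen order is kept).
    counts = Counter(tuple(row[1:]) for row in students)
    rows = []
    for path, size in counts.items():
        row = {"Link": "link", "ID": "/".join(path)}
        for i, value in enumerate(path, start=1):
            row[f"Step {i}"] = value
        row["Size"] = size
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in aggregate_paths(STUDENTS):
        print(row["ID"], row["Size"])
```

The trade-off is the one mentioned above: after aggregating you can no longer trace an individual student through the diagram.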

• 4. Re: Data format for Multi-Layered Sankey Diagrams

This explanation was perfect, I think this clarifies everything for me.

Thanks so much, time to get to actually building the viz now

• 5. Re: Data format for Multi-Layered Sankey Diagrams

Great!! Please be sure to share the result (if you can).
