5 Replies Latest reply on Aug 26, 2019 6:17 AM by Ken Flerlage

    Data format for Multi-Layered Sankey Diagrams

    chris reed

      Hi all! I was hoping someone could explain how my data needs to be formatted for a multi-level Sankey with polygons. I've gone through numerous YouTube videos and articles over the last couple of days, but most of them cover the viz creation in detail rather than the specifics of the data structure. After reviewing them, I think I'm comfortable with at least the basics of building the diagram, but I still don't understand how my data needs to look in Excel.

      This is the tutorial I've most recently been trying to follow: Sankey diagram made of dynamically generated polygons

      As I understand it, Ken Flerlage's Multi-Level Sankey Template is intended to be a plug-and-play setup; I'm just missing an understanding of exactly what I need data-wise.

       

      The first 3 observations in his template:

      Link | ID        | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Size
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 002 | A      | F      | I      | O      | X      | 140
      link | Thing 003 | A      | E      | I      | O      | X      | 232

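      One way to read this layout: each row is one item (ID), the Step columns are the nodes it passes through, and each pair of adjacent Step columns becomes one band of the Sankey whose width comes from Size. The Python sketch below illustrates that reading only; `Row` and `row_to_links` are hypothetical names, not part of Ken's template.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    link: str               # the constant Link column ("link" in every row shown)
    id: str                 # one item flowing through all the steps
    steps: tuple[str, ...]  # Step 1 .. Step N values, i.e. the nodes it visits
    size: float             # the measure that sets the band width


TEMPLATE_ROWS = [
    Row("link", "Thing 001", ("D", "H", "N", "Q", "W"), 72),
    Row("link", "Thing 002", ("A", "F", "I", "O", "X"), 140),
    Row("link", "Thing 003", ("A", "E", "I", "O", "X"), 232),
]


def row_to_links(row: Row) -> list[tuple[int, str, str, float]]:
    """Split one wide row into the links between consecutive steps.

    Each tuple is (step number of the source, source node, target node, size).
    """
    return [
        (i, src, dst, row.size)
        for i, (src, dst) in enumerate(zip(row.steps, row.steps[1:]), start=1)
    ]


for row in TEMPLATE_ROWS:
    print(row.id, row_to_links(row))
```

      Under this reading, five steps give four bands per row, and Thing 002 and Thing 003 share the I to O and O to X bands, where their Sizes would add up.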
       

      Questions:

      ID: In his template, each ID is unique. Am I supposed to duplicate the data so that every observation appears exactly twice (a), or do I duplicate each row a certain number of times depending on something like the number of steps (b), so that you'd see Thing 001 five times because there are 5 steps?

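           a)

      Link | ID        | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Size
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 002 | A      | F      | I      | O      | X      | 140
      link | Thing 002 | A      | F      | I      | O      | X      | 140

           b)

      Link | ID        | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Size
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 001 | D      | H      | N      | Q      | W      | 72
      link | Thing 002 | A      | F      | I      | O      | X      | 140
      link | Thing 002 | A      | F      | I      | O      | X      | 140
      link | Thing 002 | A      | F      | I      | O      | X      | 140
      link | Thing 002 | A      | F      | I      | O      | X      | 140
      link | Thing 002 | A      | F      | I      | O      | X      | 140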

       

      Steps: I think this one is straightforward; it's just the progression of each customer through what will be our nodes, right?

      Size: I'm pretty lost on this one. My first thought was that it's the number of IDs that followed a given route through all the nodes, but that idea definitely doesn't fit the data in the template, so I'm left scratching my head. Any guidance on where the Size value comes from would be really appreciated!

      As for the "Model" tab in his template, my only question is about the Path column. The tutorial from Olivier that I linked says, "It is important to note that the order of the path should be ascending when Min and descending when Max". Ken's template doesn't follow this: as you can see, once we shift from 'Min' to 'Max', the path order keeps increasing from 49 instead of decreasing from 97 to 49:

      a)

      link | 5.75 | 47 | Min
      link | 6    | 48 | Min
      link | 6    | 49 | Max
      link | 5.75 | 50 | Max

      b)

      link | 5.75  | 47 | Min
      link | 6     | 48 | Min
      link | -6    | 97 | Max
      link | -5.75 | 96 | Max

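      One way to check whether (a) and (b) really disagree is to extend both patterns. The Python sketch below assumes the second column is a parameter t that steps by 0.25 from -6 to 6 (49 points per edge) and that the Min rows start at Path 0; neither assumption is confirmed by the template, they are simply what makes the rows above line up. It numbers the Max edge the way (a) does, with Path increasing from 49 while t counts down.

```python
STEP = 0.25   # assumed spacing of the t column (6 - 5.75)
POINTS = 49   # assumed points per edge: t = -6, -5.75, ..., 6


def min_edge():
    """Min rows: Path 0..48 ascending while t climbs from -6 to 6 (assumed)."""
    return [(path, -6 + STEP * path, "Min") for path in range(POINTS)]


def max_edge():
    """Max rows numbered as in (a): Path 49..97 ascending while t falls from 6 to -6."""
    return [(POINTS + k, 6 - STEP * k, "Max") for k in range(POINTS)]


rows = min_edge() + max_edge()
t_at = {path: t for path, t, _ in rows}

# Paths 47-50 reproduce listing (a); paths 97 and 96 reproduce the Max rows in (b).
print([(p, t_at[p]) for p in (47, 48, 49, 50, 97, 96)])

# Read the Max rows in descending Path order: t then rises from -6 to 6,
# which is one way to satisfy "descending when Max".
max_desc = sorted(max_edge(), key=lambda r: r[0], reverse=True)
assert [t for _, t, _ in max_desc] == sorted(t for _, t, _ in max_edge())
```

      Under these assumptions, the rows listed in (b) also occur in the (a) numbering; they are just the Path 97 end of the same Max edge that (a) starts at Path 49. If the real Model tab uses different columns, this comparison doesn't apply.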
       

      Which is correct here?

       

      I'd truly appreciate any clarification on this and I'm happy to put together any example data that could help.

        • 1. Re: Data format for Multi-Layered Sankey Diagrams
          Ken Flerlage

          Hi Chris. Happy to help with this. If I understand your questions correctly, then you should be using approach (a): one thing flows through all five phases. For example, perhaps this is a sales order that must be 1) Ordered, 2) Filled, 3) Shipped, 4) Invoiced, and 5) Delivered. That one order goes through all five phases and will have just one record.

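          The one-record-per-item rule can be sanity-checked with a little arithmetic, assuming the band widths come from summing Size over every row that uses a link (an assumption about how the template aggregates, not something stated here). The Python sketch below uses a hypothetical `link_totals` function and two of the template rows; entering every row twice, as the question considered, doubles every band.

```python
from collections import Counter


def link_totals(rows):
    """Sum Size for every link between consecutive steps.

    rows: iterable of (id, steps, size) tuples, a hypothetical in-memory copy
    of the template's data sheet with one tuple per row.
    Keys are (step number of the source, source node, target node).
    """
    totals = Counter()
    for _id, steps, size in rows:
        for i, (src, dst) in enumerate(zip(steps, steps[1:]), start=1):
            totals[(i, src, dst)] += size
    return totals


rows = [
    ("Thing 001", ("D", "H", "N", "Q", "W"), 72),
    ("Thing 002", ("A", "F", "I", "O", "X"), 140),
]
assert len({r[0] for r in rows}) == len(rows)  # one record per ID

duplicated = [row for row in rows for _ in range(2)]  # every row entered twice

once = link_totals(rows)
twice = link_totals(duplicated)
print(once[(1, "D", "H")], twice[(1, "D", "H")])  # 72 144
```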
           

          Steps are just the individual phases. Size is the measure being visualized. Using the example above, we might just visualize the number of orders, so that row's Size would be 1 (one order). But maybe you want to visualize the quantity ordered via a Sankey; in that case, Size might be 10 or 20 (or whatever quantity was ordered). I realize now that the term Size is terrible. I should think about changing that.

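          To make the two choices concrete, here is a minimal Python sketch with hypothetical orders (the step values and quantities are made up for illustration). The same orders are turned into a template-style sheet twice, once with Size = 1 and once with Size = quantity, and the total flowing through each node is compared, assuming the chart sums Size.

```python
from collections import defaultdict

# Hypothetical orders: (order id, value at Step 1..3, quantity ordered).
ORDERS = [
    ("Order 1", ("Web", "East", "UPS"), 10),
    ("Order 2", ("Web", "West", "FedEx"), 20),
    ("Order 3", ("Store", "East", "UPS"), 5),
]


def to_sheet(orders, measure):
    """Build template-style rows (Link, ID, Step 1..N, Size).

    measure="count" puts 1 in Size (one order per row);
    measure="quantity" puts the quantity ordered in Size.
    """
    if measure not in ("count", "quantity"):
        raise ValueError(f"unknown measure: {measure!r}")
    return [
        ("link", order_id, *steps, 1 if measure == "count" else qty)
        for order_id, steps, qty in orders
    ]


def node_totals(sheet, step):
    """Total Size passing through each node at a 1-based step."""
    totals = defaultdict(int)
    for row in sheet:
        totals[row[1 + step]] += row[-1]  # row[2] is Step 1
    return dict(totals)


print(node_totals(to_sheet(ORDERS, "count"), 1))     # {'Web': 2, 'Store': 1}
print(node_totals(to_sheet(ORDERS, "quantity"), 1))  # {'Web': 30, 'Store': 5}
```

          The rows and paths are identical in both sheets; only the Size column changes, and with it the relative thickness of every band.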
           

          The Model stuff is all intended to do the data densification needed to draw the curves. I'd have to go back to determine why I varied from Olivier's approach, but understanding the details of how this works isn't really all that critical, I don't think.

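          For anyone curious about what that densification does anyway: Tableau Sankey tutorials commonly draw each band edge as a sigmoid curve sampled at many values of a parameter t, and the -6 to 6 range is consistent with the Model rows quoted in the question. Assuming the template does something similar, one edge could be generated like this in Python. The `densify_edge` name is hypothetical, and rescaling the curve so it hits both endpoints exactly is a choice made for this sketch.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic function: S-shaped, about 0 at t = -6 and about 1 at t = 6."""
    return 1.0 / (1.0 + math.exp(-t))


def densify_edge(y_start, y_end, points=49, t_min=-6.0, t_max=6.0):
    """Sample one band edge as (t, y) points along a sigmoid.

    y_start / y_end are the vertical positions at the left and right nodes.
    The sigmoid is rescaled so the first point is exactly y_start and the
    last point is exactly y_end.
    """
    lo, hi = sigmoid(t_min), sigmoid(t_max)
    step = (t_max - t_min) / (points - 1)
    edge = []
    for k in range(points):
        t = t_min + step * k
        share = (sigmoid(t) - lo) / (hi - lo)  # 0.0 at t_min, 1.0 at t_max
        edge.append((t, y_start + (y_end - y_start) * share))
    return edge


edge = densify_edge(0.0, 10.0)
print(edge[0], edge[24], edge[-1])
```

          A band is then two such edges (top and bottom) joined into one polygon, which is why the Model rows come in Min and Max sets.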
           

          Hope that helps. If you'd like to share some data, I'd be happy to help you work backwards to find the best way to fit the data into the template.

           

          - Ken

          • 2. Re: Data format for Multi-Layered Sankey Diagrams
            chris reed

            Ken Flerlage, thank you so much for the reply! Your comments on Size make sense to a degree, but I'd like a little more detail to make sure I'm grasping it correctly. I've laid out a small scenario below, along with some sample data, that I was hoping you wouldn't mind working through with me.

             

            For this example, let's imagine you and I own a business that sells guitar picks. We've contracted several famous musicians to work as influencers for us, and we would like to visualize how our costs for each influencer changed over time.

             

            Here are the first two lines of the attached example.

            Link | ID       | Step 1 | Step 2 | Step 3 | Step 4 | Size
            link | John Doe | $100   | $150   | $300   | $300   |
            link | Jane Doe | $100   | $100   | $250   | $300   |

             

            As you mentioned above, Steps are our phases, so here let's treat them as Quarters 1 to 4. So we paid John Doe $100 in Q1, $150 in Q2, and so on.

             

            As for Size, if you could give an example of the kind of values we'd use in the scenario I laid out, I think I'd understand it better.

             

            Again, super grateful for your help here!

            • 3. Re: Data format for Multi-Layered Sankey Diagrams
              Ken Flerlage

              This is not really how this chart works. What you're showing would make much more sense as a simple line or area chart. Sankeys show flow through various "steps" or "phases", and the value in each Step is the status within that phase. A great example is higher education. Each record might be a student and each Step a year of college, with the value of each step being that student's major at that time. In this case, your data would look like this, with Size = 1 since each record represents a single student.

               

                 

              Link | ID              | Step 1      | Step 2           | Step 3            | Step 4            | Size
              link | John Doe        | Engineering | Computer Science | Computer Science  | Computer Science  | 1
              link | Jane Doe        | Art History | Art History      | Art History       | Art History       | 1
              link | Billy Joel      | Engineering | Computer Science | Computer Science  | Computer Science  | 1
              link | Freddy Mercury  | Music       | Music            | Music Performance | Music Performance | 1
              link | Michael Jackson | Music       | Music            | Music Performance | Music Performance | 1

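              With one row per student and Size = 1, the height of each node is simply the number of students in that major in that year. A minimal Python sketch of that tally using the five rows above; the `node_sizes` name is hypothetical, and summing Size per node is an assumption about how the chart aggregates.

```python
from collections import Counter

# (ID, majors in Step 1..4, Size): the five example rows from the table above.
STUDENTS = [
    ("John Doe", ("Engineering", "Computer Science", "Computer Science", "Computer Science"), 1),
    ("Jane Doe", ("Art History", "Art History", "Art History", "Art History"), 1),
    ("Billy Joel", ("Engineering", "Computer Science", "Computer Science", "Computer Science"), 1),
    ("Freddy Mercury", ("Music", "Music", "Music Performance", "Music Performance"), 1),
    ("Michael Jackson", ("Music", "Music", "Music Performance", "Music Performance"), 1),
]


def node_sizes(rows, step):
    """Sum Size for every major at the given step (1-based)."""
    sizes = Counter()
    for _id, majors, size in rows:
        sizes[majors[step - 1]] += size
    return sizes


for step in range(1, 5):
    print(f"Step {step}:", dict(node_sizes(STUDENTS, step)))
```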
               

              But, if you didn't care about being able to trace a specific student, you could aggregate this data.

               

                  

              Link | ID                | Step 1      | Step 2           | Step 3            | Step 4            | Size
              link | Eng/CS/CS/CS      | Engineering | Computer Science | Computer Science  | Computer Science  | 2
              link | Art History       | Art History | Art History      | Art History       | Art History       | 1
              link | Music/Music/MP/MP | Music       | Music            | Music Performance | Music Performance | 2

               

              In this case, Size represents the total number of students who took that path.

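              The aggregation itself is a group-by on the sequence of Step values, with Size becoming the number of rows in each group. A Python sketch under that reading; labelling each path by joining its steps with "/" is a choice made here (the table above uses hand-written abbreviations such as Eng/CS/CS/CS), and any unique label should work as the ID.

```python
from collections import Counter

# Per-student paths (Step 1..4) from the first table, one entry per student.
PATHS = [
    ("Engineering", "Computer Science", "Computer Science", "Computer Science"),
    ("Art History", "Art History", "Art History", "Art History"),
    ("Engineering", "Computer Science", "Computer Science", "Computer Science"),
    ("Music", "Music", "Music Performance", "Music Performance"),
    ("Music", "Music", "Music Performance", "Music Performance"),
]


def aggregate(paths):
    """Collapse identical paths into one row each; Size = number of students."""
    counts = Counter(paths)
    # Counter keeps first-seen order, so rows come out in table order.
    return [("link", "/".join(steps), *steps, size) for steps, size in counts.items()]


for row in aggregate(PATHS):
    print(row)
```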
              • 4. Re: Data format for Multi-Layered Sankey Diagrams
                chris reed

                This explanation was perfect; I think it clarifies everything for me.

                Thanks so much! Time to get to actually building the viz now.

                • 5. Re: Data format for Multi-Layered Sankey Diagrams
                  Ken Flerlage

                  Great!! Please be sure to share the result (if you can).

                   

                  If this has met your need, would you be so kind as to mark my response as the "correct answer"? This will close the thread and will cause that response to bubble up to the top of the post, making it easier for others to find the answer to similar questions in the future. Thanks!