11 Replies Latest reply on Dec 19, 2014 1:07 PM by Keith Helfrich

    Why the Long Sankey ?

    Keith Helfrich

      Dear Tableau Community,


      Similar to the way we found it possible (with some creative savvy) to find the INTERSECTION of multi-select quick filter values, I'm back again with a new brain teaser that revolves around the competitive sport of Bocce Ball.  I'm hoping you can help!


      Admittedly, the title of this post should be How the Long Sankey ?  But that name wasn't nearly as catchy as asking "why"


      Please, see the attatched .twbx where I've story-boarded the challenge.  In short, I'm searching for help with how to build Jeffrey Shaffer's Sankey diagrams from a LONG format data source.


      Any & all ideas are welcome.


      Extra points if you can fill in the punchline to the joke that starts: A horse walks into a bar & the bartender asks, "why the long Sankey?"



      Keith Helfrich | LinkedIn

      Keith Helfrich | Twitter

        • 1. Re: Why the Long Sankey ?
          Jeffrey Shaffer

          On "The Challenge" sheet, if you simply remove "Series" and "Rank" from the Row, don't you get what you are looking for?

          • 2. Re: Why the Long Sankey ?
            Keith Helfrich

            Hi Jeffrey,


            Yes, indeed I do, sometimes accident is the mother of great design!


            I'd only just put that "The Challenge" sheet together when building the explanation using fake data.  So it wasn't an angle I had explored, really.  It's nice how the measure values / measure names accomplish what I'm searching for.


            Let me try to build the Sankey graphs from there & see how it goes.



            Keith Helfrich | LinkedIn

            Keith Helfrich | Twitter

            • 3. Re: Why the Long Sankey ?
              Keith Helfrich

              So I've accomplished what I was looking for.


              In the updated .twbx, attached, I've done my best to walk thru the steps in a "Solution" story-board.  You'll find on the last page the ability to choose at runtime which two "series" should be use for the comparison via parameters.


              An Observation:


              This is a beautiful example of Joe's emphasis on the need for understand 'The Four Pill Types'. I had been treating [Rank] as a dimension because, in the traditional definition, it is one! But Tableau doesn't care about the traditional definition of Dimension & Measure. In tableau: any field can be either a measure or a dimension (or both!). And you need to use the definition that suits your fancy!



              A Question:


              My "Solution" story might be a bit difficult to follow, but at the bottom of the Empty the ROWS & COLUMNS story point, I've made the curious observation:

                 - after all this effort to "widen" the data, Tableau is still storing it in a TALL format

                 - so, shouldn't there be an easier way to get there, without going thru measure names & back ?


              Something to Keep you Up at Night:


              ... so a horse walks into a bar & the bartender asks ...





              Keith Helfrich | LinkedIn

              Keith Helfrich | Twitter

              • 4. Re: Why the Long Sankey ?
                Olivier CATHERIN

                Here is an exemple of a polygone Sankey Digram based on Jeff's work : https://public.tableausoftware.com/shared/NQM7HZX4S?:display_count=no

                Using polygones makes it much more efficient and responsive.

                • 5. Re: Why the Long Sankey ?
                  Keith Helfrich

                  Thank you, Oliver.  This is quite attractive and also very responsive. 


                  Are you perhaps able to point to a blog post where the polygons approach is described in detail ?



                  Keith Helfrich | LinkedIn

                  Keith Helfrich | Twitter

                  • 6. Re: Why the Long Sankey ?
                    Olivier CATHERIN

                    Hi Keith,


                    I will try to write a full post regarding the method. While waiting, here is a briefing of the approach.

                    My starting point is the work of Jeff that you can find on the following link : Data + Science

                    It describes the way to build a spaghetti graph looking like a sankey. It is based on a dataset containing all the required data in order to rank the lines of the graph.


                    My first idea was to build the same dynamically not using any additional hard coded data such as ID, ranking or anything else. Just Step 1 and Step 2 data. The challenge is to define the position of each line based on a dynamic ranking (RANK_UNIQUE) and Table calculation.

                    From this point, if one can draw all the lines of the flow, I thought it may be possible to define only the TOP and the BOTTOM line of each flow. And if one can join the points altogether, then this would allow to draw a polygone.


                    So I took the model from Jeff and added the following data in excel :

                    • - Way : MIN
                    • - Path : from 0 (T=-6) to 48 (T=6)

                    And doubled the lines with

                    • - Way : MAX
                    • - Path : from 97 (T=-6) to 49 (T=6)


                    In Tableau, I can now generate the MIN and the MAX lines for Position 1 and 2

                    • Max Position 1 : RUNNING_SUM(Size) Calculated with Step 1, Step 2 (Advanced table calculation option in the formula window)
                    • Min Position 1 : RUNNING_SUM(Size) - (Size) Calculated with Step 2, Step 1 (Advanced table calculation option in the formula window)

                    Size being whatever I use to define the size of the flow

                    You can repeat the operation for Position 2 (Min and Max)


                    Using Jeff's method, you can draw a Min Curve (with Min Position 1 and Min Position 2) and a Max Curve (with Max Position 1 and Max Position 2).


                    The polygone will be drawn the following way :

                    CASE ATTR(way)

                    WHEN 'Min' THEN Curve Min

                    WHEN 'Max' THEN Curve Max



                    When building the graph, you just have to turn the graph type into polygones and place the path pill on 'Path'.


                    Hope this helps !

                    • 7. Re: Why the Long Sankey ?
                      Keith Helfrich

                      That does help, Oliver.  Thanks!


                      I had seen each of those things when I took apart the workbook, but it's a lot easier when somebody explains it to you. Otherwise, it's more work to deduce how the pieces fit together.  Avoiding the Google search for words in French that mean 'path' in English also helps!


                      This is really very nice!  I love how beautiful the lines are, how fast it renders, and how it's no longer necessary to build in the ID values.


                      Please let us know when a full post is written, I'm certain you'll make the Best of Web from Tableau =)



                      Keith Helfrich | LinkedIn

                      Keith Helfrich | Twitter

                      • 8. Re: Why the Long Sankey ?
                        Olivier CATHERIN

                        Thx a lot. I'm working on it... Will post the link tomorrow !



                        • 9. Re: Why the Long Sankey ?
                          Olivier CATHERIN

                          Here it is : a full description of the approach...



                          Feel free to give your feedbacks and improvement ideas !





                          • 10. Re: Why the Long Sankey ?
                            Shawn Wallwork

                            Excellent work Olivier!



                            • 11. Re: Why the Long Sankey ?
                              Keith Helfrich

                              Hi Oliver,


                              I'm just now getting around to taking a look at what you've done.  It is quite nice, but I'm having trouble making the bridge from your work to my own.


                              Your example blog posts describe how to build a pair of TOP N parameters, and the data format that you're working with is also bit different from mine.


                              • Your data comes in the wide format | Date | ID | Step 1 | Step 2 | Step 3 | Size |
                              • My data comes in the tall format | League | Series | Team ID | Team Name | Rank | Score |


                              So, in other words, I have a single Rank column which gives the rank for each team in every Series of every League.


                              As seen in the comments above, I could get Jeff's original Sankey model to work correctly on this data by using Measure Names and Measure Values.  But it would be really nice to use your polygon approach for this data, as well.


                              Is there a way that you could help me to understand the best way to get it done ?


                              I'm attaching the raw data here, and also the original workbook which explains what I want to do.



                              Keith Helfrich | Twitter

                              Red Headed Step Data