3 Replies Latest reply on Aug 28, 2018 2:15 AM by Leanne Vermeulen

    Sankey Chart with Top N

    Leanne Vermeulen

      So, thanks to this awesome and recent post by Ian Baldwin, I was able to recreate a useful Sankey chart:



      I have two small kinks I need to work through, though:


      1. I'd like to limit the "Second Dimension" field to only show the Top N of the dimension chosen. I can't seem to get it right, particularly because of the limitations of getting members of a set inside a calculated field.
      2. I'd like to base the Sankey chart on the distinct count of invoice IDs (which in the case of my dataset is the number of times Dimension 1 was associated with Dimension 2), but as I need COUNTD([Id]) to get that to work, I run into a hurdle with the "Path Frame" calculation:

      IF [Measure] = { FIXED : MIN([Measure])} THEN 0 ELSE 97 END.

      My version would be: IF COUNTD([Id]) = MIN({ FIXED : COUNTD([Id])}) THEN 0 ELSE 97 END but then I can't create the bins I need. I read somewhere that FIXED LoDs can help with this; but I'm still struggling to get it right.


      Maybe I just need a step back and am missing something obvious, but any help would be appreciated. Thanks!

        • 1. Re: Sankey Chart with Top N
          Patrick Van Der Hyde

          Hello Leanne,


          I looked at this post and have been working with the workbook supplied (Thank you very much) and the limitation of top n is typically controlled via a table calculation and this is late in the order of operations after the evaluation of context filters which is where I think this needs to be to work.  I think it's worth calling out Ian Baldwin again here and to at least see if Jonathan Drummey has any thoughts on this one.  I know better than to ever say "not possible" in Tableau but I can not find a working solution in the time I have played with your workbook.


          How have you progressed since this original post? 



          • 2. Re: Sankey Chart with Top N
            Jonathan Drummey

            Hi Leanne,


            I took a quick look at this (ok, a longer look) and figured out the problem.


            Here's a view where I've taken out the bin and adjusted the Sankey Arm Size compute using to show the expected size for each combination of the dimensions:


            Screen Shot 2018-08-21 at 9.42.53 PM.png


            Now in this view I've added the bin in, along with some commentary:


            Screen Shot 2018-08-21 at 9.53.54 PM.png



            Both of these issues are due to a cascading set of factors:

            - The data source is built on a kind of self-join, it looks like to do some sort of market basket analysis?

            - The way the Dimension 1 and Dimension 2 variables are built (one from the left side of the join, the other from the right) effectively creates a sparse data set.

            - The binning using the FIXED LOD in the Path Frame field depends on a complete domain of data (the Revenue measure in this case).

            - With sparse data the Revenue values that are binned don't necessarily have enough values to create all the necessary bins (The Female/Fruit & Veg problem).

            - In addition due to the sparseness the chosen measure changes values based on the bin (the Female/Meat/0 problem).


            Given sparse data the FIXED LOD method to create bins is fundamentally unreliable (Also the fact that the final view needs to add a filter just adds to the sparseness). The workaround is to use an alternative method to build the source where you'd create a data source with two copies of the data to create the Path Frame. Given the complexity of the existing source the way I'd do that is with a cross join on a 2 row data source, see Cross Data Source Joins + Join on Calcs = UNION Almost Anything in Tableau v10.2 for details on how to do that.


            Note that you'd have to adjust any additive measures to deal with the multiple copies of the data, given that you want to use COUNTD(Id) as the measure here that wouldn't be the problem. However for the Sankey Arm Size and the % of Totals for the bars the measure would need to be something like COUNTD([Id]) / WINDOW_SUM(COUNTD([Id]), potentially with additional (prior) calculations to handle replicated values due to the two copies of the data.



            3 of 3 people found this helpful
            • 3. Re: Sankey Chart with Top N
              Leanne Vermeulen

              Hi guys, Jonathan Drummey and Patrick Van Der Hyde thanks so much for the help! I'd given up on this post and took a chance logging in today... oddly enough I never received any notifications of your responses at all, not even in my Tableau message inbox. Not even sure how your message got marked as correct, but saved me the trouble! Thanks again - I was starting to think this wasn't possible.