1 2 3 Previous Next 43 Replies Latest reply on Nov 20, 2018 12:57 PM by Divya Chhabra

    Grouping Bins on a Histogram

    Ben Young

      Hi everyone,

       

      I was helping out a friend earlier this week and, in the process of helping him build the visualization he wanted, I was able to create a workaround for a problem I have run into before that I thought would be worth sharing. The problem this workaround solves is grouping bins on a histogram.

       

      Tableau has the awesome capability to build bins and instantly create a histogram. However, it comes with a couple limitations. First, you cannot group the bins. As far as I understand it, and I may be completely off on this, the issue has to do with how Tableau creates the bins as a new Dimension; the bin itself almost acts as a the group rather than allowing a group functionality. (Side note: feel free to add comments and improve on this explanation of why grouping doesn't work. I am not a technical expert on the inner workings of Tableau). You can create sets, but the lack of being able to highlight and group has been a frustration where I work since it is so easy to do in other situations. A second limitation is that you cannot use the bins in calculations. Again, you can use sets, but if this can cause issues if you want to show one level of detail on those values in the set and one level on those values not in the set.

       

      As a simple example, here is the histogram of Sales from the Sample - Superstore data set in Tableau 9.0, with bins sized at $100.Sales Histogram.jpg

      This is not the best of views for many reasons, but one of the main ones is the long tail to the right needed to capture the different sales amounts. Normally, in a view with marks rather than bins, it would be possible to highlight all the values to the right and group them to create a view that showed the values up to a certain value, then have a bin that includes everything greater than or equal to that value. However, because you can't group bins, a workaround is necessary. I'll do my best to document each step to show how I solved this issue. I'll also attach the workbook so people can play around with the viz and make improvements. I'm sure people will come up with much better and more elegant ways to do this. If you do, please let me know so I can learn from you as well!

       

      The first step in the process is to create a set for those values you want to group. In the Sample - Superstore data set, I chose Sales as the field I wanted to use. The trick is, as a continuous measure, you cannot create a set. Sets are only created off of Dimensions. So, I duplicated the Sales measure and converted the duplicate to a Dimension, renaming it Sales - Dimension in the process.

       

      Before creating the set off of this new field, I also created a parameter called "Group everything above..." In my experience, I have found it helpful to create a viz that offers flexibility from the beginning rather than trying to add it in later. With the parameter, the user will be able to choose what level they want to group at, making a much more flexible view.

       

      Back to creating the set, I used the new Sales - Dimension field to create my set. Instead of hard-coding values though, I created a condition using the following forumla: [Sales - Dimension] >= [Group everything above...]. This will dynamically update the members of the set based off of the parameter value the user chooses. In the attached workbook, I named this set "Fields to Group". Not the most exciting name, but descriptive nonetheless. Capture.JPG

       

      After creating the set, I needed to create the bins. However, looking forward, I also will need to be able to use these bins in a calculation, so I had to not use Tableau's built-in binning capability. To manually create the bins, I used the following formula: INT([Sales - Dimension]/[Bin Size])*[Bin Size]. In this case, Bin Size is another parameter, again, created to provide flexibility with the final view. The calculation works by using the INT function to truncate the decimal point off of the returned value (int(4.1) = 4; int(4.6) = 4). Then, multiplying by the Bin Size again will create the relative bins by returning values that will all match due to the truncation. I called this field Bins.

       

      For example, a data set of 4, 5, and 15 sized at bin size 10 will calculate as follows: int(4/10) * 10 = int(.4) * 10 = 0*10 = 0; int(5/10)*10 = int(.5)*10 = 0*10 = 0; int(15/10) = int(1.5) * 10 = 1*10 = 10. These will return a value of 0, 0, and 10, allowing you to use the discrete values returned by the calculation as your own bins. A quick test by dragging Bins to the Columns shelf, changing it to discrete, and placing CNT(Sales) on the Rows shelf will recreate a the view that Tableau would create if you created a histogram using Tableau's built in capabilities.

       

      Once you have your bins created and the set made, the last step is to create a calculated field that will act as your bins, showing both the individual bins below your group, as well as the group itself. I built this in a field called Custom Histogram that looks like so:

       

      if [Fields to Group] = FALSE then STR([Bins]) else STR([Group everything above...])+'+' end

       

      The If first looks to the set you created to see if it is False. This will return every bin that you want to see broken out, not grouped. For each of these bins, all it does is return the value of that bin, cast as a string. The else statement is a dynamic label that takes whatever cutoff the user put in using the parameter Group everything above... and concatenates it with a +. This allows for a slightly more descriptive label of 650+, 750+, etc. instead of just the minimum bin in your group. It also is the reason you have to cast both Bins and Group everything above... as strings to make it possible for you to concatenate it with a string value.

       

      Once you have that calculation built, drag it onto the Columns Shelf, add CNT(Sales) to your Rows Shelf, show the parameter controls for Bin Size and Group everything above...and you're good to go! Here is what the final view looks like:

      Custom Histogram.jpg

       

      I'm sure I've just scratched the surface of possibilities like this. Like I said before, feel free to use this technique and improve on it, just let me know what you find out so we can all keep getting better at exploring our data.

       

      Hope this helps!

       

      --Ben

       

      EDIT Nov 17, 2017: I expanded on this technique to include how to handle negative values as well: https://www.interworks.com/blog/byoung/2017/11/17/how-create-bins-histogram-tableau

        • 1. Re: Grouping Bins on a Histogram
          Alexander Mou

          This is great!

          Thanks for sharing.

          • 2. Re: Grouping Bins on a Histogram
            Simon Runc

            Ben this is really cool...this is a very elegant solution...I'll certainly be implementing it in place of my current (pretty in-elegant!!) solution, in our models. As you say I think this technique could be extended to several situations, especially with the flexibility of changing your N+ limit on the fly (I'm thinking, off the top of my head, when you want to limmit the values at the extremes of a scatter plot, if you are interested in the variance where  most marks reside...when there are real extreme values, they are of little interest (in some situations) but make the scale so large that there is little 'visual' variance in the center...just a thought!).Thanks for sharing

            • 3. Re: Grouping Bins on a Histogram
              Ben Young

              Hi Simon,

               

              Glad it helps out. I hadn't thought about the application in anything besides a histogram. The idea of hiding extremes on a scatter plot is intriguing; if you do manage to implement that, I'd love to see it. I also might shamelessly borrow your idea and see if I can apply it.

               

              I have also been meaning to test out a minimum grouping as well, allowing a very flexible view that allows you to bookend the two extremes of the data. This would translate very well to a scatter plot... Hmm...

               

              If you do end up using this, and your data isn't confidential, I'd love to see the outcome so I can learn from your techniques.

               

              --Ben

              1 of 1 people found this helpful
              • 4. Re: Grouping Bins on a Histogram
                Simon Runc

                thinking about it a bit more the Scatter plot (limit value) isn't too bad, as we have the values detailed out and we just want a single limit so nothing gets plotted above X. So we only need a single IF statement...I was trying to think of situations where I want to 'limit' values (but might want quite a few 'cuts' of data below this limit).

                 

                I have however combined the 2 tricks (your Histogram one, and the Limit Scatter plot values) in the attached

                 

                Scatter Plot Trunc.PNG

                 

                both the Scatter Plot 'limit' and the 'Histogram +' are controlled by the same parameter, and I've have combined the 2 tricks to colour the points in the Scatter Plot with their 'bin' in the Histogram (which is actually quite nice!). The Scatter plot limit, is really just a way of controlling the Axis range dynamically (you may also like this one for controlling them the other way! Setting Uniform Axis including Grand Totals). You do need to be a bit careful with the 'Limit' technique on the Box and Whiskers and other Reference Lines. I'm generally more interested in IQR (and Median) than average (...Love the anecdote about the statistician who drowned crossing a river as the Average depth was only 1 meter!!), so as long as my limit is higher than the IQR (generally rather than use a parameter I create a calculation on say 10% above the IQR as this makes it more dynamic) all will be well. For average though, in the attached you can see I've had to calculate this from my original (non-limited) value.

                 

                I'll definitely post back here with any other uses/extensions to this I come across.

                2 of 2 people found this helpful
                • 5. Re: Grouping Bins on a Histogram
                  Ben Young

                  This is fantastic. First off, just a great looking workbook. I realize it's just a mock-up using Superstore, but the style and design are very nice. Well done. I'd never seen the index()% combination to spread out the scattered data.

                   

                  The scatter plot looks great. Tying both graphs to the same parameters, as well as the actions selecting across views, work very well. The Setting Uniform Axis trick is also fantastic, as is the anecdote about the statistician

                   

                  Thanks so much for sharing!

                   

                  --Ben

                  • 6. Re: Grouping Bins on a Histogram
                    Simon Runc

                    Thanks Ben, you are very kind (Yes, I find the combination of Segoe UI, Light Grey boarders around bars/circles, medium/light color pallet, and bordering containers does give a nice look)...Yes I love the Index Jitter (as it's known)...I’ve Got the Jitters (and I Like it!) » Data Revelations (as always taken from someone far smarter than myself!!!). I'm generally far more interested in variance, and 'typicality' than averages (if you've read Christian Rudder's 'Dataclysm', there's a very interesting chapter on people's 'average' attractiveness, vs their variance of attractiveness' and the effect this has on how many time they are 'messaged'). My boss has also written a great article on the subject Managing by Averages… of Averages | The Data Animators

                     

                    I did originally 'shoe-horn' scatter/histogram in to show you the technique we use to 'control' the axis, but after playing with it I think it's got legs!...I'm going to try and get the histogram to 'brush' on selection of a region, so you can see the usual spread vs. spread for that region.

                     

                    This is why I love the community!...get so many good ideas (both deliberate and happy-accidents). I'll keep you posted on my progress (might be a few days as I've got to do some actual work for a bit...boo!!!)

                    • 7. Re: Grouping Bins on a Histogram
                      Alexander Mou

                      Simon,

                      Great play with Scatter plots and Box and Whiskers!

                       

                      BTW, Top N and Others has been a technique being around for a while. It's nice to see it applied to histogram bins.

                      Create Sets for Top N and Others | Tableau Software

                      • 8. Re: Grouping Bins on a Histogram
                        Bruce Segal

                        Simon:

                         

                        On jittering the scatter plot I've used Index() alone w/o the modulo %25. And then I have it restart in each new pane. In most instances I can get the jittered marks fairly evenly distributed across each pane. Can you explain why you used Index()%25, instead of just Index() on the Order Id?

                         

                        Am I correct that the modulo 25, divides the OrderId by 25 and then rounds to the nearest integer?

                         

                        I played around with other variants of Index()%25,like Index()%10, Index()%5, and Index()%50.The first two pretty much distributed the points in even columns, the last one loaded them to one side. See images below.

                         

                         

                         

                        I can get the jitters much more evenly distributed across the panes, even as you change the ATV Trunc parameter. Here's what it looks like when I change the modulo to %20 and then change the compute level, addressing, and sort settings. I may not need to tweak all of these to achieve the result, but I'm still learning how all these elements work. I attach the .twbx file with the additional work sheet. It's in ver 9.1 format.

                         

                        1 of 1 people found this helpful
                        • 9. Re: Grouping Bins on a Histogram
                          Simon Runc

                          hi Bruce,

                           

                          Excellent question...and good work on the 'investigation'!

                           

                          So the Mod function returns the remainder of a division. So 7 MOD 3 = 1 (7/3 = 2 and remainder 1). So by using the MOD with the index we get the following sequence for MOD 25

                           

                          Mod Table.PNG

                          So 1/25 goes zero times, and so the remainder (aka MOD) = 1

                          2/25 goes zero times, and so the remainder = 2

                          ...

                          25/25 goes once, and the remainder = 0

                          26/25 goes once, and the remainder = 1

                           

                          As you can see this has the affect of returning values 1-25 and then starting again. As we don't have our marks sorted, this gives the 'illusion' of randomness!

                           

                          I've used 25 here, just through experience and the number I choose here depends on the number of marks I have to 'jitter'. Pre-Tableau 9.0, I'd create this as a calculated field, with the divisor (25 in this case) on a parameter so I can adjust until I get the correct look. With 'type-in' functions in Tableau 9.0 is easy enough to just alter this number.

                           

                          The reason only using Index() in this case works, is that we have the 'bin' dimension within the 'pane' (VizLoD) so your index() number get's restarted for each 'bin'. If we didn't have this your index() would just go from 1 to number of marks, per pane and wouldn't look random. Below is a chart showing the index() for 100 marks, and what their MOD would be using MOD 25 and MOD 50

                           

                          Mod Chart.PNG

                           

                          I think, your index()%50 is restarting for every bin (I had to use the advanced option to get mine set up correctly with this extra 'bin' dimension in the view), or might be sorted. This means, looking at the chart above, not enough points in the 'higher' parts of the scatter, so the MOD never get's to 50, and so all points are plotted on the left hand side. When the MOD divisor is too low, as you've shown the pattern doesn't look random.

                           

                          As a general rule of thumb, I find MOD 25 the best for most situations. Generally with Scatter plots I'm looking at around 100-500 marks per pane (and less and I might be able to show on a bar chart, any more and I've got too many marks to make a meaningful Viz...btw this isn't a 'hard-and-fast' rule!) and find Mod 25 works well, but this has been from trial and error (very much in the way you have done below).

                           

                          Thanks for the question. I hadn't really thought about it in this much depth before, and glad I have!!

                          • 10. Re: Grouping Bins on a Histogram
                            Rody Zakovich

                            This is awesome! Great work Ben.

                             

                            I used a similar trick for dynamic bins here  Human Readable Row Level Dynamic Bins

                             

                            This is def something I want to implement in some of my workbook.

                             

                            Thanks again for sharing!

                             

                            Rody

                            • 11. Re: Grouping Bins on a Histogram
                              Jerry Wetherall

                              This is really nice work, Ben. Thanks for the contribution. I can definitely use this solution.

                              • 13. Re: Grouping Bins on a Histogram
                                Simon Runc

                                Thanks Alexander

                                 

                                ...I also found this the other day, which really extends the histogram scatter plot idea...I think this is amazing!

                                 

                                How to Make a Scatterplot with Marginal Histograms in Tableau | DataRemixed

                                • 14. Re: Grouping Bins on a Histogram
                                  Ben Young

                                  Thanks Alexander! Glad it helped out.

                                  1 2 3 Previous Next