    color coding the box plots

    Young Song



          I am currently using Tableau 8.3 on Windows 7. I am not sure whether what I am asking for is available in this version of Tableau, but is there a way to colour code the actual box portion of box and whisker plots? So far it seems I can colour the boxes individually under "Edit Reference Line, Band or Box", but I was hoping to colour-code the boxes by a field.


         The main reason I am asking is that the circle marks under the box and whisker plot can get very cluttered and messy when there is a lot of data. So what I end up doing is hiding the marks under the boxes, except for the outliers. When there are outliers, I can at least see their colours and know which category or entity the box plot data belongs to. When there are no outliers, I have a bit of trouble interpreting the data.


         So far, my workaround has been to show all the data under the box and whisker plot but reduce the size of the marks, so the colour codes are still visible while the emphasis stays on the box and whisker portion of the view.


         Any suggestions or feedback would be much appreciated.





          Young Song

          Hi, thank you for the quick response. What I am looking for is some sort of capability where I drag a field onto colour and the boxes change colour according to the colour legend. In other words, I am asking the software to colour the boxes instead of the marks (since I want just the box and whisker, and not the marks except for the outliers).


          You could do something like this in R, but I would really prefer to stick to the Tableau route if possible.



            Jonathan Drummey

            I've got a partial solution for you.


            This is being posted as a brain teaser at Educational Brain Teaser: Coloring the Boxes on Box and Whisker Plots. If you don't want to know the solution, don't read any further!


            [Screenshot of the partial solution]


            Seriously, spoilers come right after this...


            The challenge here is that Tableau generates the box plot based on the values of the continuous pill on the axis. We're not able to choose the pill used to generate the values like we can with most other reference lines & bands, and there is only a single box plot allowed per axis.


            Therefore, the workaround is to generate more axes. In this case, using Superstore Sales, I set up two measures that do record-level calculations (the Furniture one is SUM(IF [Department] == "Furniture" THEN [Sales] END)), then put them together in a dual axis chart using Measure Names on Columns. The rest is formatting; here's a fuller view:
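            To see why one measure per category works, here is a minimal Python sketch (not Tableau code) of what the record-level IF does: each row keeps its value only when the department matches and is Null (None here) otherwise, so each axis's box plot is computed from that category's values alone. The rows below are made-up stand-ins for Superstore Sales records, and the sketch ignores the SUM wrapper, which in the view aggregates to whatever dimensions are on Detail.

```python
from statistics import median

# Hypothetical stand-ins for Superstore Sales records: (department, sales).
rows = [
    ("Furniture", 120.0), ("Furniture", 340.0), ("Furniture", 95.0),
    ("Technology", 410.0), ("Technology", 220.0), ("Technology", 780.0),
    ("Office Supplies", 15.0),
]

def split_measure(rows, department):
    """Record-level equivalent of IF [Department] == dept THEN [Sales] END:
    keep the value for matching rows, None (Tableau's Null) for all others."""
    return [sales if dept == department else None for dept, sales in rows]

def non_null(values):
    """Box plot statistics only see the non-Null values on their axis."""
    return [v for v in values if v is not None]

furniture = split_measure(rows, "Furniture")
technology = split_measure(rows, "Technology")

# Each measure feeds its own axis, so each axis gets its own box plot
# (and its own Fill color setting).
print(median(non_null(furniture)))   # 120.0
print(median(non_null(technology)))  # 410.0
```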


            [Screenshot of the fuller view]


            The limit of this solution is that it can only handle 2 distinct discrete values. To get more than 2, there are three alternatives I can think of; Shawn already laid out two of them:


            1) Create a worksheet for each distinct discrete value then put them all in a dashboard, turning off Show Headers for the Y axis for all but the left-most worksheet.


            2) Build 5 calculations for each distinct discrete value (the whisker ends and the 25th/50th/75th percentiles) that only return non-Null values for that value, then put them all on the Level of Detail and build reference bands & lines using the build-your-own (BYO) box plot technique from before Tableau 8.0.
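            As a rough illustration of the five values behind option 2, here is a Python sketch that computes them for one category. It assumes the common 1.5 x IQR whisker rule, and its quartiles use Python's linear-interpolation ("inclusive") method, which may not match Tableau's percentile calculation exactly; in Tableau each of these would be a separate calculated field per discrete value.

```python
from statistics import quantiles

def five_values(values):
    """Return (lower_whisker, q1, median, q3, upper_whisker) for one category.

    Assumption: whiskers follow the 1.5 x IQR rule, and quartiles use the
    'inclusive' interpolation method, which may differ from Tableau's."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences;
    # anything beyond them is drawn as an outlier mark.
    lower = min(v for v in values if v >= low_fence)
    upper = max(v for v in values if v <= high_fence)
    return lower, q1, med, q3, upper

# Hypothetical values for one category; 100 falls outside the upper fence.
print(five_values([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # (1, 3.0, 5.0, 7.0, 8)
```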


            3) Add 20 records to the data source for each distinct discrete value and then use those marks to draw the box plots as polygons on a dual axis. Given some work in the data source to automatically add those records (and a whole lot of calculations), this is the only solution that I can think of that would be able to handle an arbitrary number of discrete values.
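            The thread doesn't spell out how the 20 records per value are laid out, so the following Python sketch is only a hypothetical simplification: it generates the 4 corner points of one box as rows with a point id, in the order a Polygon mark would connect them when that id is on Path. Because each row carries its category, that field can go on Color. The function name, half_width, and column order are all assumptions; whiskers and the median line would need extra points.

```python
def box_polygon(category, x_center, q1, q3, half_width=0.25):
    """Hypothetical corner rows for one box: (category, point_id, x, y).

    point_id gives the drawing order (for the Path shelf); Tableau closes
    polygons automatically, so the first corner is not repeated."""
    corners = [
        (x_center - half_width, q1),  # bottom-left
        (x_center + half_width, q1),  # bottom-right
        (x_center + half_width, q3),  # top-right
        (x_center - half_width, q3),  # top-left
    ]
    return [(category, i, x, y) for i, (x, y) in enumerate(corners, start=1)]

# Two boxes side by side on a numeric x axis, one per (hypothetical) category.
rows = box_polygon("Furniture", 1, 3.0, 7.0) + box_polygon("Technology", 2, 5.0, 9.0)
for row in rows:
    print(row)
```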



              Matt Lutton

              This is great, and perfect timing for me.

                Young Song

                Hi Jonathan,


                     Thank you for the detailed explanations. That certainly helps me solve at least part of my problem. What I forgot to mention previously is that I am interested in doing some time-series visualization. So basically, for each discrete period of time (I am working with months), there are colour-coded box plots. I have attached a workbook to give you an idea of what I am aiming for. The first thing I noticed is that the repeated dates make things look a bit redundant. This would be solved if I moved the right-most sheet to the bottom and turned the header off. However, things could get a little messier, as the data our organization is trying to visualize is more complex (i.e. a longer date range, which will be filtered to the most recent three months, and a lot more categories for colour-coding). Hence, you can imagine several worksheets piled on top of each other.


                     By the way, I was reading your third option and wasn't really sure what you meant by "add 20 records to the data source". Is this assuming that I am working with an Excel spreadsheet or some other text-based data?

                  Jonathan Drummey

                  When we're doing things like drawing our own box plots, we're taking control over Tableau's rendering process, essentially treating Tableau as a rendering engine by feeding it data with the right structure and values. We can do some amazing things with that approach. My friend Noah Salvaterra has done some unbelievable work; this is one of my favorites: Building Life in Tableau – by Noah Salvaterra | Drawing with Numbers


                  When I wrote "add 20 records to the data source", I meant the Tableau data source. In my case, I'm typically using Access or SQL Server, and to do padding like this I use a separate table and/or a cross product query to generate the extra records, which get UNION'ed with the original data. If you have an ETL tool like Alteryx, you can get similar results. In the view you sent, 3 months * 20 categories (a guess at "lots more") * 20 records per box comes to 1200 extra records, which will lead to 1200 extra marks in the view. That's very doable for Tableau; it's when you get into the tens of thousands of marks that you may start seeing rendering times increase. One way I speed that up is to do as much pre-filtering and pre-aggregation as possible, so there are fewer run-time computations for Tableau and the data source.
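                  Here is a minimal Python sketch of the padding step described above, as a stand-in for the cross product query (in SQL this would be a CROSS JOIN of months, categories, and a small point-id table, UNION'ed with the original rows). The month labels, category names, and column names are hypothetical. The measure is left as None on padding rows so that SUM or AVG of the original measure is unaffected, though row counts are not.

```python
from itertools import product

# Hypothetical dimension values: 3 months, 20 categories ("a guess at lots more").
months = ["2015-01", "2015-02", "2015-03"]
categories = [f"Category {i}" for i in range(1, 21)]
POINTS_PER_BOX = 20

def padding_records(months, categories, points_per_box=POINTS_PER_BOX):
    """Cross product of month x category x point id; each row becomes one extra mark.
    sales is None so the padding rows don't change SUM/AVG of the real measure."""
    return [
        {"month": m, "category": c, "point_id": p, "sales": None, "is_padding": True}
        for m, c, p in product(months, categories, range(1, points_per_box + 1))
    ]

# One hypothetical original record, then the UNION step.
original = [
    {"month": "2015-01", "category": "Category 1", "point_id": None,
     "sales": 120.0, "is_padding": False},
]
combined = original + padding_records(months, categories)
print(len(combined) - len(original))  # 1200
```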


                    Nicole Edmonds

                    I started on this as the brain teaser, and my first thought was "a-ha!" dual axis, which I built similarly to the above. That said, I'm still stumped on the formatting: even if I duplicate the setup above, my boxes are still grey.

                      Jonathan Drummey

                      Hi Nicole,


                      You need to format the box plot for each axis individually to set the Fill color.


                        Nicole Edmonds

                        Ah, of course!  Thanks! 

                          Young Song

                          I guess this could go into the Ideas section, but it would be super useful to add colour coding to the box plot feature. Version 8.3, which I am currently using, provides a shortcut for adding a box plot (i.e. clicking the box and whisker icon in "Show Me"). Perhaps they could add an option such as "Colour Code by <Field>" to the reference line editing panel.

                            Yuriy Fal

                            Hi all,


                            I managed to do it another way, via dual axes & Gantt bars.

                            I did so and responded to Jonathan's brain teaser earlier.


                            I'm posting my (partial) solution here, just for reference.

                            Please find the attached workbook. I hope it helps.