4 Replies Latest reply on Apr 3, 2013 10:21 AM by Jim Wahl

    Display quartiles

    Ambi Nair


      I m trying to analyze the


      1) spread of Booking amount @ region and customer level by Quartiles.Have calculated the quartiles by reference table cal.

      However the quartiles details are not being displayed properly,leading to not able to visually see it.(View Sheet 3).

      1a) Tried to workaround by exporting the cross tab in excel,but Excel does not extract the quartiles figures.

      2) To analyze all those costumers who are outliers and are above the upper quartiles.(View dashboard 1)not sure how those customer can be pulled out.


      Please advice.




        • 1. Re: Display quartiles
          Jim Wahl

          Your reference calculation looks OK, but perhaps you want this calculated for each airline (in which case you'd use cell).


          In any case, the quartiles are there, but you can't see them because they are very narrow. For all of the airlines, the 1st and 3rd quartiles are 406 and 4325, respectively, but your axis range is from -100,000 to 800,000, so the IQR range is barely visible. Using a log scale for the x-axis is helpful when you have such a wide distribution.


          Still, I'm not sure what your goal is with this view, as there are a lot of outliers. Perhaps a histogram would be better?


          Here are the quartiles from R (one line of code and another case where R comes in handy when using Tableau) .

          Of the 39672 records, 4385 are outliers by the common definition of Q3 + 1.5*IQR and Q1- 1.5*IQR.


          > summary(booking_amount$Booking.Amount)

              Min.  1st Qu.   Median     Mean  3rd Qu.     Max.

          -55480.0    406.2   1493.0   5110.0   4325.0 757000.0


          > length(booking_amount$Booking.Amount)

          [1] 39672

          > sum(booking_amount$Booking.Amount > 4325+1.5*(4325-406.2))

          [1] 4383

          > sum(booking_amount$Booking.Amount < 406.2-1.5*(4325-406.2))

          [1] 2



          • 2. Re: Display quartiles
            Ambi Nair

            hello Jim.


            Idea is to identify all those outliers @ region level and remove them, as these customers are supposed to be Small businesses and making such a huge bookings for them is not possible.


            So generally want to know at region level what is the distribution of bookings and customers having Bookings amount more than upper quartile should be removed i.e as they are perceived to be Outlier.

            not sure who to use histogram for this.


            Since we don't use R,tableau remains our only option to figure this out.


            Please advice.




            • 3. Re: Display quartiles
              Jim Wahl

              I thought a histogram might help you better see the distribution, and perhaps choose a better cutoff for what's considered an outlier.


              I'm caught up in something at the moment, but if you want to calculate Q3 for each partition and then filter the data, using a calculated field, a good place to start is Richard Leeke's quantile example here:



              If this is your first experience with table calcs, you'll be jumping in the deep end. If you only have a handful of regions and corresponding calculating outlier thresholds, you might consider doing this manually.


              Let me know how far you get with Leeke's formula and I'll check in later tonight. ...



              • 4. Re: Display quartiles
                Jim Wahl

                Hi Ambi,


                Check out the attached workbook, which includes a "Sheet 3 filtered" that excludes values > 75th percentile.


                One trap with Tableau's reference lines is that measures on a scatter plot view need to be disaggregated for calculations like the quartile reference line to work properly. From the menu, select Analysis > uncheck Aggregate Measures.


                When you do this, you'll see, for example, the number of points on your Sheet 3 increase from ~90K to ~200K. When measures are aggregated, identical values are only counted once. This, of course, causes problems for median and quartile. ...