12 Replies Latest reply on Aug 1, 2016 8:16 AM by swaroop.gantela

    How to get the top k tail outliers in a box plot ?

    ALIABBAS PETIWALA

      As circled in the below box plot in need to get the top k extreme elements from both the tails of each box plot in the figure below , how is it possible to do so?

       

      The X axis consists of categories and for each category I need to tabulate or visualize the top k  tail outliers like:

               

      CATEGORYUPPER TAILLOWER TAIL
      CAT 15891,5682....125,196
      CAT 2601,...246,...

       

      sublabelwisttmt.png

        • 1. Re: How to get the top k tail outliers in a box plot ?
          swaroop.gantela

          Aliabbas,

           

          Please see if the attached could be a starting point for you.

           

          In the attached, my metric was [Profit].

          So first I ranked [Profit]:

          RANK(ATTR([Profit]))

          with a "Compute using" of "Cell"

           

          Then I got the MaxRank:

          WINDOW_MAX([RankProfit])

           

           

          With SelectK being the parameter, the OutlierUpper was:

          [RankProfit]<=[SelectK]

          and OutlierLower was:

          [RankProfit]>=([MaxRank]-[SelectK]+1)

           

          Values can be returned as such:

          IF [OutlierLower] THEN ATTR([Profit]) END

          211935outlier.png

          2 of 2 people found this helpful
          • 2. Re: How to get the top k tail outliers in a box plot ?
            Ashish Chaudhari

            Hi Swaroop,

             

            You write fabulous calcs. I am great fan of your calcs. May I know from how many years you are using tableau?

            You are really doing fabulous job. Do you have blog or articles written by you? i would love to read them.

             

            -Ashish

            • 3. Re: How to get the top k tail outliers in a box plot ?
              ALIABBAS PETIWALA

              Your profit field in my calc is Avgerrperjobseq which is an aggregation :

              so following does not work

              RANK(ATTR([Avgerrperjobseq]))

               

               

              however this does work, but does not give the desired result and ranks everything as 1:

               

               

              RANK([Avgerrperjobseq])

               

               

               

               

              where [Avgerrperjobseq]=[Number of Records]/COUNTD([Job Seq Id])

               

               

              The [Avgerrperjobseq] is on the Y axis like your profit field.

               

               

              Also I have to display the outlier which are in other dimension "loginid"  and NOT Avgerrperjobseq which I am able to get in the sheet 1 USING compute along "loginid" , But not in tabular form,

               

              I wanted a string concatenation of the "loginids" like

              IF [OutlierLower] THEN loginid  END

               

              but above does not work, Although I am able to get the correct visualization box plot with correctly colored outliers.

              • 4. Re: How to get the top k tail outliers in a box plot ?
                swaroop.gantela

                Aliabbas,

                 

                If all the Ranks are coming up as 1, it is likely that there needs to

                be made an adjustment to the "Compute using" for that pill.

                 

                The calculated field for displaying the loginid should work as you have written.

                It will likely also require adjustment of the "Compute using".

                Additionally, the RANK([Averrperjobseq]) will probably also need to be

                on the detail shelf with adjustment to its "Compute using".

                 

                Some next steps that could be helpful would be:

                -if your data is sensitive, post some screenshots the layout of your worksheet,

                  including the location of the pills on their shelves  and the "Compute using" for the table calculations

                -though I don't think your data is that much different in quality from the SuperStore data,

                you could post a small sample of your de-identified data

                -recreate your view on the de-identified sample set and post the twbx

                • 5. Re: How to get the top k tail outliers in a box plot ?
                  ALIABBAS PETIWALA

                  Thanks,

                   

                  Although I was able to get the box plot correctly, but not the table

                  outlier.png

                  compute.png

                   

                  for outlierlowervalues and uppervalues I tried two methods , both did not work:

                   

                   

                  1. IF [Outlierlower] THEN RAWSQLAGG_STR( "GROUP_CONCAT(%1)",[Login Id] ) END
                  2. IF [Outlierlower] THEN ATTR([Login Id] ) END

                   

                  THE outlierlowervalues /uppervalues dont even appear in measure names as shown below:

                   

                  outltabl.png

                  Below is the data source schema

                  schema.png

                   

                  So I have to display the top k tail outliers loginid for each  sublabel in a table.

                   

                  Here is the description of the sheet generated by Tableau:

                  Description of "Sheet 18"

                  Avgerrperjobseq for each Sublabel. Color shows details about Outliercolor. The marks are labeled by Login Id. The data is filtered on Firstmt and Mainlabel. The Firstmt filter keeps 56 of 319 members. The Mainlabel filter excludes ..., The view is filtered on Sublabel, which keeps no members.

                  Marks

                   

                  The mark type is Circle.

                  The marks are labeled by Login Id.

                  Stacked marks is off.

                  Shelves

                   

                  Rows:

                  Avgerrperjobseq

                  Columns:

                  Sublabel

                  Filters:

                  Sublabel, Firstmt, Mainlabel

                  Level of detail:

                  Rankedavgerrperseqid, Maxrank

                  Text:

                  Login Id

                  Color:

                  Outliercolor

                  Dimensions

                  Login Id has 56 members on this sheet

                  Members: A,B,C; ...

                  Firstmt has 56 members on this sheet

                  Members: 1...

                  Mainlabel has 6 members on this sheet

                  Members: ; ...

                  Sublabel has 28 members on this sheet

                  Members:  ...

                  Sublabel is sorted descending by distinct count of Job Seq Id.

                  Measures

                  Outliercolor has 2 members on this sheet

                  Members: False; True

                  The formula is [Outlierlower] OR [Outlierupper]

                  Results are computed along Sublabel for each Login Id.
                  For Outlierlower, results are computed along Sublabel for each Login Id.
                  For Outlierupper, results are computed along Sublabel for each Login Id.
                  For Rankedavgerrperseqid, results are computed along Login Id for each Sublabel.
                  For Maxrank, results are computed along Sublabel for each Login Id.

                  Rankedavgerrperseqid ranges from 1 to 56 on this sheet.

                  The formula is RANK([Avgerrperjobseq])

                  Results are computed along Login Id for each Sublabel.

                  Avgerrperjobseq ranges from 1.000 to 6.214 on this sheet.

                  The formula is SUM([Number of Records])/COUNTD([Job Seq Id])

                  Maxrank ranges from 34 to 56 on this sheet.

                  The formula is WINDOW_MAX([Rankedavgerrperseqid])

                  Results are computed along Sublabel for each Login Id.
                  For Rankedavgerrperseqid, results are computed along Login Id for each Sublabel.

                  Data Source Details

                   

                  Data Source:

                  advlabel+ (sa_contents)

                  Type:

                  MySQL Database

                   

                  Table:

                  advlabel2, firstlast, newparsed, usergrades

                   

                   

                  • 6. Re: How to get the top k tail outliers in a box plot ?
                    Welton Barbosa

                    Measure Names does not recognise Measure with String Types. All measures must be assigned as Numerical to be recognised in Measure Names

                    • 7. Re: How to get the top k tail outliers in a box plot ?
                      swaroop.gantela

                      Aliabbas,

                       

                      Thank you for the screenshots.

                      Please see if the attached is more closely aligned with your set.

                      I did my best to mock up a dataset, please adjust it as needed to match yours.

                       

                      I think on all sheets, it will mostly be an issue of you having to adjust your "Compute using"

                      to get it work correctly. With the different sheets in the attached, they didn't all

                      have the same "Compute using"s.

                       

                      In the attached:

                      [Avgerrperjobseq] is

                      [Number of Records]/COUNTD([Job Seq ID])

                       

                      [RankAvg] is

                      RANK_UNIQUE([Avgerrperjobseq])

                      (I used Unique for this mock dataset, but you can probably just use RANK)

                       

                      [Outliers] is

                      [RankAvg]>=[MaxRank]-[SelectK]+1

                      OR

                      [RankAvg]<=[SelectK]

                       

                      The Table doesn't look yet like how you want it (a string of the IDs).

                      But see if the attached could be a first step:

                      I basically put all the loginids on in a row and filtered it

                      to keep just the outliers and color them as upper and lower.

                       

                      Please take note of the "Compute using"s on the different worksheets.

                      On the graph, it is just "Label".

                      On the table, I needed to use the Advanced option and Restart every Sublabel.

                       

                      211935outlierB.png

                      1 of 1 people found this helpful
                      • 8. Re: How to get the top k tail outliers in a box plot ?
                        ALIABBAS PETIWALA

                        Thanks  swaroop.gantela I was able to get what I wanted. Also I wanted to sort the loginid based on ranks per sublabel , how is that posssible?

                        • 9. Re: How to get the top k tail outliers in a box plot ?
                          swaroop.gantela

                          Aliabbas,

                           

                          Glad you were able to get what you wanted.

                          To sort the loginid, I think you can try putting the Rank

                          on the Columns shelf and not show it's header.

                          You can then sort it, but It looks like the only option is to sort manually,

                          but I think once you do that for all the possible Top K  possibilities, it should stick.

                           

                          211935outlierC.png

                          • 10. Re: How to get the top k tail outliers in a box plot ?
                            swaroop.gantela

                            Ashish,

                             

                            Thank you for the kind words.

                            Moreso, thank you for all that you're doing for the community.

                             

                            I don't write anything, I'm just trying to help out if I can.

                            I enjoy the challenges, so thank you Aliabbas for posting the question.

                            Aliabbas, I quite like your visualziation, and I look forward to your forthcoming contributions.

                            • 11. Re: How to get the top k tail outliers in a box plot ?
                              ALIABBAS PETIWALA

                              Thanks , it is not feasible to sort manually

                              • 12. Re: How to get the top k tail outliers in a box plot ?
                                swaroop.gantela

                                Aliabbas,

                                 

                                Sorry that I didn't clarify, it would be more difficult if you are wanting to show the top 20.

                                But for less than that it could be feasible.

                                Will look into other options.

                                 

                                [Edit]

                                Maybe the Sort wasn't necessary.

                                If the Rank again is set with the correct "Compute using",

                                I think it may sort it correctly by itself.