5 Replies Latest reply on Jun 14, 2016 9:15 AM by Steven Myers

    Creating a set based on formula for excluding marks

    Steven Myers

      I'm trying to apply an in/out sets filter in order to examine only "typical" transactions. I have created a set (see attached copy of Superstore) called "Sales Outliers" and then put that set on the filter shelf.

       

      The purpose of the calculation is to determine what the aggregated sales are for a region after excluding all sales where the value of the sale is more than 2.5 times the population standard deviation.

       

      In order to validate the calculations, I dropped the data into the attached Excel spreadsheet for the West region. As you can see, when the filter is set to "out" only, I should be seeing total sales of $495,782 but instead see $166,969.

       

      What am I doing wrong?

        • 1. Re: Creating a set based on formula for excluding marks
          Chris Cantrell

          Hello Steven,

          I have a little explanation for you, and then a solution that is roughly equivalent, but doesn't use the same set function.

          The set in your workbook determines whether the value belongs in or out of the set before the viz is generated. What you end up with is a standard deviation which is for all sales in all regions. Then when your viz is built and you select In/Out, the Outset is heavily influenced by the other regions which gives you the incorrect result. What you need is a way to isolate the regions to build the standard deviations first, and then the "In/Out" should work for you.

          I haven't played with sets much myself, so I am not sure how to make a functional condition formula. What worked with a calculated field isn't a direct translation to the set condition field.

          The calculated field I built breaks the regions apart and calculates the standard deviation for each region individually, then applies an "In/Out" status to each record in the data. Then you are able to use the calculated field in the filter instead of the set.

          IF [Sales]>2.5*{FIXED [Region]:STDEVP([Sales])} THEN 'In' ELSE 'Out' END

          Your Excel spreadsheet allowed you to isolate the West Region. This is basically the same function in Tableau, but for all regions at the same time.

          You should be able to copy paste the above formula into Tableau and see how the function changes.

          Hope this helps,

          Chris

          • 2. Re: Creating a set based on formula for excluding marks
            Steven Myers

            Thanks for your reply Chris. I can see how your LOD expression would work for the calculated field so I'm going to try and apply the same logic to the set calculation as that it would seem to me to be the more elegant solution. I'll let you know what happens.

            • 3. Re: Creating a set based on formula for excluding marks
              Steven Myers

              So I tried

               

              SUM([Sales]) > (2.5 * {FIXED [Region]:STDEV([Sales])})

               

              as the condition formula but got the handy old "Cannot mix aggregate and non-aggregate arguments with this function."

               

              By changing to

               

              [Sales] > (2.5 * {FIXED [Region]:STDEV([Sales])})

               

              I got "The formula must be an aggregate calculation or refer only to this field."

               

              I'm pretty sure this is heading in the right direction but just can seem to get this last little piece.

              • 4. Re: Creating a set based on formula for excluding marks
                Chris Cantrell

                Hi Steven,

                Sorry I am a day late here.

                SUM([Sales]) > AVG((2.5 * {FIXED [Region]:STDEVP([Sales])}))

                That solves the aggregation level of the argument... but throws off the numbers badly, which I would expect.

                And a snip back to basic algebra... easier to aggregate that way...

                0 > ((2.5*{FIXED [Region]:STDEVP([Sales])}) - ([Sales]))

                This got closer I think. Inside, we get four operations, each one has the individual sales subtracted from the standard deviation. Unfortunately... it ends up summing all of these to determine if the record is "in". Drop the SUM function and wants an aggregate. Any of those would skew the result.

                Just based on what I have experienced so far, I don't currently believe a set is capable of being manipulated to do exactly what you are asking, at least not without creating calculated fields.

                 

                Maybe someone with more set knowledge is available.

                Sorry I couldn't be more assistance.

                 

                Chris

                • 5. Re: Creating a set based on formula for excluding marks
                  Steven Myers

                  Thank you Chris Cantrell and Anthony from customer support. You pointed me in the right direction so what follows is the solution to the problem.

                   

                  There are two pieces to making this work:

                  1) The set needs to have a unique key if you are trying to build a set that excludes specific records. Since Order ID is not unique at a record level, this had to be changed.

                  2) Both sides of the set comparison condition have to be set with LOD expressions.

                   

                  I've attached the solution. Note that it was necessary to add a record ID field to be absolutely certain that all records could be uniquely identified.