3 Replies Latest reply on May 30, 2012 4:52 PM by Jonathan Drummey

    How do you calculate the standard deviation of a product category?

    Fabio Annovazzi



      I would like to calculate the volatility index of sales (measured as standard variation/average).


      I have a series of products, which belong to different categories.


      Everything works fine if I just want to calculate the standard deviation/average of product sales.


      If instead I want to calculate the average volatility of sales at the category level nothing works,


      This metric should be the average of the single volatility indexes of the productc of each category. Instead all I manage to do is pull our the stardard deviation/average off all the products of each category, which is meaningless.


      If for instance in category 1 there are two products, that sell the same qauntity every day, but product A always sells one million units, and product B always sells 10 units, the varaince of that category should be zero, instead it comes out positive because of the difference between the sales of product A and product B.


      Any way of calculating the variances at the product level and the averaging them up at the category level?


      Thansk so much



        • 1. Re: How do you calculate the standard deviation of a product category?
          Jonathan Drummey

          Hi Fabio,


          An exact answer is going to depend on what dimensions you have in the view you are constructing, I've set up a basic example using the superstore sales data. To calculate the variance at the product level, you are going to need the Product in the view. A simple STDEV(Sales) should get that for you. Then, to create an aggregation of that, we turn to table calculations. So, a table calculation such as:


          IF FIRST()==0 THEN

              WINDOW_AVG(STDEV([Sales]), 0, IIF(FIRST()=0,LAST(),0))



          with the Compute Using set to Product should get you the results you are looking for. The IF/IIF statements in the above calc are there to improve performance and eliminate overlapping text using the method described by Richard Leeke:






          • 2. Re: How do you calculate the standard deviation of a product category?
            Fabio Annovazzi

            Hi Jonathan,


            thanks a lot, this worked fine. And it is also very quick.


            One more question.


            I tried to apply Richard's trick to another workbook that takes 5 minutes to load, and has 4 table calculations with window_avg.


            It looks for the fan pages that have had the largest increase in number of new fan (a so might have done something smart) by comparing ration of average number of new likes in the last seven days/ average number of new fans in the last month.


            My original formula was


            WINDOW_AVG(sum([Likes per week]),-1,0)




            WINDOW_AVG(sum([Likes per week]),-30,0).


            In order to make them compute only once I changed them into


            IIf([last record],WINDOW_AVG(sum([Likes per week]),-1, IIF(FIRST()==-1,0,-1)),0)




            IIf([last record],WINDOW_AVG(sum([Likes per week]),-30, IIF(FIRST()==-30,-1,-30)),0),


            which I thought would make it calculate only starting from the row of the most recent date.


            There was no improvement of performance. Any idea why?





            • 3. Re: How do you calculate the standard deviation of a product category?
              Jonathan Drummey

              Hi Fabio,


              I'm going to ping Richard Leeke on this one, from looking at it I'm not sure, I do have a couple of ideas:


              First of all, reducing the amount of data that Tableau has to grab from the DB can have a big performance improvement, so if you haven't set up a filter to grab the minimal number of records you need (maybe the last month?), try that first.


              You don't say what the [last record] value is, depending on that calculation it may be acting on all rows and slowing things down. Also, if [last record] is returning TRUE for more than just the last record, then the calc will of course be running more often. That's something to test out, I have a suggestion on a method at the bottom of this post.


              Something I read in a Tableau performance doc said that IF was faster than IIF, and in this case where we only want one result then we definitely don't want to return 0 for every non-calculated row, leaving it as NULL would work better. So, something that might work better would be:


              IF LAST()==0 THEN

                   WINDOW_AVG(SUM([Likes per week]),-1,IIF(LAST()==0,0,-1))



              A way to test this is to grab a smaller batch of data and set up a text table where all your discrete dimensions are on rows, with Measure Names on Columns and Measure Values on text. For an example, see the attached. This way you can validate each component of the table calc to make sure it's returning the right values for the right row(s) in your data. You might try this on your data set to see what calcs work the fastest. And hopefully Richard will respond as well!