1 Reply Latest reply on Dec 10, 2012 10:55 AM by Jonathan Drummey

    Moving/running average with missing rows from database source

    Damian Jordan

      Posting this solution since it is not well documented and I was unable to find the answer in forums.

      Problem: Database query returning following rows:

      Date          Value

      1-Jan-2012     100

      3-Jan-2012     50

      5-Jan-2012     100

      7-Jan-2012     50


      Note that 2 Jan and 4 Jan are missing rows.

      Create a line viz showing the data and add a duplicated moving average table calc. Turn on show missing values. Ensure you are using a continuous line using day.

      Sheet 1.jpg

      Note that the moving average is incorrect since Tableau is only using the values in the data series even though you told it to show missing values.

      Even though Tableau has "padded the domain" the loving average calc ignores the padded values since they are effectively null.

      A look at the data shows the problem. The moving average values should be far less than the values shown.

      Sheet 2.jpg

      The solution is simply to find a way to reference the "padded" values and convert the nulls to zero.

      This can be achieved using LOOKUP.

      If you edit the table caluulation you will see:

      WINDOW_AVG(SUM([Value (copy)]), -2, 0)


      Change this to


      if  isnull(lookup(sum([Value]),0)) then 0  else lookup(sum([Value]),0) end


      The result now computes the moving average including the "padded" values.

      Sheet 3.jpg

      Use this new calculated field in your chart.


      Anyone know of a better solution?

        • 1. Re: Moving/running average with missing rows from database source
          Jonathan Drummey

          Alternatives within the data include building what Joe Mako calls a "scaffold" data source that would include all days and then using that as the primary source and blending the secondary to it, or doing the padding in SQL.


          When using the Show Missing Values, an alternative to using the WINDOW_AVG function is to use a WINDOW_SUM() and a divisor, such as this:


          WINDOW_SUM(SUM([Value]),-2,0)/IF INDEX()<3 THEN INDEX() ELSE 3 END


          This calc sums over the 3 values, which will include any padding. The divisor uses the INDEX() function to deal with the first two rows of the partition.


          See the attached for an example.