3 Replies Latest reply on Dec 28, 2016 7:43 AM by Jonathan Drummey

    Volumes with Missing Days calculates wrong average

    Helene Matassa

      Hi -

       

      I'm trying to calculate a daily average case volume for the year or month and am getting the wrong answer because cases don't happen on every day of the year.  The case volume for 2016 is 3,383 and there were cases on 250 days.  On an annual basis the average calculated is 13.53.  The correct number of days in 2016 is 366 days so the correct average is 9.24.  I tried the LOD { FIXED [Case Date] : AVG([Volume]) } but that is wrong since it uses the Case Date.  Are there any ideas as to how to fill in for the missing days?  I am using version 10.1.1

       

      Thanks for any help!

        • 1. Re: Volumes with Missing Days calculates wrong average
          Jonathan Drummey

          Hi,

           

          My suggestion is rather than try to pad out the data to just do a calculation that gets the number of days in the year and then divide by that, here are two calcs:

           

          Days in Year

          DATEPART('dayofyear',MAKEDATE(YEAR([Case Date]),12,31))

           

          Avg jtd

          //requires year in view to be accurate

          SUM([Volume])/MIN([Days in Year])

           

          And a view:

           

          Screen Shot 2016-12-19 at 9.51.33 AM.png

           

          v10.1 workbook is attached, let me know if you have any questions!

           

          Jonathan

          1 of 1 people found this helpful
          • 2. Re: Volumes with Missing Days calculates wrong average
            Helene Matassa

            Thank you for replying so quickly! 

             

            Your solution works great when looking at average daily volume for a year.  I need to be able to drill down further, say to a monthly level.  So, in the attached data set the total volume in January is 270.  The correct average in this case is 270/31 or 8.71 cases per day.  The average that is calculated without the missing days 12.86 cases per day.    In truth, I need to get as granular as hourly but didn't want to over complicate the workbook.  I thought if it worked at an annual and monthly level - using date/time drill downs, it would naturally follow that I can drill down to days and hours.  For example, if there were a total of X cases on Tuesdays during the year or over several years and there were Y Tuesdays in that same time frame, the average would be X/Y and further if there were A number of cases at hour 9 on Tuesday's in a year and Y Tuesdays (9 AM) then the average would be A/Y. 

             

            I truly apologize if I wasn't clear in my original question.

            • 3. Re: Volumes with Missing Days calculates wrong average
              Jonathan Drummey

              Hi Helene,

               

              Thanks for the apology, no worries! In the future when you are asking a question here on the forums or getting tech support please do give the fuller picture of your ultimate goal, the reason why I ask is that because of  idiosyncrasies in how Tableau works what answers one question might not help the ultimate goal, and I’m thinking this is one of those situations. I’m making my best-guess as to what you are looking for. This (plus the Christmas holiday) made my response take a little longer to put together.

               

              When you write "drill down" I'm guessing that you have wanted to use Tableau's built-in date hierarchy to drill down? In that case then what we'd need is to:

               

              a) recognize the date level the user has drilled to

              b) sum up the number of cases at that date level

              c) get the appropriate number of days/hours for that date level

              d) generate the results

               

              Unfortunately there's a big problem with a) in that Tableau does not provide us with a way for calculated fields to recognize the current hierarchy level. I've written calculations to do this in the past but they are "fragile" in that they only reliably work when the data *isn’t* sparse, and this data set *is* sparse by definition. There’s also one more problem with Tableau’s built-in drill for dates and that is around weekdays. Tableau’s built-in hierarchy does Year/
Quarter/Month/Day of Month, it doesn’t have a notion of weekdays.

               

              So the workaround for these two issues is to *not* use Tableau's built-in date hierarchy and instead use a parameter to control the date level. In a *sparse* data source (one that doesn’t have every day) this requires the following:

               

              1. Setting up the parameter for the different date levels.

              2. Creating a Periods in Level calculation. This uses date math to get the  # of days in month, # of days in year, etc. It has a bonus complication (i.e. another calculation called # of Weeks for Weekday) when we’re dealing with weekdays because the number of weeks in the year for a given weekday changes each year.

              3. Creating a DateLevel Dimension calculated field that will return the correct date level based on the parameter and Case Date. In the case of the weekday level the dates are normalized to dates in 1900.

              4. Creating a Date for Display calculated field that converts the DateLevel Dimension to a string for display based on the parameter (since we’re losing Tableau’s built-in date hierarchy that automatically changes the date format as we change the hierarchy).

              5. Creating an Avg Label field based on the parameter to note what is being displayed.

              6. The Average Volume field has the formula SUM([Volume])/MIN([Periods in Level]).

               

              You can see all the calcs in the Sparse Workout worksheet, here’s the Sparse Average Volumes worksheet with the Month level selected:

               

              Screen Shot 2016-12-28 at 9.02.08 AM.png

               

               

              Now there’s one problem with this due the sparse data that is visible if I go to the Sparse Bars view and select the Day level:

              Screen Shot 2016-12-28 at 9.01.34 AM.png

               

              The missing days (Saturdays and Sundays) really should be in the view but they aren’t. While there can be a workaround using Tableau’s built-in data densification I find that it’s got too many limitations for more general use, especially since Tableau 10.0 introduced cross data source joins. What cross data source joins let us do is easily pad out data sources by starting with our original (sparse) data source and then doing an outer join to a  complete source of dates. In this case that took about a minute to generate an Excel file with all days from 1/1/2016 to 1/1/2024. In Tableau v10.0 we could only do left joins, Tableau v10.1 adds right and full outer joins. Since this workbook is on v10.1 I’m using the original (sparse) data on the left and the domain complete Excel dates on the right in this data source:

              Screen Shot 2016-12-28 at 9.03.01 AM.png

               

              Now we have a data set with a row for every date from 1/1/2016 to 1/1/2024. The calculations in this padded source are similar to the ones used in the sparse data, but slightly simpler:

               

              1. Setting up the parameter for the different date levels.

              2. Creating a Periods in Level calculation. Because the data is domain-complete then we can use a measure to count the dates, in this case COUNTD([Date]) except when the Date Level is day, in which case it’s 24 (hours).

              3. Creating a DateLevel Dimension calculated field that will return the correct date level based on the parameter and Case Date. In the case of the weekday level the dates are normalized to dates in 1900. This is based on the padded date and not the sparse Case Date.

              4. [unchanged from sparse version] Creating a Date for Display calculated field that converts the DateLevel Dimension to a string for display based on the parameter (since we’re losing Tableau’s built-in date hierarchy that automatically changes the date format as we change the hierarchy).

              5. [unchanged from sparse version] Creating an Avg Label field based on the parameter to note what is being displayed.

              6. The Average Volume field has the formula SUM([Volume])/[Periods in Level].

               

              Now the Padded Bars sheet is showing all the dates:

               

              Screen Shot 2016-12-28 at 9.03.46 AM.png

               

              This kind of padding does have a problem with filtering if you have multiple surgical units or hospitals and want to filter for individual surgical units/hospitals, to pad out in that case then we’d want to be adding a record for each day & surgical unit/hospital combination. Currently that requires more work in the padded source via manual effort or custom SQL. Tableau v10.2 (now in beta) will make that easier with join-on-calculations, I’ve been wanting to do a demo of it but there’s a bug in v10.2 Beta 1 that makes that not work, once Beta 2 comes out with a fix (I hope) then I'll do a demo of it.

               

              I set up the sparse and padded cases in the attached workbook, let me know if you have any questions!

               

              Jonathan