5 Replies Latest reply on Jan 21, 2014 11:46 AM by jordan.bunker

    Normalizing Data based on Shipping Days for a given month

    David Biehl

      Before we started using Tableau, we would always normalize sales revenue for each month based on the number of shipping days in the month in our existing reports. For example, January 2014 has 31 calendar days, but only 22 shipping days. February 2014 has only 20 shipping days. A comparison of straight revenue figures for January and February is not entirely accurate because January had two additional shipping days.

       

      We also refer to this as the Average Daily Sales Revenue, because the best way we know how to normalize this data is to take the sales for the month and divide it by shipping days to get a better approximation of what happened on a daily average for the month.

       

      We keep all of the shipping days data in a table in our database. It basically has the month and year with the number of shipping days for that period. Other reports use this data to normalize monthly aggregations of sales revenue.

       

      I'm not quite sure how to accomplish this with Tableau. I know that I can pre-aggregate the data using a view in the data source, but I'd really rather avoid that unless I absolutely have to. If anyone can help with this, I would really appreciate it!

        • 1. Re: Normalizing Data based on Shipping Days for a given month

          Hi David,

           

          I suspect that it should be possible to create a calculation that aggregates data in the desired manner, but the specific calculation(s) necessary would depend on the structure of the data in the workbook. Are you able to post the packaged workbook or a sample workbook?

           

          - Jordan

          • 2. Re: Re: Normalizing Data based on Shipping Days for a given month
            David Biehl

            Here is a packaged workbook with sample data as it would come from our database. In our database, each row represents a sale ... and there can be more than 1 sale on each day. We already have a table that contains the number of business days in each month. In this example, simulated a join to that table so each row knows how many business days are in the given months of the sale.

             

            I can do a simple calculation for [Revenue] / [Business Days], but that only works when aggregating my month to get an average daily sales. If I aggregate by quarter or year, it's off because the quarter simply sums the value of the three months it contains. It doesn't sum the sales for the quarter and then divide by the total number of business days in the quarter. I understand that we probably won't be able to aggregate by anything smaller than a month with this calculation, and we are OK with that.

             

            Maybe we're going about this the wrong way ... but hopefully this sample gives you an idea of what we're working with and what we're trying to achieve.

            • 3. Re: Re: Re: Normalizing Data based on Shipping Days for a given month

              Hi David,

               

              In general, if a calculation requires using a field that contains pre-aggregated data at the month level, then the month level will be the lowest level of granularity that is possible to drill down to.

               

              With regard to the "revenue per number of shipping days" calculation, at the monthly level, we can just use SUM(Revenue)/ATTR(Shipping Days) to get the ammount of revenue per shipping day. At the quarterly or yearly level, since the Shipping Days field is pre-aggregated at the monthly level, we'll need to keep the MONTH(Date) field in the view in order to avoid any automatic aggregation at the quarter or yearly level, and then create a table calculation to explicitly calculate the number of Shipping Days present in the view:

               

                CASE [Date Level]

                WHEN "Monthly" THEN ATTR([Shipping Days])

                ELSE WINDOW_SUM(ATTR([Shipping Days]))

                END

               

              Since different date levels will require calculations with different aggregations, we can create a parameter for selecting the date level, and then use calculated fields containing CASE statements to return the correct aggregation based on the date level chosen. For example:

               

                CASE [Date Level]

                WHEN "Monthly" THEN SUM([Revenue])

                WHEN "Quarterly" THEN WINDOW_SUM(SUM([Revenue]))

                WHEN "Yearly" THEN WINDOW_SUM(SUM([Revenue]))

                END

               

              From these two calculations, the revenue per number of shipping days can be calulated by simply dividing.

               

              Since we are selecting the date level from a parameter, it is also necessary to create a calculated field to return the date level names for the data in the view:

               

              CASE [Date Level]

              WHEN "Monthly" THEN DATENAME('month',[Date])

              WHEN "Quarterly" THEN "Q"+DATENAME('quarter',[Date])

              WHEN "Yearly" THEN DATENAME('year',[Date])

              END

               

              Finally, since we don't want the month level data being shown for the quarterly or yearly data levels, we use an INDEX table calculation to return only the first record, unless the "Monthly" level is selected:

               

                INDEX() == 1 OR [Date Level] = "Monthly"

               

              (Note: All table calculations within the workbook are set the compute using the "Date" field.)

               

              For details on the above method, please see the attached packaged workbook. In order to double check that the method worked across multiple quarters and years, I extended the provided Excel data, which you will find as a second data connection in the workbook.

               

              While this works for this case, it does limit some of the features that would otherwise be available in Tableau Desktop. What might be more ideal would be to have a record generated in the database every day, and then an additional boolean "Shipping Day" column which denotes whether the day was a shipping day or not. This way, we could use the COUNT aggregation to get the correct number of shipping days regardless of the date level in the view, and table calculations should not be necessary.

               

              - Jordan

              • 4. Re: Normalizing Data based on Shipping Days for a given month
                David Biehl
                While this works for this case, it does limit some of the features that would otherwise be available in Tableau Desktop. What might be more ideal would be to have a record generated in the database every day, and then an additional boolean "Shipping Day" column which denotes whether the day was a shipping day or not. This way, we could use the COUNT aggregation to get the correct number of shipping days regardless of the date level in the view, and table calculations should not be necessary.

                 

                Interesting thought ... since the dates that appear in the data source are all shipping days ... I wonder if a DISTINCT COUNT of MDY would work? My only concern would be if anything was filtered and a shipping day was missed, I suppose. Is that a viable concern?

                • 5. Re: Normalizing Data based on Shipping Days for a given month

                  As long as there is a record for every shipping day in the month, the COUNTD of the date should work! You are correct, though, that if the data is being filtered, it may cause some dates to be omitted, which would change the COUNTD.