13 Replies Latest reply on Sep 4, 2013 11:22 AM by Jonathan Drummey

    Min/Avg/Med/Max/CoV in a single table with different date rollups

    Jason Scarlett

      Looking for some general ideas on how to approach this issue.

       

      Part A

      Is it possible to count the min/avg/max/etc number of records in a table based on a date field to get a single (or few) "text table" with the following structure?

       

      Date RollupMin CntAvg CntMedian CntMax CntCoeff of Var
      Hourly210.111151.2
      Daily200265269289
      Weekly1600...
      Monthly
      Yearly

       

      I suspect it would be possible using custom SQL to aggregate the data before it is loaded into Tableau ... but wondering if some sort of nested table calculation might be possible. (pros/cons of each approach, anything new in v8?)

       

      Part B

      Similar format but the user would be able to select the "rollup level" ... i.e. the user could choose "daily" and all the values would be done on a daily basis. If "weekly" was chosen the calculation would return the extrapolated value (i.e. yesterday x 7)

      Date RollupMin CntAvg CntMedian CntMax CntCoeff of Var
      Yesterday
      Last Week
      Last Month
      Last 12 Months

       

      I think I can do part B in v8 much easier now with the improved filters if Part A can be done.

        • 1. Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
          Jonathan Drummey

          Hi Jason,

           

          I suspect that part of why your question didn't get answered is that (besides being complicated) you didn't post any sample data, see So Your Question Didn't Get Answered... for details.

           

          Assuming that your data is a simple list of dates and you want to count the number of records, you have a set of measures in the columns. Tableau needs something (i.e. a dimension) to use to aggregate the measures over, so that means the date rollups need to be dimensions, but I'm not clear on exactly how an hourly/daily/weekly min/max count would work. Do you mean the minimum COUNT(Number of Records) for any hour in the data, in the day in the data, etc.? In that case, then you'd be need to be using something like WINDOW_MIN(COUNT(Number of Records)) with hour in the view to get the Min Count per hour, the same calc with Day in the view to get the Min Count per day, etc. You could generate a ton of measures for each combination of measure & date rollup, but that still doesn't get you the dimension that you'd need for layout, so one option would be to have 5 worksheets, one for each rollup level. That could get really slow, because Tableau would have to issue 5*N queries to the data source (where N is at least 1 and can be bigger depending on the filters involved).

           

          There's a big scaffold or small scaffold solution that I can think of that would work better:

          • The big scaffold would be UNION'ed data source where I'd have the dates & number of records aggregated 5 times, once for hourly, daily, etc. and a new "Date Rollup" dimension. Then you could have the aggregated date on the level of detail, the Date Rollup on Rows, and a calculation like IF FIRST()==0 THEN WINDOW_MIN(COUNT(Number of Records)) END with Compute Using on the aggregated date would get you the Minimum Count for each date rollup. Similar calcs would work for the other measures.

           

          • The smaller scaffold would be a cross product of the 5 date rollups and every date at the finest level of granularity that you need, then you'd have 5 connections to your data source (one for each rollup), and use a combination of calculated fields and data blending to get you the results.

           

          Hope this helps!

           

          Jonathan

          • 2. Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
            Jason Scarlett

            Thanks.

             

            I've attached my attempt using the Tableau Superstore data to show the issue I run into and how I used your suggestion for an adequate solution. Now that v8 has worksheet level filters I will be able to make this work in a dashboard once our server is upgraded from 7 to 8.

             

            I still get confused with window calcs. The trick I was missing was the FIRST() = 0 bit. I always forget that an individual calculation can "filter" data out without removing the rest of the data from the view.

             

            As for the scaffold methods, I suspected that there may be some data wrangling needed prior to Tableau but was trying to avoid it. Is there a way to use the data calculated in Tableau as a Data Source for Tableau? I could generate the results in the format you mentioned, and then feed it back to Tableau to do the min/max on. Maybe in version 9

             

            Jason

            • 3. Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
              Jonathan Drummey

              The problem here is that you're trying to do an aggregate (the window min/max/avg) of an aggregate (the sum of number of records) at several different levels of detail, it's really asking Tableau (or any BI tool) to do a lot in a single view.

               

              Though you can nest table calculations, Tableau does not let you use a worksheet as a data source (there's an Idea for that). A workaround is to manually copy and paste data back in, or export data to an Access .mdb and bring that back into Tableau.

               

              I looked at your calc and I don't think it's giving accurate results. You're doing a datediff of the min and max order date per month, which is going to give a number that is 1 less than the actual number of days involved (datediff counts the difference, not the days, see http://betterexplained.com/articles/learning-how-to-count-avoiding-the-fencepost-problem/ for a description). That is not necessarily the number of days there were actually orders in the month, nor is it the number of days actually in the month. There are solutions for both, which of those are you trying to do?

               

              Jonathan

              1 of 1 people found this helpful
              • 4. Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                Jason Scarlett

                The number of days in the month, is what I am after.

                 

                There are four complications that I can see beside the fencepost issue you caught:

                 

                1. Dates with no orders in the middle of date window. A distinct count on order date would highlight this (maybe I would call this the kicked in fenceboard problem ).
                2. Dates missing at end of a complete month -- my original min/max approach (with a +1 fencepost fix) would fail here because of the max failing to reach the last day in the month.
                3. Dates missing at end of an incomplete month (current month) -- my original min/max approach (with a +1 fencepost fix) would work here.
                4. Different counts across multiple years - Feb 2012 and Feb 2013 have 29/28 days respectively. If the year is not shown how would Tableau know to divide by 28 in one case and 29 is the other? My original solution would use 29 days for Feb in both years (wrong).

                 

                As you can see in the attached sheet, I've added some various ways to calculate the number of days in the month. None of them seem to work properly. I suspect using the "datetrunc" function with the a "dateadd" funciton with a condition liek "current period" would get close, but still doesn't solve #4.

                 

                Jason

                • 5. Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                  Jonathan Drummey

                  I’m confused w/r/t complication #1: do you want dates with no orders to be counted for the average or not? Either way, the Min/Max solution won’t work because of the complications you mentioned, plus the additional complication that it will miss days in any month where the first order comes in after the first of the month.

                   

                  If you want to always count every day in the month, these two calculated fields would work, and they work fine for 2012 and other leap years:

                   

                  Days in Month (make this into a Dimension):

                  DAY(DATEADD(‘month’,1,DATETRUNC(‘month’,[Order Date]))-1)

                   

                  Adjusted Days in Month:

                  IF DATETRUNC(‘month’,[Order Date]) == DATETRUNC(‘month’,TODAY()) THEN

                                  DAY(TODAY())

                  ELSE

                                   

                  END

                   

                  I set this up in the attached.

                   

                  If you want to only count the days (or hours, or weeks) that there were orders, then it gets a wee bit more complicated because we have to know what the level of detail will be in the view (in other words, what are the dimension pills and their granularity). For example, looking at the workbook you’d posted, the reason why COUNTD(DATETRUNC(‘day’,[Order Date])) is returning 1 is because DAY(Order Date) is a dimension in the view, in other words, there really is only one day for each order date. If you take DAY(Order Date) out of the view, you’ll see that the COUNTD() works. However, I’m not clear on whether your final view would need the DAY(Order Date) in it in order to actually work. If you did need DAY(Order Date), then I’d use TOTAL(COUNTD(DATETRUNC(‘day’,[Order Date]))) with an Advanced Compute Using on just the Day of Order Date, that would provide an accurate count that you could use in other calcs.

                   

                  Jonathan

                  1 of 1 people found this helpful
                  • 6. Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                    Jason Scarlett

                    #1 was a red herring.

                     

                    I have fought with this one all day and can't seem to get the AVG and MAX/MIN to work on the same worksheet.

                    The Min/Max are easy enough since I don't need to know the number of days in a month to find the min/max daily count.

                     

                    if first() = 0 then window_max(count([Number of Records])) end

                    // use compute by ORDER_DT and add level of detail as YEAR,MONTH,DAY

                     

                    The AVG however I have issues with. You'll see my attempts on the last worksheet attached.

                     

                    Jason

                     

                    ps. I tried to clean out all the other rubbish I tried.

                    • 7. Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                      Jason Scarlett

                      Attached seems to work but the Level of Detail pill is used for the rollup level. It seems to be a simpler solution (for my brain anyways) than nesting table calculations.

                       

                      It does not work for a partial rollup level (i.e. if I want a monthly average and the current partial month is included it will weigh the current month the same as the others). I can deal with this case by excluding the current period.

                       

                      Thanks

                      Jason

                      • 8. Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                        Jonathan Drummey

                        Hi Jason, I'm pretty sure why you are seeing different results for the MIN(Adjusted Days in Month) calc is that the MY(Order Date) filter has September 2012 and September 2013, so the current partial month is (as of 3 Sep 2013) showing 30 for Sep 2012 and 3 for Sep 2013.

                         

                        I think you can get what you want in Tableau, however I'm still not clear on whether you want dates with no orders to be counted for the average or not?

                        Jonathan

                        • 9. Re: Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                          Jason Scarlett

                          Dates in the past with no order should be considered as a zero.

                          Dates in the future with or without an order should be removed (as per previous solution using adjusted days in months).

                           

                          I've filtered out 2012 Sep 05 to demonstrate the missing order date.

                          Count: 2012 Sep 01-30  (minus Sept 05) = 316

                          Count: 2012 Sep 01-03 (to current date) = 57

                          Daily average = 11.30 (= 373/33, not 373/32=11.66)

                           

                          You're right about the partial month (Sept 1st-3rd).

                           

                          ... After a long discussion with a colleague we are inclined to think that month 'counts' are less than ideal because of the differing days in a month (applies to quarters too). We instead are going to go with a daily average that is multiplied by 30.4 (=365/12) to remove any bias. I know one of my pet peeves are the monthly control charts used in my organization that don't correct for this very issue.

                           

                          Back to the missing dates ... I can't seem to figure out how to mix the window calculation for the numerator with the denominator. I may be over-complicating things?

                           

                          Jason

                          • 10. Re: Re: Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                            Jonathan Drummey

                            See the attached. What I did was use the Adjusted # of Days calc that I'd previously created that counts the number of days in each month. Since you'd said you want to count the days that don't exist in a month as 0 (and based on the calculations you put in your last post), I presumed that meant you'd want to be including those in the denominator.

                             

                            Because the Adjusted # of Days calculation computes a result for every day, I set up a table calculation to only compute a result for the first day in each month. That's the Total # of Days for Month calc, and it has an advanced Compute Using of MDY(Order Date) so it partitions on each Month. That is nested within the Avg # of Orders calc, which sums up all the orders and the Total # of Days for Month calc, and the result is the 11.30.

                             

                            Is this what you are looking for?

                             

                            Jonathan

                            • 11. Re: Re: Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                              Jonathan Drummey

                              One more thought, since it's the day after Labor Day and Labor Day was on the 2nd of September this year…if the beginning of the month has *no* orders, then you won't see any rows for September in the view. If that is a situation that could happen in your data, and you need to show 0, you'd have to do some padding of your data.

                               

                              Jonathan

                              • 12. Re: Re: Re: Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                                Jason Scarlett

                                This does what I want, but I have two questions:

                                1. For the Total # of Days for Month calc: [IF FIRST()==0 THEN MIN([Adjusted Days in Month]) END], you have set the partition by to year/month only. I understand this is so the FIRST() only returns data on the first day of the partition which happens to be the first of the month. So far so good. But then in the Avg # of Orders calc the partition appears to be on both the year/month and the year/month/day. How does Tableau know to use only year/month for the "WINDOW_SUM([Total # of Days for Month])" denominator part (which has its own FIRST()) and to use both partitions for the FIRST()" part of the main calc? Are there two different partitions being applied?
                                2. When I move the date pills to the details shelf I still get the right  results. But, if I change the Order By on the Avg # of Orders calc to "Order Date" it gives the wrong answer. If I then manually change the calculate using back to what was previously there (edit_table calc >> compute using >> advanced >> month/day/year AND month/year on the partition) I get the wrong results. No matter how I change it now, I get the wrong results.I have to delete it and re-add it to the worksheet to get it to work ... which makes me think the true partitioning levels (as above) are not bieng shown.


                                Null start of the month orders is an acceptable 'exception'.


                                Jason

                                • 13. Re: Re: Re: Re: Re: Re: Re: Min/Avg/Med/Max/CoV in a single table with different date rollups
                                  Jonathan Drummey

                                  If I'm reading you correctly, #1 and #2 are both facets of the same feature. Tableau can nest table calculations, each table calculation in the nesting can have its own independent addressing, sorting, and partitioning.

                                   

                                  Total # of Days for Month has an Advanced Compute Using (addressing) of the MDY(Order Date):

                                   

                                  2013-09-04 13_59_38-Tableau - min-max-by-date-groupings-6 [Read-Only].png

                                   

                                  This calc is included in the Avg # of Orders calc, so when you are editing using the Edit Table Calculation dialog gets an extra drop-down to choose each calc so you can separately set the addressing & partitioning:

                                   

                                  2013-09-04 14_01_41-Tableau - min-max-by-date-groupings-6 [Read-Only].png

                                   

                                  Alternative explanation: In the Avg # of Orders calc, the FIRST(), TOTAL(), and WINDOW_SUM() table calcs all receive the addressing setting of the Avg # of Orders calc, while the inner [Total # of Days for Month] is a separate table calculation that can have its own compute using.

                                   

                                  So, to answer your question #1, the answer is yes, there are separate addressing and partitioning for each calc.

                                   

                                  For #2, I assume that you mean changing the "Compute Using" for the Avg # of Orders calc and not "Order By", there's not an Order By currently in the interface that I know of. This is where it gets a little tricky... If we use the context menu for the Avg # of Orders pill, we see this:

                                  2013-09-04 14_03_29-Tableau - min-max-by-date-groupings-6 [Read-Only].png

                                  Nothing is selected. There are two possible reasons for this: Either there is a nested table calculation which has a different addressing than the Avg # of Orders (which is the case here), or the Avg # of Orders is using an Advanced Compute Using. If you choose Order Date from this context menu, that sets the Compute Using for *both* the Avg # of Orders and the nested Total # of Days for Month calc. This causes the Total # of Days for Month calc to feed incorrect results into the Avg # of Orders Calc and your results go haywire.

                                   

                                  Does that answer your questions?

                                   

                                  Also, I've attached a revised workbook that uses a table calc filter (for non-Null values) and Measure Names/Values to generate the view.

                                   

                                  Jonathan

                                  1 of 1 people found this helpful