8 Replies Latest reply on Dec 12, 2013 10:05 AM by Shawn Wallwork

    One-to-Many Challenge w/ Date Wrinkle

    Shawn Wallwork

      Basic Info

      8.1 packaged workbook attached.

      Excel file with sample data attached.

      Desired results mocked up in Excel file.

       

      Real world data is 2 tables on MS SQL Server, approx. 7 mil records.

      Real world there are 4-6 additional fields in the Timesheet Table that will need the same calcs/aggregations

      Real world the table will run over multiple years and show all months

       

      I'm hoping for:

      1. A Tableau only solution
      2. A MS SQL solution

       

       

       

      Description

       

      In the Excel file you will find two tables:

       

      Timesheet-1.pngTimesheet-6.png

       

      And the desired result:

       

      Timesheet-3.pngTimesheet-4.png

       

      The last sheet just shows how I calculated the solution by hand, which might help you understand the issue.

       

       

      On the Table Layout tab in the packaged workbook you will see how the data needs to be presented:

       

      Timesheet-5.png

       

      Ignore the figures, they are only included to demonstrate the way the totals need to work. The solution will replace [Number of Records] with a correctly calculated Hours field, and will produce the same values as are in the Excel file.

       

      Any help will be greatly appreciated. Thank you.

       

      Cheers,

       

      --Shawn

       

      EDIT: The start of the week for January should be 1/27/2013.

      EDIT 2: I've attached another version of the Excel file. Cell B4 on the Timesheet tab should be 56 not 40.

      [Creating sample data is a whole lot harder than it looks.]

        • 1. Re: One-to-Many Challenge w/ Date Wrinkle
          Jonathan Drummey

          How are those First Day's determined?

          1 of 1 people found this helpful
          • 2. Re: One-to-Many Challenge w/ Date Wrinkle
            Shawn Wallwork

            Yeah, I didn't do such a great job mocking up the data, but...

             

            First days should be the first day of the week (Sunday) unless it crosses over into the next month, then it is the first day of the month. So to help clarify here are all the first day for the first half of 2013:

             

            First Days.png

             

            The yellow highlight shows the limited sample data. Note: these are British date; in the sample data I posted American dates.

             

             

            --Shawn

            • 3. Re: One-to-Many Challenge w/ Date Wrinkle
              Jonathan Drummey

              Hi Shawn,

               

              To me this smells like a bank balance scenario. At a high level, there's something that happened in the past, some complex-ish rules that apply to the row-level data at a slightly higher level of aggregation (such as computing an end-of-day balance or the weekly contribution of hours), and then that data *does not change*. Now, we can do these in Tableau, but that means re-computing all the balances or hours *every time*. Which in this case for millions of records in a year means doing some set of computations that could take a *very long time*. That's why banks have batch processes to update the data and then they can run on that. Something like that here could work as well on the SQL side.

               

              In terms of the table calcs, I'm working out an error in needing two TOTAL(COUNTD()) functions. Is Tableau 8.1 available? (that might make life easier), I can get a table calc solution for the detail rows, geting the subtotals and totals to work will be a mess (if it's at all possible), and scaling up probably won't work.

               

              There's an intermediate table calc solution that I causing a third database table that would be a dimension table having each a row for each date with the first and last date of week & # of days in the week, and a self-blend that would get the # of days worked. Then the calcs could be mostly (entirely?) regular aggregate calcs and might be fast enough to work, I think I'll let go of the table calc solution and try this one.

               

              Jonathan

              1 of 1 people found this helpful
              • 4. Re: One-to-Many Challenge w/ Date Wrinkle
                Shawn Wallwork

                Jonathan, thanks for giving this your time/attention -- greatly appreciated. I don't know anything about 'smells' so I'll move onto more specifics...

                 

                Yes! We can most definitely do this in 8.1

                 

                "...getting the subtotals and totals to work will be a mess (if it's at all possible)..."

                 

                [Yeah that was sort of my whole point in a recent post I made on another thread.]

                 

                "There's an intermediate table calc solution that I causing a third database table that would be a dimension table having each a row for each date with the first and last date of week & # of days in the week, and a self-blend that would get the # of days worked. Then the calcs could be mostly (entirely?) regular aggregate calcs and might be fast enough to work, I think I'll let go of the table calc solution and try this one."

                 

                This actually sounds both promising and doable. I can now create 'views' in the MS SQL database; not sure about tables (or how they will flow through, as to live data).

                 

                Personal note: I've been beating my head against this wall for 4 days solid. Thinking this should be easy; I should know how to do this. Discovering, not so much....

                 

                --Shawn

                • 5. Re: One-to-Many Challenge w/ Date Wrinkle
                  Joe Mako

                  A way to compute 'First day' based on your logic as I currently understand it is:

                   

                  DATE(MAX(DATETRUNC('week',[Date]),DATETRUNC('month',[Date])))

                   

                  This uses the alternative form of the MAX() function. Normally the MAX() function is an aggregation, because we normally pass it a single argument. It has an alternate form with two arguments, that is a non-aggregate function, a kind of logical function, that returns the max value of the two arguments.

                   

                  The main question I have is on the contents of 'Schedule Table'. Currently there are not duplicate combinations of 'TimesheetID'-'Date'. In your real data, can there be multiple occurrences of of these combinations?

                   

                  For example, could you real data contain something like:

                  ScheduleIDTimesheetIDDate
                  1A1/28/13
                  2A1/28/13
                  3A1/29/13
                  4B1/30/13
                  5B1/30/13
                  6B1/31/13

                   

                  Notice that the ScheduleIDs 1 and 2 have the same contents, along with 4 and 5. Is this possible in your real data?

                  1 of 1 people found this helpful
                  • 6. Re: One-to-Many Challenge w/ Date Wrinkle
                    Shawn Wallwork

                    No. ScheduleID will always be different. It is tantamount to Record ID and produces the finest granularity.

                     

                    CLARIFYING EDIT:

                     

                    "For example, could you real data contain something like:

                    ScheduleIDTimesheetIDDate
                    1A1/28/13
                    2A1/28/13
                    3A1/29/13
                    4B1/30/13
                    5B1/30/13
                    6B1/31/13

                    "

                     

                    No this could not occur, because B can't work on 1/30 twice. A single date can only occur in a timesheet once. So B would not have a 4 and 5 record because they are on the same day.

                     

                    Does this help?

                     

                    --Shawn

                     

                    EDIT: Reading more carefully, no SchedID 1&2 would not occur in the actual data.

                    • 7. Re: Re: One-to-Many Challenge w/ Date Wrinkle
                      Joe Mako

                      Great!

                       

                      That means you have lots of options and attached are three routes, all applying the same logic:

                      1. Join and Table Cable Calculations, using your supplied connection and a series of table calculations

                      2. Data Blend with Table Calculation, using your raw data sources, and a single table calculation

                      3. Custom SQL, and a simple calculation

                       

                      The key takeaway is there are multiple ways to get to the same result, the question is, what method has the best performance for your situation, and where are you comfortable with complexity.

                       

                      Take your pick:

                      1. Nested table calculations, each with their own compute using settings, and countermeasures for data densification

                      2. Self Data Blend for selected Dimensions, and a simple table calc to roll the data up

                      3. Subqueries in Custom SQL

                       

                      For myself, the table calculation route is fun, but it is fragile. I would default to the Data Blend as my first choice for rapid development and ease of adding additional logic. Then if I needed something with better performance, I would then move to the Custom SQL option (with an extract of course) once I know the analysis that I wanted to perform. I find it is easy to transition from the Data Blend route to the Custom SQL route because Tableau has written all the SQL, we would just need to copy-paste and wrap in LEFT JOIN statements. Let me know if you would like to join me for a screen sharing session, and I would be glad to walk you through the process of transitioning from a Data Blend to a Custom SQL

                       

                      No doubt, there are other routes to get these same results, and based on other constraints that you may have, these three proposed routes may not be ideal in all situations.

                      • 8. Re: Re: One-to-Many Challenge w/ Date Wrinkle
                        Shawn Wallwork

                        Excellent Joe, and thanks for the screen-share session!

                         

                        For anyone interested, the Custom SQL solution performed significantly faster than the data blend solution (both using live connections, which the client is insisting on).

                         

                        --Shawn