1 Reply Latest reply on Aug 15, 2013 8:32 AM by Pedro Machado

    Bucketing Date Comparison

    Chris Tsui

      This is a question that has stemmed from a previous project I was working on, and I'm now running into more issues which I'm hoping I can gain some help from the community from.

       

      I am attempting to categorize a set of date comparisons and run a count on the rows within each bucket.

       

      A Sample of data is attached, and a desired result I've mocked up in the second tab.

       

      Given some of the "oddities" in the data I have one column which has a date field, one column that describes that date field (Preliminary/Secondary Meeting), a third column that shows a second date field for an exit meeting.

       

      A standard bucket would mean that a client would have a preliminary meeting (one or more) then move into having one or more secondary meetings, and if required an exit meeting.

       

      A regression bucket is when a secondary meeting occurs AFTER an exit meeting

       

      An advanced bucket is when a secondary meeting occurs BEFORE a preliminary meeting.

       

      There are also instances when a client may only have one types of meetings, in which case it would be classified as Standard.

       

      Doing a Min/Max calcualtion using one date field (and a Tableau KB about using a reference date) served my purpose of finding date differences of the same column, however when I start to bucket it gets a little wonky.

       

      I would appreciate any insight and suggestions as to how I would tackle this if possible!

       

      Much appreciated as my brain is nearing exploding point on this one

        • 1. Re: Bucketing Date Comparison
          Pedro Machado

          Chris,

           

          My suggestion would be to transform the data via custom SQL so that you have a clean mapping of client id to Bucket and only one record per ClientID.

           

          I tried to write the custom SQL and then realized that the database engine that reads the excel file does not support CASE statements. Instead, you have to use the IFF function. I started to do it, but it was taking too long (and the code is messy).

           

          Below is what I am suggesting you do. This should run in Oracle or PostgresSQL, but I haven't tested it. I think I implemented the logic you provided correctly, but I am not sure. In any case, I hope this helps:

           

          SELECT ClientID,

                 CASE WHEN max( CASE WHEN Service = "Secondary"  ServiceDate END ) >

                          min( ExitDate )

                      THEN "Regression"

                      WHEN max( CASE WHEN Service = "Secondary"   THEN ServiceDate END ) <

                           max( CASE WHEN Service = "Preliminary" THEN ExitDate    END )

                      THEN "Advanced"

                      WHEN max( CASE WHEN Service = "Preliminary" THEN 1 ELSE 0 END ) = 1 and

                           max( CASE WHEN Service = "Secondary"   THEN 1 ELSE 0 END ) = 1

                      THEN "Standard"

                      WHEN max( CASE WHEN  Service = "Preliminary" THEN 1 ELSE 0 END ) = 1 and

                           max( CASE WHEN  Service = "Secondary"   THEN 1 ELSE 0 END ) = 0

                      THEN "Standard"

                      WHEN max( CASE WHEN  Service = "Preliminary" THEN 1 ELSE 0 END ) = 0 and

                           max( CASE WHEN  Service = "Secondary"   THEN 1 ELSE 0 END ) = 1

                      THEN "Standard"

                      ELSE "?"

                  END

          FROM yourtable

          GROUP BY ClientID ;


          Pedro