14 Replies Latest reply on Nov 26, 2018 1:56 PM by Joe Oppelt

    Date Issue Related to Messy Data

    David Swift

      I am working with SalesForce data related to leads and opportunities. I'd like to know how long each lead/opp spends in a given status (new, working, etc) but I have messy date.

       

      On a given day a sales person might change the status of field a bunch of time. What I'd like to do is look at only the first and last status on a given day. So I can use datediff to get the correct count of days the lead spend as given status. Thanks for any help you can give.

       

      As an example:

       

      Lead IDHistory IDDate
      Old ValueNew Value
      00100019/14/17NewWorking
      001000210/20/17WorkingNurturing
      001000310/20/17NurturingWorking
      001000410/20/17WorkingDrop
      001000510/20/17DropWorking
      001000610/20/17WorkingQualified
      001000710/20/17QualifiedWorking
      001000811/3/17WorkingQualified

       

      My example wasn't that clear:

       

      My goal is to calculate the average number of days a lead spends as a given status. In this example there is a lead create data of (as an example) 9/1/17 (which is on the lead creation table, not the lead history table). On 9/14 the sales person moved it from new to working. On the 10/20 for whatever reason the sales person changed the status a bunch but in the end it started the day as working and ended the day as working. On 11/3 it moved from working to qualified.

       

      So my results should be (approx)

      New: 14 days

      Working: 17 days

       

      All the trash on the 20th shouldn't count at all and what I really don't want is a bunch of half days mucking up my averages.

       

      Does that make more sense?

       

      I've updated with a sample data set (Please excuse my ignorance. I haven't posted before)

        • 1. Re: Date Issue Related to Messy Data
          Jim Dehner

          Good morning

          this

           

           

          set like this

          will return this

          Jim

          If this posts assists in resolving the question, please mark it helpful or as the 'correct answer' if it resolves the question. This will help other users find the same answer/resolution.  Thank you.

          • 2. Re: Date Issue Related to Messy Data
            Joe Oppelt

            Are you saying that for the rows on Oct 20, you want to identify the first row and the last row (I assume based on the HistoryID) ?

            • 3. Re: Date Issue Related to Messy Data
              David Swift

              My example wasn't that clear:

               

              My goal is to calculate the average number of days a lead spends as a given status. In this example there is a lead create data of (as an example) 9/1/17. On 9/14 the sales person moved it from new to working. On the 10/20 for whatever reason the sales person changed the status a bunch but in the end it started the day as working and ended the day as working. On 11/3 it moved from working to qualified.

               

              So my results should be (approx)

              New: 14 days

              Working: 17 days

               

              All the trash on the 20th shouldn't count at all and what I really don't want is a bunch of half days mucking up my averages.

               

              Does that make more sense?

              • 4. Re: Date Issue Related to Messy Data
                Joe Oppelt

                Here's the problem I see.

                 

                Your example has such limited data, that someone can provide a way for that one LeadID to arrive at the conclusion that 10/20 just ends up as "working" for the day.  What if the last entry for the day had a new value of "Qualified" or "Nurturing".  What do you expect to do with that?

                 

                I think my original question still stands.  You want to see that for 10/20 the first [Old Value] was "x" and the last [New Value] was "y", and based on that, you would then do some other step to weigh the old and new values against the prior and next dates.

                 

                You really need to provide a sample workbook with 5 or 6 different example cases so that the solution you get will be broad enough to handle all cases.  If proprietary data dictates that you can't upload your actual workbook (not just an excel file), then anonymize your data as demonstrated in the video attached here:

                 

                Video demonstrates how to anonymize your workbook/data

                • 5. Re: Date Issue Related to Messy Data
                  Joe Oppelt

                  We can use LODs to tell you what the [Old Value] on the first history record of 10/20 is, and ditto the [New Value] on the last history record, so you can have a clean Old-vs-New comparison for the day.

                  • 6. Re: Date Issue Related to Messy Data
                    David Swift

                    Thanks for your help. I have added a data set.

                    • 7. Re: Date Issue Related to Messy Data
                      Joe Oppelt

                      In the attached I did the following:

                       

                      First, because your change date field has a time stamp on it, we can use that instead of the history ID to identify the first and last values on any given day.


                      See [First Entry Per Day] and [Last Entry per Day] to see how I identify those entries.  For each Lead ID, for each DATE (without the time portion) I grab the MAX and MIN of all the rows in each chunk of rows.

                       

                      These are displayed on Sheet 1.

                       

                      On Sheet 2 I added two more fields.  See [Old Value at beginning of the day] and [New value at the end of the day].  For  the same chunk of rows, compare the date to the MIN (or MAX) field, and grab the respective old or new value.


                      I did this in 10.5.  I hope you're not in an older version.


                      This isn't the end of the work.  I'm stopping at this point to show you how to get those start and end values.

                      • 8. Re: Date Issue Related to Messy Data
                        Joe Oppelt

                        Special note:  Notice that on 11/3 you have two rows with the identical time stamp, (9:40:00) but with different Old- and New-values.  You're going to want to figure out if that's an anomaly or if this can actually happen in your data.  If so, you'll need to decide how to pick from among identical time-stamp rows to get the value you should actually work with.

                        1 of 1 people found this helpful
                        • 9. Re: Date Issue Related to Messy Data
                          David Swift

                          I follow so far. I appreciate the the help and I'll figure out the identical time stamp issue

                          • 10. Re: Date Issue Related to Messy Data
                            Joe Oppelt

                            Next steps.  In sheet 3 I removed a bunch of fields we no longer need.  (Compare it to sheet 2 to see that I didn't lost the Old Value and New Value calcs.)


                            also, I created a calc called Date of Change Date.  This strips off the time stamps, and I can use it more easily as a dimension.  (I swapped this into the 4 LODs where I built this value on the fly in the dimension list.)

                             

                            I added this to the sheet so that we get all the individual days sorted in date order.

                             

                            Next on Sheet 4 I added some table calcs.  INDEX helps me see that I'm walking the table in the order I want.  By default, TABLE(down) works for the way the sheet is set up, but you may not be using the dimensions on ROWS and/or COLUMNS, so I specified the dimensions in the table calc settings.  (All table calcs are using this setting.)

                             

                            Look at the [Days] calc.  If we're in the first row, we don't want to do any LOOKUP back one row, but instead use the [Created Date] as the comparison date.  For all other rows, look back one row to get the comparison date.

                             

                            I made a parallel table calc to grab the [Old Value at...] as the category for the number of days computed.  Notice that you have a "Nurturing" in the middle of your sample data for the selected LeadID.  I don't know if you will ever have that in real life, or if there is always a natural progression of categories.  But if your category can move backward and forward in the list of possible values, this will let you count those days.


                            So now you have a way to add up all the "Working" days.  WINDOW_SUM( IF [Category] = "Working" then [Days] END )

                             

                            I displayed that number in the title of the sheet.

                            • 11. Re: Date Issue Related to Messy Data
                              David Swift

                              Thanks so much for all your help and work! I really appreciate it

                              • 12. Re: Date Issue Related to Messy Data
                                David Swift

                                Joe,

                                 

                                I'm sorry to bother you but how can I determine the order if events have the same time stamp? Thanks.

                                • 13. Re: Date Issue Related to Messy Data
                                  Joe Oppelt

                                  There would have to be some way to tell in the History ID field which one came before the other.  But it looked to me like they were just random values.  If there were a sequential element to them, then you could create another LOD to find the earliest of the rows within a given [Last Entry Per Day] or [First Entry Per Day] (depending on which one you are messing with.)


                                  In the attached I made a [Tiebreaker] LOD for First Entry.  On sheet 1(5) I positioned the pills to show all the 11/3 first entry rows together.  (My mistake.  There are THREE of them!)  The tiebreaker calc grabs the min [History ID].  If there were a sequential order to these, the MIN would be the first one.  Because this is a string field, the MIN is the one that comes first alphabetically.

                                   

                                  With this, you could add the tiebreaker calc into the IF logic to grab the first one.  (When there is only one row for the time stamp, the tiebreaker is just that one row.)

                                  • 14. Re: Date Issue Related to Messy Data
                                    Joe Oppelt

                                    Of course, you would have to do the same with the Last Entry, doing a MAX tiebreaker using the same principles.