3 Replies Latest reply on Mar 15, 2017 10:01 AM by Tom W

    Extracting dates from inconsistent URL strings

    Louis Nicholson

      Hi,

       

      I'm working on building different views and Filters for Search URLs and Page Views (it's Google Analytics Page Data).

       

      Attached is a packaged workbook which shows the type of URLs I'm working with (where I pull in everything starting with "search"), and what I've built so far (I've built a Filter for Destination and one for Date - which picks up "main_date:" in the URL and I've built a Calculated Field which references what to show it as).

       

      But what I really want to do is build a Filter which also picks up "date:during|" in the URL and then extracts any of the dates immediately after.

       

      The difference between "main_date:" and "date:during|" is that main date picks up YYYY-MM in the same format (so it's easy to predict and account for), but "date:during|" picks up YYYY-MM-DD (not always in a double digit format for Days).

       

      There are several problems with this URL data:

       

      1) "date:during|" does not appear in every URL and in some, appears multiple times (the multiple options reflect multiple choices for departure dates - i.e. "main_date" is just one YYYY-MM but "date:during|" can return multiple records

      2) The formatting of this URL data is not always consistent

      3) The positioning of "date:during|" changes between URLs - dependent upon other Search URLs/configurations

       

      What I want to achieve is to pull out any instances of "date:during|" and then extract the dates immediately thereafter (but just the dates and nothing else). For multiple entries of "date:during|" I'd be happy to count each instance (i.e. for the same URL, each "date:during|" can equate one Page View).

       

      Once I've got these dates and the number of times search combinations occurred, I could set up some kind of visualisation which shows search patterns for each website and market etc. etc.

       

      Does anyone have any thoughts on how to go about building the Calculated Fields for this?

       

      I'd usually be more comfortable cleaning the URL data and building in Excel, but this is something I need to automate and so ideally, Tableau will be the way forward.

       

      Many thanks

        • 1. Re: Extracting dates from inconsistent URL strings
          Tom W

          So I'm not entirely sure if I've captured what you need help with, so I'm going to start with a small piece.

           

          You can extract the date which follows date:during in the string as follows:

          Note that I've created a number of calculated fields here to help reduce repetition and make it a bit easier to follow:

           

          DtDuringStart = FIND([pagePath],"date:during|")+12

           

          DtDuringLength =

          IF FIND([pagePath],"&",[DtDuringStart])=0

              then

                  //There's no & in the string after 'date:during', which means we should take the rest of the string.

                  //20 characters is probably overkill but I had to pick something

                  20

          else

                  //Found an &, calculate the dynamic length of the date based on the starting position and the &

          FIND([pagePath],"&",[DtDuringStart])-[DtDuringStart]

          end

           

          DtDuringExtract = IF FIND([pagePath],"date:during|")>0 then MID([pagePath],[DtDuringStart],[DtDuringLength]) end

           

           

          If you pull DtDuringExtract into your report now you'll see you have a number of different values coming through i.e. 2017-04, 2017-04-1, 2017-04-10

           

          Using DATEPARSE the only one which is a problem is the 2017-04, all those other dates can be treated the same regardless of the 1 or 2 digit day.

           

          This calc will parse out the date for you:

           

          IF len([DtDuringExtract])=7 THEN

              //Len 7, i.e. 2017-04

              DATEPARSE("yyyy-MM",[DtDuringExtract])

          ELSE

              DATEPARSE("yyyy-MM-dd",[DtDuringExtract])

          END

           

           

          The only other thing worth mentioning is you did state there can sometimes be multiple Date During instances in the same row. This example will only parse out the first one. If you need the second one, it gets tricky. You would either need a second set of calcs to parse out 'second date during', but that makes it pretty useless for analysis unless you specifically want to look at second date during versus first.

          1 of 1 people found this helpful
          • 2. Re: Extracting dates from inconsistent URL strings
            Louis Nicholson

            Hi Tom,

             

            Thank you for your help on this. I can confirm that this is exactly what I'm after in terms of displaying search dates.

             

            I'm going to play with these calculations for a bit and see what visualisations I can conjure up, but I just want to say a big thanks for those calculations - I think it's the start I need in pulling together what I need.

             

            Many thanks,

             

            Louis

            • 3. Re: Extracting dates from inconsistent URL strings
              Tom W

              No problems! Please mark the answer as correct in order to close the thread.