6 Replies Latest reply on Jan 13, 2014 4:04 PM by Chris Smith

    How do I extract and display text from an organized stream of text (e.g., URL)?

    Chris Smith

      Hi everyone,

       

      Is there a way in Tableau to extract and display text from a URL? I am trying to do some analysis on URLs that don't always match in length (excel text to columns could work here if that was case); however, they do have a pattern and are set off with labels in the URL like the one below. The URL of this page in the forum is a great example:

       

      http://community.tableau.com/discussion/create!input.jspa?containerType=14&containerID=2003&draftID=91069

       

      What I would like to do in Tableau is create a rule for my URL that pulls out containerType, container ID and draft ID. Is there a way I can tell Tableau to find container type in all my URLs and show the numbers associated?

       

      At the end of the day, I would like to say that 55 people viewed container type 14 while 205 people viewed container type 25.

       

      Thanks for your help!

       

      Chris

        • 1. Re: How do I extract and display text from an organized stream of text (e.g., URL)?
          Jim Wahl

          Tableau has Excel-style string processing, so you could use a combination of MIDs, FINDs, ..., but it'd probably be much easier, faster and more robust to do this in the data source using regex expressions to create fields like Container ID, Container Type, Draft ID, ...

           

          If you're not familiar with regex, google "regex to parse url parameters".

           

          And you may want to vote for

          http://community.tableau.com/ideas/1258

          • 2. Re: How do I extract and display text from an organized stream of text (e.g., URL)?
            Chris Smith

            I am not sure how to write a regex expression for Tableau. Can you give me an example using the URL string above?

             

            Thanks for your help!

            • 3. Re: Re: How do I extract and display text from an organized stream of text (e.g., URL)?
              Jim Wahl

              Tableau doesn't support regular expressions (regex).

               

              If your URLs are consistent, it might be easiest to use Tableau's string processing. To find containerType in the url string above, you could create two fields:

               

              Location of Container Type =

              IF     FIND([URL],"&containerType=") > 0  // If it's not the first param, look for &
              THEN   FIND([URL],"&containerType=") + LEN("&containerType=")
              ELSEIF FIND([URL],"?containerType=") > 0  // If it's the first param, look for ?
              THEN   FIND([URL],"?containerType=") + LEN("?containerType=")
              ELSE   NULL // Return NULL instead of 0 if not found.
              END
              


              Container Type =

              MID(URL, 
                  [Location ContainerType Param], 
                  IIF(FIND(URL, "&", [Location ContainerType Param])>0,
                      FIND(URL, "&", [Location ContainerType Param]),
                      LEN(URL)+1)  // If it's the last param, the value won't be followed by a &. 
                  - [Location ContainerType Param]
              )
              

               

              You can see how this quickly gets complicated (and slow) with even a small number of variations. You may also need to use an extract instead of a live connection, because of limitations of the JET engine used to connect live to Excel, Access, and text data sources.

               

              Another approach is to use the Tableau R interface:

              Container Type =

              SCRIPT_REAL("require(httr)
                          parse_url(.arg1)$query$containerType", ATTR(URL))
              

               

              But this requires installing R, RServe, the httr package, ...

               

              And since the URL updates don't need to be live (the values don't change based on user input, filters, ...), it's probably better to do this either in the database (that supports regex) or external if you have a convenient data source (XLS or CSV). In addition to R, you could use Python. The Python equivalent of the R code is

              from urlparse import urlparse, parse_qs
              parse_qs(urlparse(url)[4])['containerType']
              

               

              The best approach depends on how often you'll be doing this, how much data you need to process, the consistency of the data, whether you need more than just the URL params, ...

               

              Jim

              • 4. Re: Re: How do I extract and display text from an organized stream of text (e.g., URL)?
                Chris Smith

                Hi Jim,

                 

                You are well ahead of me in Tableau technical know-how. Do you have any example workbooks that use this methodology (that's how I learn best, by copying).

                 

                My urls look like this

                 

                website.com/domain/orderpage.html?productcat=shoes&brand=nike&shoenumber=12345&color=blue

                 

                I want to somehow extract the data for product category (shoes), brand (nike), shoe number (12345), and color blue.

                 

                Thanks

                Chris

                • 5. Re: Re: Re: How do I extract and display text from an organized stream of text (e.g., URL)?
                  Jim Wahl

                  Here's a version using the Tableau string parsing strategy above for your URL param.

                   

                  A couple of notes

                  - if you get a "Query too complex" use an extract.

                  - if you're using a proper database, consider doing this in the DB.

                  - if not and, but you're processing lots of data, consider using a Python script to add columns to the data source, prior to loading in Tableau.

                  • 6. Re: How do I extract and display text from an organized stream of text (e.g., URL)?
                    Chris Smith

                    This solved it - you're one smart guy, Jim. Thanks!