1 2 Previous Next 18 Replies Latest reply on May 21, 2014 12:42 AM by vijay reddy

    extract data from xml field using xpath functions

    E Rayle

      I have a field that holds extra data as simple attr-value pairs in XML.  I want to be able to create a calculated column that pulls out a specific attribute and creates a column with its value.

       

      Searching Tableau's site, I ran across two references that look promising, but lack detail.

       

      1. http://kb.tableausoftware.com/articles/knowledgebase/cloudera-hadoop-hive-extra-capabilities   Tableau provides a number of functions for processing XML data, which allows you to extract content, perform analysis or computation, and filter the XML data. These functions leverage XPath
      2. http://www.tableausoftware.com/solutions/hadoop   Google has the following text for this page:  "Hadoop | Tableau Software  www.tableausoftware.com › Solutions   Here we're using the “XPATH” function to create a City field so that we can represent this data in a more traditional, relational way. XML functions are exposed in ..." 

       

      The 2nd item is exactly what I want to do, but unfortunately, the text shown on the Google search does not show up on the page at Tableau.  And while trying to add a calculated column, I did not see an XPATH function on the list of functions.

       

      Is there an XPATH function?  If so, can someone point me to the documentation? 

       

      Your help is appreciated,

      EL

        • 1. Re: extract data from xml field using xpath functions
          Shawn Wallwork

          E, I don't know if this will help much, but if you go to the third tab you'll find the Google results:

          XPATH.PNG

          Pinging Tamas

           

          --Shawn

          1 of 1 people found this helpful
          • 2. Re: extract data from xml field using xpath functions
            E Rayle

            Hi Shawn,

             

            Thanks for the quick answer.  This looks so promising, but when I try to use the XPATH_STRING function, the calculated field validation gets an error "Unknown function XPATH_STRING called."

             

            Anyone know if XPATH_STRING function should exist? Or is there a way for me to get it added to my Tableau?

             

            EL

            • 3. Re: extract data from xml field using xpath functions
              Tamas Foldi

              As far as I know the XPATH functions are only available for hadoop data sources in Tableau.

               

              Rayle: what is your data source? Is it some kind of Hadoop or something different?

               

              If you have Oracle or other "smart" DBMS then you can still use RAWSQL

              1 of 1 people found this helpful
              • 4. Re: extract data from xml field using xpath functions
                E Rayle

                Hi Tamas,

                 

                Thanks for your suggestions.  I am using MySQL, so I think your suggestion of RAWSQL is right on track. 

                 

                I am having problems getting RAWSQL to work.

                 

                Formula #1:   RAWSQL_STR("ExtractValue( attribute_xml, '//attributes/a[19]' )")

                Formula #2:   RAWSQL_STR("ExtractValue( project_event.attribute_xml, '//attributes/a[19]' )")

                 

                Error:

                rawsql_str_xpath_errormsg.png

                 

                Debugging:

                * I tested both versions of the formula using a SQL query in phpMyAdmin.  Both worked. 

                * I did copy and paste to get the working versions into Tableau with confidence that there wasn't a typo.

                * I get the same error seen above for both formulas, except for #1 the Unknown column is 'attribute_xml' instead of 'project_event.attribute_xml'

                 

                I wasn't able to find any good examples, so I'm hoping that I am doing something obviously wrong that is easy to fix.  Any suggestions are very much appreciated.

                 

                EL

                • 5. Re: extract data from xml field using xpath functions
                  Tamas Foldi

                  attribute_xml is your XML column? Then try something like:

                   

                    RAWSQL_STR("ExtractValue( '%1', '//attributes/a[19]' )", [attribute_xml] )

                   

                  The first one is the column select, the second one is the attribute to substitute. Check KB http://onlinehelp.tableausoftware.com/current/pro/online/en-us/functions_functions_passthrough.html and have a look in %USERHOME%\Documents\My Tableau Repository\Logs\log.txt where you can see what SQLs were posted to MySQL

                  1 of 1 people found this helpful
                  • 6. Re: extract data from xml field using xpath functions
                    E Rayle

                    Hi Tamas,

                     

                    Pointing me to the log was very helpful.  I have been playing around with the SQL that Tableau generates and submitting it directly through phpMyAdmin.

                     

                    In case others are following this thread, here is what I have found so far...

                     

                    Context for the examples:

                         Table name:  

                                project_event

                         Column names:

                                id

                                attribute_xml

                     

                    Formula #1:  RAWSQL_STR("ExtractValue(project_event.attribute_xml,'//attributes/a[@n=\'event-interest\']')")

                     

                    Generates SQL:

                    SELECT (ExtractValue(project_event.attribute_xml,'//attributes/a[@n=\'event-interest\']')) AS `Calculation_2860104083425346`,

                      1 AS `Number of Records`,

                      SUBSTRING(`project_event`.`attribute_xml`, 1, 1024) AS `attribute_xml`,

                      `project_event`.`id` AS `project_event_id`

                    FROM `project_event`

                    LIMIT 10000

                     

                    In Tableau, field Calculation_2860104083425346 = #Error

                    In phpMyAdmin, submitting this same query field Calculation_2860104083425346 = 'SIGNAL-OF-INTEREST'  (the correct value)

                     

                    Formula #2:  RAWSQL_STR("ExtractValue(%1,'//attributes/a[@n=\'event-interest\']')",[attribute_xml])

                     

                    Generates SQL:

                    SELECT (ExtractValue(SUBSTRING(`project_event`.`attribute_xml`, 1, 1024),'//attributes/a[@n=\'event-interest\']')) AS `Calculation_2860104083425346`,

                      1 AS `Number of Records`,

                      SUBSTRING(`project_event`.`attribute_xml`, 1, 1024) AS `attribute_xml`,

                      `project_event`.`id` AS `project_event_id`

                    FROM `project_event`

                    LIMIT 10000

                     

                    In Tableau, field Calculation_2860104083425346 = #Error

                    In phpMyAdmin, submitting this same query field Calculation_2860104083425346 = NULL  (incorrect value)

                     

                    I am still exploring a solution.

                     

                    EL

                    • 7. Re: extract data from xml field using xpath functions
                      Tamas Foldi

                      Post some records from your source data set and the expected results and when I have some time I will have a look on that too

                      • 8. Re: extract data from xml field using xpath functions
                        E Rayle

                        I greatly simplified the data so it is boiled down to the essence of the problem.  I confirmed that I get the same behavior with this data. 

                         

                        Attached file creates DB with...

                        • 10 records
                        • 2 fields:  id, attribute_xml
                        • attribute_xml has 3 attributes: 

                        <attributes>

                          <a n="event-interest">SIGNAL-NOT-OF-INTEREST</a>

                          <a n="event-type">BLOCK</a>

                          <a n="note"></a>

                        </attributes>

                         

                        • All values for attribute_xml are the same except that some have event-interest=SIGNAL-OF-INTEREST and others have event-interest=SIGNAL-NOT-OF-INTEREST.

                         

                        In Tableau, my test procedure is...

                        • Formula interest-1:    RAWSQL_STR("ExtractValue(project_event.attribute_xml,'//attributes/a[@n=\'event-interest\']')")
                        • Formula interest-2:    RAWSQL_STR("ExtractValue(%1,'//attributes/a[@n=\'event-interest\']')",[attribute_xml])
                        • From menus:  Data -> project_event (test) -> View Data...
                        • Copy View Data query from log
                          • edit "Tableau Repository\Logs\log.txt"
                          • go to end of log.txt
                          • copy last query from log

                         

                        Results in Tableau:

                        • interest-1 - all values are #Error
                        • interest-2 - all values are #Error

                         

                        Results in MyPhpAdmin:

                        • Query from log:

                        SELECT (ExtractValue(SUBSTRING(`project_event`.`attribute_xml`, 1, 1024),'//attributes/a[@n=\'event-interest\']')) AS `Calculation_5780107105538043`,

                          (ExtractValue(project_event.attribute_xml,'//attributes/a[@n=\'event-interest\']')) AS `Calculation_6930107103747466`,

                          1 AS `Number of Records`,

                          SUBSTRING(`project_event`.`attribute_xml`, 1, 1024) AS `attribute_xml`,

                          `project_event`.`id` AS `id`

                        FROM `project_event`

                        LIMIT 10000

                         

                        • Both calculated fields have the correct value of either SIGNAL-OF-INTEREST or SIGNAL-NOT-OF-INTEREST.  (This is different than what I got before where one formula got the correct value and one got NULL.)

                         

                        Full Results:

                         

                        xpath_query_results.PNG

                        • 10. Re: extract data from xml field using xpath functions
                          Tamas Foldi

                          Sorry for not responding, but the first month of the year is always makes me too busy.

                           

                          So. It is a bug in Tableau mysql engine, as it cannot obtain the calculated type from mysql. The solution is to use

                           

                          STR( RAWSQL_STR("ExtractValue(%1,'//attributes/a[@n=\'event-interest\']')",[attribute_xml]))

                           

                          Instead of

                           

                          RAWSQL_STR("ExtractValue(%1,'//attributes/a[@n=\'event-interest\']')",[attribute_xml])

                           

                          This is a generic principle, if you found #Error in calculated fields, but other fields are OK, then it is a metadata related issue. Try to convert the result explicitly to STR or INT.

                           

                          xml issue.png

                          1 of 1 people found this helpful
                          • 11. Re: extract data from xml field using xpath functions
                            E Rayle

                            Ureka!!!  That worked Tomas.  Thanks so much for helping to resolve this.

                             

                            Ok, well mostly.  Here are a few remaining observations.

                             

                            Formula #1:

                             

                                 STR( RAWSQL_STR("ExtractValue(project_event.attribute_xml,'//attributes/a[@n=\'event-interest\']')"))

                             

                                 Approach:   hardcoded table.column name

                                 Results:  WORKS

                             

                            Formula #2:

                             

                                 STR( RAWSQL_STR("ExtractValue(%1,'//attributes/a[@n=\'event-interest\']')",[attribute_xml]))

                             

                                 Approach:  allowing Tableau to insert the table.column name using %1 reference

                                 Results:  STR works;  Query does not

                              • Query results in calculated value being Null
                              • STR correctly displays Null as the value in Tableau

                             

                            Generated SQL: 

                             

                            Generated SQL was extracted from Tableau Repository\Logs\log.txt.  Below is only the part of the SQL that pertains to the calculated formulas.

                             

                            Formula #1:

                             

                            SELECT SUBSTRING((ExtractValue(project_event.attribute_xml,'//attributes/a[@n=\'event-interest\']')), 1, 1024) AS `Calculation_2860104083425346`,

                             

                            Formula #2:

                             

                            SELECT SUBSTRING((ExtractValue(SUBSTRING(`project_event`.`attribute_xml`, 1, 1024),'//attributes/a[@n=\'interest\']')), 1, 1024) AS `Calculation_2860104083425346`,

                             

                            The differences are in bold and deal with how the table.column name is included in the SQL.  The first results in the value correctly being extracted from the XML.  The second results in a Null value.

                             

                            Thanks again to everyone who helped resolve this issue!

                             

                            EL

                            • 12. Re: extract data from xml field using xpath functions
                              E Rayle

                              So close, but graphs seem inconsistent, unless I am setting up the graphs or interpreting them incorrectly.

                               

                              When I view data, the values look good with some having SIGNAL-OF-INTEREST and others having SIGNAL-NOT-OF-INTEREST

                               

                              view data GOOD.PNG

                               

                              If I graph event-interest (column) and event_id (row), I see both values represented in the columns...

                               

                              col event-interest row event_id GOOD.PNG

                               

                              But when I try to make the graph I'm really interested in, it appears that all values are counted as SIGNAL-OF-INTEREST...

                               

                              col event-interest row sum_records BAD.PNG

                               

                              I expected there to be two columns, one for SIGNAL-OF-INTEREST and one for SIGNAL-NOT-OF-INTEREST each showing the count of records for the respective values.

                               

                              I don't have any filters applied.  The only thing I set is the column and row.  Any suggestions why this would happen? 

                               

                              EL

                              • 13. Re: extract data from xml field using xpath functions
                                Tamas Foldi

                                Please attach a packaged workbook, thanks

                                • 14. Re: extract data from xml field using xpath functions
                                  E Rayle

                                  I can save a Packaged Workbook, but it of course wants my data to go along with it.  When I save my data as a Tableau Data Extract, I no longer have RAWSQL_STR as a valid function, so I cannot create the calculated field.

                                   

                                  Would you like me to send you the Package Workbook with the calculated field but without the Data Extract?  The project_event.sql.zip file attached previously is the same data that was used with this workbook.

                                   

                                  There is no way for you to connect directly to my data.  It is on a localhost test environment.

                                  1 2 Previous Next