1 Reply Latest reply on Apr 23, 2018 11:49 AM by Patrick A Van Der Hyde

    Hive UDF support in Tableau

    Meenakshi Zutshi

      I created a Hive function using Hive UDF and added the jar to Initial SQL setup in Tableau. Following are the steps:

      Pardon me for the random names used below, hope this makes sense.

       

      1. Java code with Hive UDF

      2. Added jar to a hdfs path

      3. Created Hive function -> create function name as 'java file name extending UDF' using 'hdfs jar location';

      4. Tested this function in Hive console on HDFS

      5. Created a view - which uses the UDF on top of a column name.

      6. For support in Tableau - used add jar in Initial SQL -> add JAR 'hds path location of the jar'

      7. Connection is established

      8 But when I run a simple select query on the view -> which has a select statement using the function created through Hive UDF

       

        It connect and shows the output, but the function doesn't seem to work, it just gives null value for the columns where UDF was applied.

       

      9. Why Tableau doesn't convert the value of the column as expected but instead just shows null for all the values for the column in the view?

       

      10. When I run same thing in Hive console it works fine and the column value is converted as expected.

       

      Please share if there are any additional steps needed in the setup for supporting Hive in Tableau for executing a UDF. Or If I am missing something.

        • 1. Re: Hive UDF support in Tableau
          Patrick A Van Der Hyde

          Hello Meenakshi,

           

          Have you reviewed the rest of the connection setup instructions here: http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#examples_hadoop.html%3FTocPath%3DReference%7CConnector%…

           

          Here is the referenced portion of the outdated KB article on this topic:

           

          Initial SQL

          Tableau supports initial SQL for Hadoop Hive connections, which allows you to define a collection of SQL statements to perform immediately after the connection is established. For example, you can set Hive and Hadoop configuration variables for a given connection from Tableau to tune performance characteristics. Refer to the Designing for Performance article for more information. You can also register custom UDFs as scripts, JAR files, etc., that reside on the Hadoop cluster. Registering these allows you, other developers, and analysts to collaborate on developing custom data processing logic and quickly incorporating that into Tableau views.

          Because initial SQL supports arbitrary Hive query statements, you can use Hive to accomplish a variety of interesting tasks when connecting to Tableau.

          Custom analysis with UDFs and Map/Reduce

          Although Hive offers additional UDFs that Tableau does not yet support as functions for you to use in calculated fields, Tableau does offer "Pass Through" functions for using UDFs, UDAFs (for aggregation) and arbitrary SQL expressions in the SELECT list. For example, to determine the co-variance between two fields 'f1' and 'f2', the following Tableau calculated field takes advantage of a UDAF in Hive: RAWSQLAGG_REAL("covar_pop(%1, %2)", [f1], [f2])

          Similarly, Tableau allows you to take advantage of custom UDFs and UDAFs built by the Hadoop community or by your own development team. Often these are built as JAR files that Hadoop can easily copy across the cluster to support distributed computation. To take advantage of JAR files or scripts, inform Hive of the location of these files and Hive will take care of the rest.

          Note: You can also do this with Initial SQLwith one or more SQL statements separated by semicolons:

          add JAR /usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u1.jar;
          add FILE /mnt/hive_backlink_mapper.py;

          For more information, refer to the Hive language manual section on CLI: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources.

          Hive supports explicit control over how to perform the Map/Reduce operations. While Tableau allows you to perform sophisticated analysis without having to learn the Hive query language, as an advanced Hive and Hadoop user, you can take full advantage of this knowledge in Tableau. Using Custom SQL, you can define arbitrary Hive query expressions, including the MAP, REDUCE, and TRANSFORM operators described in the Hive language manual. As with custom UDFs, using custom transform scripts may require you to register the location of those scripts using Initial SQL.

          Refer to the following blog for an interesting example of using custom scripts and explicit Map/Reduce transforms: http://www.cloudera.com/blog/2009/09/grouping-related-trends-with-hadoop.

           

          Patrick