3 Replies Latest reply on Jan 27, 2017 11:59 AM by Dmitry Chirkov

    Tableau with Spark SQL on AWS EMR

    Daniel Yoo

      Tableau Desktop 9.3

      Spark 1.6.1 on AWS EMR

       

      I've successfully connected to the Spark SQL data source from Tableau Desktop.

       

      I have a table created in the default Hive schema, but this table does not appear in Tableau.

       

      If I instead do a custom SQL on the table: select * from default.<table_name>

       

      I can see that the data rows are retrieved. I've also verified the table in the Hive shell.

       

      Even if I search for the table using the "exact", "contains", or "starts with" options, it doesn't come up.

       

      Is this a known issue, or is there something I've missed configuring in Tableau, the ODBC driver, or Spark/Hive?

        • 1. Re: Tableau with Spark SQL on AWS EMR
          Dmitry Chirkov

          You might need to install/configure Hive.

          To enable access to a SchemaRDD, it needs to be registered in a catalog that exists outside the local context. Currently, the Hive Metastore (“Hive context”) is the only supported service (the Spark SQL connector evolved out of the Hive connector), hence the need for the Hive Thrift Server.

          • 2. Re: Tableau with Spark SQL on AWS EMR
            Adarsh Shekhar

            Hi Dmitry,

            Can you please point us to any documentation on how to 'register SchemaRDD in Hive Metastore'? I tried searching for it online but to no avail. I'd greatly appreciate your help.

            • 3. Re: Tableau with Spark SQL on AWS EMR
              Dmitry Chirkov

              I personally have no direct experience with that, but here's something from our internal manuals:

               

              Tableau can use Initial SQL to create a temporary table to store the SchemaRDD and enable access without having to explicitly put the data in HDFS and the Hive metastore. An example of this is below:

              create temporary table test
              using org.apache.spark.sql.json
              options (path '/data/json/*');

              cache table test;

              Then, Custom SQL can be used in Tableau to query the temporary table that was created:

              SELECT * FROM test
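
              On the earlier question of registering a table in the Hive metastore rather than keeping it session-local: a possible sketch, assuming the Spark Thrift Server that Tableau connects to is backed by a Hive metastore, is to drop the "temporary" keyword so the table definition is persisted. The table name test_persisted is hypothetical, the path is the same illustrative one used above, and a table created this way may not be readable from Hive itself.

              ```sql
              -- Assumption: run through the Spark Thrift Server with Hive support enabled.
              -- Without "temporary", the definition is stored in the metastore
              -- instead of living only in the current session.
              create table test_persisted
              using org.apache.spark.sql.json
              options (path '/data/json/*');

              -- Check that the table is now listed in the catalog.
              show tables;
              ```

              If the table shows up in the output of show tables, it should then be worth searching for it in Tableau's table list instead of relying on Custom SQL.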