1 Reply Latest reply on Jul 24, 2018 10:30 AM by Brian Lipp

    EMR Spark SQL connection issues

    Brian Lipp

      I have a client that wants to use Tableau with their EMR (Spark) cluster. I have used Tableau for other purposes, but not in this case. The documentation seems straightforward, but I'm getting errors when I try to connect.

      Here is the setup:

       

      1. The EMR cluster's master doesn't have a public IP, but from the Tableau Desktop EC2 instance I am able to ping it and telnet to port 10001, where Thrift is running.

      2. I am able to test Thrift with Beeline, and it connects fine.

      3. I am not using SSL or authentication, given the limited access the cluster has.

      4. I have installed both the DataDirect 8.0 and Simba ODBC drivers, and I'm using:

      Release label: emr-5.13.0

      Hadoop distribution: Amazon 2.8.3

      Applications: Spark 2.3.0
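
      For anyone who wants to reproduce the reachability check from step 1 without telnet, here is a minimal sketch in Python. The function name `port_is_open` and the address in the usage comment are placeholders of mine, not part of the setup. It only confirms that a TCP connection to the port can be opened; it says nothing about whether the Thrift handshake or the ODBC driver settings are correct.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (hypothetical private IP; replace with the EMR master's address):
# print(port_is_open("10.0.0.1", 10001))
```

      A True result here matches what ping and telnet already show: the network path is open, so the failure is more likely in how the driver talks to the server than in routing or security groups.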

       

      The error is

      "Unable to connect to the ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid.

      [Simba][ThriftExtension] (5) Error occurred while contacting server: No more data to read.. This could be because you are trying to establish a non-SSL connection to an SSL-enabled server.

      Unable to connect to the server "IP". Check that the server is running and that you have access privileges to the requested database."

       

      I simply followed the documentation provided by Tableau, which says to install the driver only (not to mess with ODBC) and then use it in Tableau. I verified that SSL and authentication were both turned off before trying to connect. I also verified connectivity by running a query in DataGrip from the Tableau EC2 instance, which works as expected.