High Throughput Google BigQuery extracts with Simba JDBC driver

Version 6

    This document describes the steps to configure Tableau to generate Hyper Extracts at about 2.5 faster the speed of the Tableau's native Google BigQuery connector (based on my initial tests), by leveraging SIMBA's JDBC for BigQuery.  The reason for better performance is due to Simba Driver's support for the newly released Google High Throughput API (aka Google BigQuery Storage API). In addition, this method allows Tableau customers create extracts with GCP Service Accounts.

     

    BEWARE: Due to incompatibility with SQL dialects, this JDBC connection will not work for regular ad-hoc queries with Tableau, users will experience a number of CAST errors. But it's a perfectly usable solution for taking simple Extracts (e.g. Select * from MyBigqueryTableau). I.e. This solution is only suitable for generating Extracts!

     

    Test this solution and send me feedback please! I'd appreciate comments to let me know if it worked well for you, if performance was also improved by a factor of ~2.5 times (or worse/better), if compared with the native connector, any further optimisations you recommend, etc

     

    Let's start! First things first. Setting up the accounts and environment:

     

    0 - All tests made with Tableau Desktop and Server 2019.2.0 . I highly recommend you only try this on this or newer product versions!

     

    1 - Enable Google Storage API: Enabling and Disabling APIs  |  Cloud APIs  |  Google Cloud

    At the time of writing the article, the cost for using this Storage API is $1.10 per TB read

     

    2 - Create a GCP Service Account here: https://console.cloud.google.com/iam-admin/serviceaccounts  . We will generate Tableau Hyper Extracts with this account. And make sure to set appropriate permissions for this account here: https://console.cloud.google.com/iam-admin/iam  , so it can read data from Google Big query. In my case, I have the following permissions enabled:

    Screenshot 2019-06-26 at 15.05.04.png

     

    3 - Create and download private key (json file) for this service account, follow this guide: Creating and managing service account keys  |  Cloud Identity and Access Management Documentation  |  Google Cloud . Tableau Server will use this key to login on Google Cloud and perform queries, via the Simba JDBC driver.

     

    4 - Download and install the Simba JDBC 4.2 compliant drivers here: Magnitude Simba drivers for BigQuery  |  BigQuery  |  Google Cloud

     

    5 - Extract all files to the following directories, depending if you have Tableau Server running on Windows or Linux (Mac option also listed below, for Tableau Desktop):

     

    Windows: C:\Program Files\Tableau\Drivers

    Mac: ~/Library/Tableau/Drivers

    Linux: /opt/tableau/tableau_driver/jdbc

     

    6 - Make sure you have JAVA 64bit installed on your machine, as these drivers are nothing more than a Java program. You could get them from here: OpenJDK

     

    7 - Create a JDBC properties file that will contain all the necessary parameters for the Driver to login on Big Query with the json private key, set parameters to enable the HighThroughput mode, etc.

    File name: genericjdbc.properties

     

    Example file contents for Windows (note the OAuthPvtKeyPath syntax for linux will be slightly different):

     

    ProjectId=<your_project_id_here>
    OAuthType=0
    OAuthServiceAcctEmail= <yourServiceAccountEmailHere>
    OAuthPvtKeyPath=C:\\dev\\GBQTableauExtractAccelerator\\<your_pvt_key_file_here>.json
    QueryDialect=SQL
    EnableHighThroughputAPI=1
    Timeout=600
    

     

    Feel free to tweak these settings to your taste. You will find the full documentation for Simba's JDBC driver on the zip file. A particularly useful option is to enable logs with the following key-value pairs:

    LogLevel=4
    LogPath=<your-output-log-directory-here>
    

     

    8 - Save this file as genericjdbc.properties in these directories (multiple places if you have Desktop and Server on the same machine):

     

    For use by Tableau Desktop:

    My Tableau Repository\Datasources

     

    For use by Tableau Server:

    Windows: %ProgramData%\Tableau\Tableau Server\data\tabsvc\vizqlserver\Datasources

    Linux: /var/opt/tableau/tableau_server/data/tabsvc/vizqlserver/Datasources/

     

    9 - Testing the Driver. Open Tableau Desktop and choose Generic JDBC connector:

    Screenshot 2019-06-26 at 14.42.35.png

     

    Use the following settings with the JDBC Connector:

    URL: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;

    Dialect: SQL92

    Leave everything else blank. You don't even need to import the properties file here, if you have followed step 8 and saved it under My Documents\Datasources, the file will be picked from there automatically.

     

    10 - Connect to you BigQuery table in live mode first. We'll not generate the full extract on Desktop, but rather on Server (see later steps in this guide):

    Screenshot 2019-06-26 at 14.48.27.png

     

    11 - Generate an empty extract now in Desktop and publish it to Tableau Server. Follow this guide for a step by step: Easy empty local extracts | Tableau Software

     

    12 - If you have saved the genericjdbc.properties file correctly under the appropriate directory (step 7 in this guide), Tableau Server is now ready to generate an extract using BigQuery Storage API