High Throughput Google BigQuery extracts with Simba JDBC driver

Version 12

    This document describes the steps to configure Tableau to generate Hyper Extracts between 2x to 5x faster the speed of the Tableau's native Google BigQuery connector, by leveraging SIMBA's JDBC for BigQuery.  The reason for better performance is due to Simba Driver's support for the Google High Throughput API (aka Google BigQuery Storage API).

     

    In addition, this method allows Tableau customers create extracts with GCP Service Accounts.

     

    BEWARE:

    • Due to incompatibility with SQL dialects, this JDBC connection will not work for regular ad-hoc queries with Tableau, users will experience a number of CAST errors. But it's a perfectly usable solution for taking simple Extracts (e.g. Select * from <MyBigqueryTable>). I.e. This solution is only suitable for generating Extracts!
    • This solution involves forcing Tableau Server's own JDBC "Server" to run on a different java version (v11) than what's shipped with Server (v8). So far I have not seen any side effects of doing so, but obviously this is absolutely not-supported by Tableau, only proceed at your own risk!

     

    DISCLAIMER (SOLUTION SUPPORT): "COMMUNITY SUPPORTED", i.e. Tableau's Support team will NOT be able to help you, so please don't open a ticket.

     

    PRE-REQUISITES:

    • Tableau Desktop 2019.2 or newer.
    • Tableau Server 2019.2 or 2019.3 (**ONLY** these versions have proved to work on my tests)
      • Note: This solution doesn't seem to work with Tableau Server v2019.4 nor 2020.1, due to a change on our JDBC connectivity framework which causes an issue with Simba's JDBC driver to conflict with the JDBC Server's framework. This might be fixed in Tableau Server 2020.2 and up.
    • Java v11

     

    Let's start! First things first. Setting up the accounts and environment:

     

    1 - Enable Google Storage API: Enabling and Disabling APIs  |  Cloud APIs  |  Google Cloud

    At the time of writing the article, the cost for using this Storage API is $1.10 per TB read

     

    2 - Create a GCP Service Account here: https://console.cloud.google.com/iam-admin/serviceaccounts  . We will generate Tableau Hyper Extracts with this account. And make sure to set appropriate permissions for this account here: https://console.cloud.google.com/iam-admin/iam  , so it can read data from Google Big query. In my case, I have the following permissions enabled:

    Screenshot 2019-06-26 at 15.05.04.png

     

    3 - Create and download private key (json file) for this service account, follow this guide: Creating and managing service account keys  |  Cloud Identity and Access Management Documentation  |  Google Cloud . Tableau Server will use this key to login on Google Cloud and perform queries, via the Simba JDBC driver.

     

    Configure your Tableau Server/Desktop environments:

     

    4 - Download and install the Simba JDBC v4.2 compliant drivers here: Magnitude Simba drivers for BigQuery  |  BigQuery  |  Google Cloud:

     

    Extract all files to the following directories, depending if you have Tableau Server running on Windows or Linux (Mac option also listed below, for Tableau Desktop):

     

    Windows: C:\Program Files\Tableau\Drivers

    Mac: ~/Library/Tableau/Drivers

    Linux: /opt/tableau/tableau_driver/jdbc

     

    • Make sure the Tableau Server RunAs user has permissions to access and run these files, this is crucial.

     

    5 - Make sure you have JAVA11 64bit installed on your machine, as these drivers are nothing more than a Java program.

    • Take a note of the location where java drivers were installed. In this article, I'll assume java11 has been deployed at: "C:\Program Files\Java\jdk-11.0.6"
    • Make sure the Tableau Server RunAs user has permissions to access and run these files, this is crucial.

     

    6- For configuring Tableau Server, we need to point Server's JDBC "server" to use this new java11, rather than java8 that ships with Server. We can do so by running the following TSM Commands.

    • tsm configuration set -k native_api.java_home "C:\Program Files\Java\jdk-11.0.6"
    • tsm pending-changes apply -iw --ignore-prompt

     

    7 - Create a JDBC properties file that will contain all the necessary parameters for the Driver to login on Big Query with the json private key, set parameters to enable the HighThroughput mode, etc.

    File name: genericjdbc.properties

     

    Example file contents for Windows (note the OAuthPvtKeyPath syntax for linux will be slightly different):

     

    ProjectId=<your_project_id_here>
    OAuthType=0
    OAuthServiceAcctEmail= <yourServiceAccountEmailHere>
    OAuthPvtKeyPath=C:\\dev\\GBQTableauExtractAccelerator\\<your_pvt_key_file_here>.json
    QueryDialect=SQL
    EnableHighThroughputAPI=1
    Timeout=600
    

     

    Feel free to tweak these settings to your taste. You will find the full documentation for Simba's JDBC driver on the zip file. A particularly useful option is to enable logs with the following key-value pairs:

    LogLevel=4
    LogPath=<your-output-log-directory-here>
    

     

    8 - Save this file as genericjdbc.properties in these directories (multiple places if you have Desktop and Server on the same machine):

     

    For use by Tableau Desktop:

    My Tableau Repository\Datasources

     

    For use by Tableau Server:

    Windows: %ProgramData%\Tableau\Tableau Server\data\tabsvc\vizqlserver\Datasources

    Linux: /var/opt/tableau/tableau_server/data/tabsvc/vizqlserver/Datasources/

     

    • Make sure the Tableau Server RunAs user has permissions to access and run these files, this is crucial.

     

    Testing this solution

     

    9 - Testing the Driver. Open Tableau Desktop and choose Generic JDBC connector:

    Screenshot 2019-06-26 at 14.42.35.png

     

    Use the following settings with the JDBC Connector:

    URL: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;

    Dialect: SQL92

    Leave everything else blank. You don't even need to import the properties file here, if you have followed step 8 and saved it under My Documents\Datasources, the file will be picked from there automatically.

     

    10 - Connect to you BigQuery table in live mode first. We'll not generate the full extract on Desktop, but rather on Server (see later steps in this guide):

    Screenshot 2019-06-26 at 14.48.27.png

     

    11 - Generate an empty extract now in Desktop and publish it to Tableau Server. Follow this guide for a step by step: Easy empty local extracts | Tableau Software

     

    SUCCESS!

    12 - If you have saved the genericjdbc.properties file correctly under the appropriate directory (step 7 in this guide), Tableau Server is now ready to generate an extract using BigQuery Storage API

     

     

     

    ROLLBACK:

    Should you need to rollback the changes applied to your Tableau Server (i.e. you wish to reset Tableau Server's JDBC "Server" to use java8 that ships with the product), simply run these two commands:

    • tsm configuration set -k native_api.java_home -d
    • tsm pending-changes apply -iw --ignore-prompt

     

     

    DISCLAIMER

    THIS SOLUTION IS PROVIDED "COMMUNITY SUPPORTED", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.