Visualizing your Data In Tableau using DSE 6 Docker Images and the Simba Spark ODBC driver

Version 1

    In this simple tutorial we will walk through connecting Tableau to a DataStax Enterprise (DSE) 6.0 Docker Container utilizing DSE Alwayson SQL via the Spark ODBC driver and creating a simple Tableau workbook. We will Utilize DSE analytics to perform the computations, joins and aggregates on the server side (DSE) which is faster and more efficient, while harnessing Tableau to visualize the data.

    This Tutorial was created using a DSE 6.0.0 Docker Container, Tableau Desktop 10.5 running on Windows and Spark ODBC driver 1.2.6.1007.

    Prerequisites

    • Basic understanding of Docker images and containers.
    • Docker installed on your local system, see Docker Installation Instructions.
    • A working DSE Docker container running an Analytics workload with the DataStax config volume mounted and the following AlwaysOnSQL/Spark UI ports bound.
      • 10000, 7080, 7077, 4040, 9077
    docker run -e DS_LICENSE=accept --name my-dse -p 10000:10000 -p 7080:7080 -p 7077:7077 -p 4040:4040 -p 9077:9077 -v ~/config:/config -d datastax/dse-server -k 
    • DataStax Docker Images are hosted on Docker Hub. For documentation including configuration options, environment variables, and compose examples head over to DataStax Academy
    • A DataStax Academy Account to Download and Install Simba ODBC Driver for Apache Spark: If you do not already have one, visit https://academy.datastax.com/
    • A working Tableau Desktop install and basic Tableau understanding.

    The Simba ODBC Driver for Apache Spark

    • Download the Simba ODBC Driver for Apache Spark for your version of Microsoft Windows (32bit or 64bit) running Tableau Desktop by visiting https://academy.datastax.com/download-drivers
    • Install the Simba ODBC Driver for Apache Spark.

    Enabling Alwayson SQL

    For advanced configuration management, we’re providing a simple mechanism to let you change or modify configurations without replacing or customizing the containers. You can add any of the approved config files to a mounted host volume and we’ll handle the hard work of mapping them within the container. You actually enabled this feature with the flag -v ~/config:/config when you started your DSE container and we will use it to enable alwayson sql.You can read more about that feature here.

    • Alwayson sql is enabled via the dse.yaml. I have created a custom dse.yaml with the setting preconfigured to use for this demo.
      • Download the dse.yaml found here to your docker host machine and place it in the ~/config directory created when starting the DSE container
        • If running linux or mac you can run the following command
    wget -L https://raw.githubusercontent.com/roberd13/tableau-dse6-demo/master/DemoData/dse.yaml -O ~/config/dse.yaml 
    • Restart your dse container so it will pick up the custom dse.yaml and start alwayson sql
    docker restart my-dse 

    Demo Data

    You will need to get the demo data on the DSE Docker Container from here using the following commands

     

    Create directory for the demo data

    docker exec -it my-dse mkdir /opt/dse/demodata 

    Download the script to create the keyspace, table and load the data

    docker exec -it my-dse wget -L https://raw.githubusercontent.com/roberd13/tableau-dse6-demo/master/DemoData/cqlscript.sh -O /opt/dse/demodata/cqlscript.sh 

    Change the permissions to allow execution of the script

    docker exec -it my-dse chmod +x /opt/dse/demodata/cqlscript.sh 

    Download the demo data

    docker exec -it my-dse wget -L https://raw.githubusercontent.com/roberd13/tableau-dse6-demo/master/DemoData/videos.csv -O /opt/dse/demodata/videos.csv docker exec -it my-dse wget -L https://raw.githubusercontent.com/roberd13/tableau-dse6-demo/master/DemoData/videos_by_actor.csv -O /opt/dse/demodata/videos_by_actor.csv 
    • Run the cqlscript.sh script to create the keyspace, tables and load the data using the new dsebulk tool. A Keyspace named killr_video with 2 tables videos and videos_by_actor will be created.
      docker exec -it  my-dse bash "/opt/dse/demodata/cqlscript.sh" 

    Now lets Create a Connection to DSE in Tableau

    • Lets make sure that AlwaysOnSQL is running docker exec -it my-dse dse client-tool alwayson-sql status
    • Open Tableau and create a connection to DSE by selecting Spark SQL under To a Server. You may need to click more to show this option
    • Setup your connection by
      • Add the ipaddress of the Docker host port 10000 is bound from the DSE container to hosts
      • Set Port to 10000
      • Set Type to SparkThriftServer
      • Authentication to username
      • Click Sign in

    • When your workbook opens after Signing in. On the Data Source tab under connections.
      • Search Schema using the drop down and Select Killr_video
      • Search Table using the drop down and your tables will be listed
      • Drag videos and videos_by_actor to the tables area

    • Be sure to keep a Live connection, if you use Extract, all of the data will be loaded into Tableau.
    • Create an inner join clause for Release Year = Release Year (Videos By Actor. (click on the blue part of the link between the 2 tables to be able to edit this inner join)

    • Click Update Now to see a sample of your data

    Now Lets See the Count of Movies grouped by Movie Ratings Per Year

    • Click on Sheet 1 Tab at the bottom
      • Convert Release Year (Videos By Actor) to Discrete
      • Drag Mpaa Rating to Color in the Marks section
      • Drag Release Year (Videos By Actor) to Columns
      • Drag Mpaa Rating to Rows
      • Change the Mpaa Rating under rows to a Count Measure type by using the drop down
      • Select Stacked Bar on the Show Me snap on
      • Click Show Me to hide the snap on

    You now have a color coded graph with the Count of Movies grouped by Movie Ratings Per Year

    If you are curious what Tableau did under the hood, you can visit http://dockerhostipaddress:7080 which will take you to the Spark Master UI, you can then click AlwaysOnSQL which will direct you to the AlwaysOnSQL Application UI and you can see all of the queries performed during this tutorial.

    Master UI

    Application UI