From late November 2018 through early 2019, an interesting question was being tackled in the Tableau forums: Help on mapping population census blocks using dynamic radius filter

 

This question turned into one of the longest (and most fun) forum threads that I’ve ever been involved in. At some point in the discussion between Wilfred Eze (who asked the original question), Richard Leeke (a Tableau forums ‘Data Monarch’ and all-around master at making Tableau do all sorts of awesome things), and myself, I said that I would summarize the process into something a little easier to digest than our 100+ back-and-forth posts on everything from data manipulation to spatial indexes to query optimization and finally making some actual vizzes.  So, here it is.

 

Disclaimer & giving credit where it is rightfully due: Most of the heavy lifting that I did on this forums question was on data prep / import to SQL Server, some lightweight SQL demos, and writing this up so we would have a clean archive –  I didn’t mastermind much of the optimized SQL that drives the analytics behind this. The mastermind behind the SQL was Richard Leeke, who is infinitely more skilled with SQL Server than I am. I’ve done some re-writing and cleaning in tailoring for this blog post, but the ideas behind how it works are pretty much all Richard.

 

The Community Forums thread took 100+ posts to get to a solution, so it is pretty long and a bit of a slog to go through.  To keep things a little simpler and more organized than the thread, I’ve broken this post into two parts – feel free to skip ahead to whatever is right for your needs.

 

  1. Intro to the problem, tools, data, and basic setup to move the data into SQL Server (this post)
  2. SQL Server and Tableau playing nice for some analytic awesome (Part II)

 

Intro to the problem, tools, data, and basic setup to move the data into SQL Server

 

The spatial analysis challenges:

Here are the challenges that were presented in the Community Forums question (Help on mapping population census blocks using dynamic radius filter).  I’m using this as the example for the blog post, but the methods are relevant to many different spatial analysis questions, so you should be able to adapt them as needed for your own project.

 

This set of posts addresses the following challenges:

 

1. Create dynamic, parameter-driven buffers around point locations (city centres).

For every city centre point location, we need to be able to draw a circular buffer of any size and visualize it in Tableau.  We then want to be able to use these circles to identify overlapping locations in other spatial datasets…

[animation: cityCentreBuffer.gif]
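To give a taste of the SQL that ends up driving this (a hedged preview only – Part II builds the real, parameter-driven version, and @radius_km stands in for what will eventually be a Tableau parameter):

-- buffer a point by a radius in km; STBuffer on GEOGRAPHY works in meters
declare @radius_km float = 20;
declare @centre geography = geography::Point(52.13, -106.67, 4326);  -- roughly Saskatoon
select @centre.STBuffer(@radius_km * 1000) as buffer_geog;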

 

2. Use the city centre buffers to identify:

    • Population living within that radius, based on Census geographies
      • What Census subdivisions or dissemination blocks are inside the buffer? Update the polygons selected when the buffer is adjusted.
      • Show entire polygon, or split the polygons to only show the portion within the buffer

 

[animation: buffer_pop.gif]

 

    • Characteristics of road networks within the buffer
      • What roads are inside the buffer?  Update the roads when the buffer is adjusted.
      • Intersect road network lines with buffers, and analyze the length of each segment within the buffer (or other characteristics based on segment length)

 

[animation: buffer_highway.gif]

 

3. Visualize just the data in the buffer – or all data, but indicate which buffer it falls inside

Show it all in Tableau and make it useful for analytics!

 

For instance – here is a 20km buffer drawn around Saskatoon, Saskatchewan, with all roads clipped to the extent of the buffer, and detail from only the Census Subdivisions that intersect the buffer:

 

Of course, as happens with almost every data analysis project – just getting to the analysis part required a good bit of data prep.  In this series of posts, as I discuss the process of solving the spatial analysis challenges, I’ll also provide some detail on what we needed to do to set everything up (from pre-processing the spatial data to loading it into SQL Server to the actual querying…)

 

Tools to do some Tableau spatial analytic magic:

All in all, the methods that I discuss in these blog posts involve:

 

  • Tableau 2018.1+ - In Tableau 2018.1 we introduced native support for GEOGRAPHY data types from SQL Server.  It’s awesome and opens the door for so many cool spatial analyses!

 

  • QGIS – free, open source GIS package (for some data pre-processing).

 

  • OSGeo4W – for importing data to SQL Server.  I’ve heard that some people like using a GUI-based tool called Shape2SQL, but I find it so quick and easy to just use the command line that I go that route.  I also tried once to make Shape2SQL work, but I failed and never tried again.  Your mileage may vary.  My instructions will only cover OSGeo4W.

 

 

The data:

To make it easier to play along, I’ve attached all of the files that we were working with.  I’ve included both the original versions of the files and the processed, ready-for-SQL-Server versions.  There should be links to download the attachments at the end of the blog post.

 

  • Census Subdivisions (950 polygons)

  • Census dissemination blocks (54,118 polygons)

  • Highways (24,769 line segments)

  • City population centres (61 polygons)

 

 

Data as shapefiles -> tables in SQL Server

 

The first challenge with this project was simply moving the data into SQL Server so that we could write some spatial SQL queries to drive the analyses.  This felt like it was going to be easy, but it turned out that there were a few tricks to getting these particular datasets into SQL Server.

 

To be honest, I ran into more problems importing these particular datasets into SQL Server than I have with any other spatial datasets, so it may be much easier for you when you’re working with your own data.  But…I guess for a blog post on how to make things work, it’s sometimes nice to have some tricky data to use in explaining how to bend everything to your will.

 

Note: I have found the process of importing these particular datasets to PostgreSQL to be a bit easier, because PostgreSQL is more flexible with the spatial reference information (SRID) and the data types…so there aren’t quite so many hoops to jump through.  And…as of late December 2018, Tableau was in the Alpha for PostgreSQL spatial (PostGIS) support!  Yay!

Perhaps when we’re in Beta or final release for PostGIS support I’ll re-create and write up the how-to with that database. It really will be a new challenge to tackle, because there are some interesting differences in the spatial functionality between the two databases.

 

Create a database in SQL Server

When I installed SQL Server Express, I also installed the SQL Server Management Studio (SSMS). This makes it easy to interact with your database using a nice GUI.

 

 

After you have opened SSMS and connected to your server (I’m running my databases on localhost, but you may be using a remote host – connect to wherever your server lives), you’ll want to right-click on the Databases ‘folder’ and create a New Database…

 

 

Name your database and go ahead and use the default settings.

 

You should now have a database set up where you can store and access the data for your analysis in Tableau.
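If you prefer a query window to the GUI, the minimal T-SQL equivalent is a one-liner (I’m assuming the database name SqlServerTableauDemo, which is the name I use in the import commands later in this post):

create database SqlServerTableauDemo;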

 

 

Spatial reference challenges – re-project shapefiles to get ready for import to SQL Server

Now that you have your database set up, we need to import the shapefiles.  But…there is a quick step to do first.  If your data isn’t already in a latitude- and longitude-based coordinate system (e.g., NAD83, WGS84, etc.), you will need to re-project it so that we can easily make and populate a GEOGRAPHY column in SQL Server.

 

Tableau likes data from SQL Server to be of the GEOGRAPHY data type (latitude and longitude, analyzed as if on the surface of the earth, as opposed to the GEOMETRY data type, where the data is analyzed as if projected onto a flat surface).  The original shapefiles for this project were in UTM Zone 13 North.  That is a great coordinate system if you are working with data in Saskatchewan, Canada, but it isn’t great if you want to use it in SQL Server and then drop it into Tableau.
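If you want to see the difference between the two data types for yourself, here is a small sketch (my own example – the points are rough locations for Saskatoon and Regina; GEOGRAPHY distances come back in meters, while GEOMETRY distances come back in the units of the coordinates, which here would be degrees…not very useful):

-- GEOGRAPHY: lat/long on the ellipsoid; STDistance returns meters
declare @g1 geography = geography::Point(52.13, -106.67, 4326);  -- roughly Saskatoon
declare @g2 geography = geography::Point(50.45, -104.61, 4326);  -- roughly Regina
select @g1.STDistance(@g2) as distance_in_meters;

-- GEOMETRY: the same coordinates treated as a flat plane; STDistance returns degrees
declare @p1 geometry = geometry::Point(-106.67, 52.13, 4326);
declare @p2 geometry = geometry::Point(-104.61, 50.45, 4326);
select @p1.STDistance(@p2) as distance_in_degrees;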

 

In case you’d like a tutorial or refresher on spatial data types in SQL Server, here you go: Overview of spatial data types in SQL Server

 

First thing - we needed to re-project the data so that it would play nicely with SQL Server and Tableau's desire for GEOGRAPHY.

 

To learn more about map projections and coordinate systems, check out Working with projected data in Tableau – Part I: Map projection basics

 

There isn’t an easy method for transforming coordinate systems in SQL Server, so I opted to do this data prep stage in QGIS, a free, open-source geographic information system (GIS) package.  Re-projecting the data is easy – just open the shapefile, choose ‘Save As…’, and select the new coordinate system that you want to use for the shapefile.

 

I changed all of the shapefile data sources for this project into the World Geodetic System 1984 (WGS84) coordinate system (SRID 4326).

 

Get all of the shapefiles into SQL Server

 

Normally this is a super-easy process – just a quick command-line import (which should work for most datasets).  I use the OSGeo4W shell for importing.  There are other options, including a GUI tool for importing shapefiles to SQL Server (Shape2SQL, mentioned above), but I have found it more difficult to set up and use.

 

Using the OSGeo4W shell, I normally just use a simple set of commands for ogr2ogr.  ogr2ogr can translate between all sorts of spatial data formats, including from Shapefile to SQL Server.

 

There are about a billion options for the command.

 

 

I have generally had success with a simple import command for Shapefiles -> SQL Server using the OSGeo4W shell, like this:

 

ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLEXPRESS;Database=SqlServerTableauDemo;Trusted_Connection=yes" "c:\temp\myShapefile.shp" -a_srs "EPSG:4326"

 

What does that bunch of words mean?? Let’s break it down a bit:

 

ogr2ogr

This is the tool you’re going to use – it converts simple spatial features between file formats.  We’ll use it to convert from Shapefile to SQL Server.

-f "MSSQLSpatial"

The format that we are converting to – the MS SQL Server spatial format.  The -f flag indicates that the next bit of text is the output format.

"MSSQL:server=blah blah blah”

Details on the SQL Server instance where we are putting the data.

  • server=localhost\SQLEXPRESS --- this says ‘look for a SQL Server Express instance on localhost’.  Put in whatever info you need to locate the server with your SQL Server instance (it might be localhost, it might be housed remotely somewhere…)
  • Database=SqlServerTableauDemo --- this says ‘look for the particular database called SqlServerTableauDemo’.  You need to have already created the database before you try to import the data.
  • Trusted_Connection=yes --- I have my database set up to use my Windows login since I’m running on localhost.  If you need to specify a user name and password, you would swap this statement out for something like
    User Id=<insert your user id>;Password=<insert your password>

"c:\temp\myShapefile.shp"

The location and name of the shapefile that you want to import.

-a_srs "EPSG:4326"

The coordinate system for the shapefile that you want to import (specified using its EPSG code).  How do you find this magical code?  You can look up the coordinate system on the EPSG web site, or, if you’ve manipulated the coordinate system in QGIS, the EPSG code will often be listed alongside the coordinate system name.  The two that I use most frequently for US-based or world datasets:

  • 4326 – World Geodetic System 1984 (WGS84)
  • 4269 – North American Datum of 1983 (NAD83)

 

When working with shapefiles, I often just open the .prj file for the shapefile in a text editor, look at the coordinate system name, and then search for that on the EPSG web site.
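Side note: you can also ask SQL Server directly what it knows about a given SRID – the sys.spatial_reference_systems catalog view lists every spatial reference system that the GEOGRAPHY type supports:

select spatial_reference_id, well_known_text, unit_of_measure
from sys.spatial_reference_systems
where spatial_reference_id = 4326;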

 

That is what I normally use successfully with SQL Server. But, of course, these particular files threw a few errors that we had to sort out ☹

 

ERROR 1: INSERT command for new feature failed. [Microsoft][SQL Server Native Client 11.0][SQL Server]Arithmetic overflow error converting numeric to data type numeric.
ERROR 1: Unable to write feature 1 from layer CensusSubdivision_4326.
ERROR 1: Terminating translation prematurely after failed translation of layer CensusSubdivision_4326 (use -skipfailures to skip errors)

 

As the error notes, I could just use -skipfailures to skip all of the errors, but that ended up skipping almost all of the records in the dataset!  So, we needed another solution.  The key is the first error: an “arithmetic overflow error converting numeric to data type numeric” tells us that something is going wrong with the attribute values being put into a numeric column.

 

The fix that we found for this problem was to simply unset the field width so that the numeric attributes could be handled correctly.

 

So if you want to follow along at home, you’ll use import commands with an extra flag on them to “unset the field width” – just add the -unsetFieldWidth flag at the end of the command for each file (see the examples below).  In the course of getting the data into SQL Server, we discovered that these particular datasets had some values that fell outside the range of the default data type chosen by SQL Server; unsetting the field width just gets around that problem and ensures that all of the attributes in each field are imported correctly.

 

There may still be a handful of ring self-intersection errors that come up on import, but after looking at those individual intersections in QGIS, it seemed like they weren’t going to be much of an issue for these analyses, so I just ignored them.  That might not be the wisest solution in all cases, but it was the path that I took for this demonstration…

 

In the OSGeo4W shell (NOT from a regular Windows or Mac command prompt), use these commands as a template.  Remember to update the information on the location / details for your SQL Server instance, and the location of your shapefiles.

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\populationcentre_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\highway_official_current_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\censussubdivision_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\disseminationBlock_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

 

After you’ve imported all of the shapefiles, you should see them listed as Tables in your database. By default they will all have the same name as the shapefile that you used as input:
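If you’d rather sanity-check from a query window, a quick set of counts should line up with the feature counts listed in the data section above (61 population centres, 24,769 highway segments, 950 census subdivisions, and 54,118 dissemination blocks):

select count(*) as features from dbo.populationcentre_4326;
select count(*) as features from dbo.highway_official_current_4326;
select count(*) as features from dbo.censussubdivision_4326;
select count(*) as features from dbo.disseminationBlock_4326;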

 

Create the GEOGRAPHY in SQL Server

 

Now that the data is in SQL Server, we only have a few quick bits of processing left.  We’re going to create a GEOGRAPHY column and a spatial index.  This will 1) convert the data into a format that Tableau likes, and 2) speed up the spatial analysis.

 

By default, only a GEOMETRY column is created when importing the shapefiles into SQL Server, so we’ll need to create a GEOGRAPHY column so that Tableau can read the data.  In the process of doing this, we’re going to add in a little trick where we reverse the order of the vertices in all of our polygons.  We need to do this because Shapefiles and SQL Server have different conventions for polygon ring orientation (which side of the ring counts as the ‘inside’) – if you don’t swap the ordering, your polygons may render “inside out” in Tableau, or analyses with the polygons may be inverted.  Just trust me, you don’t want that.

To create the GEOGRAPHY and swap the vertex ordering at the same time, you can do this in SQL Server (broken down below for each table that was imported – but you could just string them all together into one long query if you want):

 

Let’s break down the basics of the query and look at what this SQL is going to do (some copy/paste-able queries will be right after this…don’t worry):

 

alter table dbo.populationCentre_4326
add ogr_geography geography;

Take one of the tables of data (the imported shapefile) and add a new column called “ogr_geography” that is a GEOGRAPHY data type.

update dbo.populationCentre_4326
set ogr_geography = ogr_geometry
    .MakeValid()
    .STUnion(ogr_geometry.MakeValid().STStartPoint())
    .MakeValid()
    .STAsText()

Now update the table and populate the ogr_geography column with the polygon that was imported from the shapefile.  The long chain of methods after “ogr_geometry” (the original polygon geometry imported from the shapefile) is a quick way of reversing the vertex order of the shapefile polygon – unioning a polygon with one of its own points forces SQL Server to rebuild it with the ring orientation it expects.  The result comes out as text, which is implicitly converted to a GEOGRAPHY data type when it is assigned to the ogr_geography column.

More info on that process? Working with invalid data and the SQL Server 2008 geography data type
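One quick way to double-check that nothing is still “inside out” after the conversion (my own suggestion – it assumes SQL Server 2012 or later, where EnvelopeAngle() is available, and that your table has the ogr_fid ID column that the OGR import creates by default):

-- an envelope angle of 90 degrees or more means the shape covers at least a
-- hemisphere, which for these datasets almost certainly means inverted rings
select ogr_fid
from dbo.populationCentre_4326
where ogr_geography.EnvelopeAngle() >= 90;

If that returns no rows, the vertex-order trick did its job.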

 

And here is the text that you can copy/paste for each table of data – just open up a new query and run each of these…

 

  • Population Centre

 

alter table dbo.populationCentre_4326
add ogr_geography geography;
GO
update dbo.populationCentre_4326
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()

 

While we’re at it with the population centres, we should go ahead and add a field with just the centroids (a point located roughly in the middle of the population centre polygon) – since that is what we are going to use for our final calculations (what is within a set distance of the centroid / approximate middle of the population centre polygon):

 

alter table dbo.populationcentre_4326
add centroid geometry;
GO
update dbo.populationcentre_4326
set centroid = ogr_geometry.STCentroid();
GO
alter table dbo.populationcentre_4326
add centroid_geog geography;
GO
update dbo.populationcentre_4326
set centroid_geog = geography::STGeomFromText(centroid.STAsText(), 4326)

 

For reference, when you run this query you should see something like this:

 

 
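Before moving on to the next table, you can also eyeball a few of the centroids with a quick query (again my own suggestion, and again assuming the OGR-created ogr_fid column):

select top 5 ogr_fid,
       centroid_geog.Lat  as centroid_lat,
       centroid_geog.Long as centroid_long
from dbo.populationcentre_4326;

If the values look like latitudes and longitudes somewhere in Saskatchewan, you are in good shape.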

  • Highway official current (these are lines, so there is no ring orientation to fix – we just round-trip the geometry through WKT and convert it straight to GEOGRAPHY)

 

alter table dbo.highway_official_current_4326
add ogr_geography geography;
GO
alter table dbo.highway_official_current_4326
add wkt varchar(max);
GO
update dbo.highway_official_current_4326
set wkt = ogr_geometry.STAsText();
GO
update dbo.highway_official_current_4326
set ogr_geography = geography::STGeomFromText(wkt, 4326).MakeValid();

 

  • Census subdivision

alter table dbo.censusSubdivision_4326
add ogr_geography geography;
GO
update dbo.censusSubdivision_4326
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()

 

  • Dissemination block

alter table dbo.disseminationBlock_4326
add ogr_geography geography;
GO
update dbo.disseminationBlock_4326
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()

 

Set up some spatial indexes (speed up the queries!)

 

While the datasets we are working with aren’t gigantic, the performance of our analyses will still benefit from setting up spatial indexes on the GEOGRAPHY for each of our tables. This will optimize the spatial queries.

 

A bit of reading on spatial indexes, if you’re curious: About spatial indexes

 

The easiest way to do this is to expand the table in the Object Explorer in SQL Server Management Studio and create a new spatial index.

 

 

Accept the default names in the dialog box and select Add… to add a column to the index.

 

 

You’ll use the ogr_geography column (or centroid_geog for the city centres, if that is the table you are currently working with) and click OK.

 

 

Then you’ll go into the Spatial index options and set the Tessellation Scheme to ‘Geography Grid’ (NOT the Geometry Grid).  After that, just click OK and the spatial index on the GEOGRAPHY column should be built.
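If you’d rather script the indexes than click through the dialogs, here is a minimal T-SQL sketch (the index name is my own invention – repeat for each table, swapping in the right table and GEOGRAPHY column; note that CREATE SPATIAL INDEX requires the table to have a clustered primary key, which the ogr_fid column from the OGR import should already provide):

-- spatial index on the GEOGRAPHY column, using the Geography Grid tessellation
create spatial index sidx_populationcentre_geog
on dbo.populationcentre_4326 (ogr_geography)
using geography_grid;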

 

 

Now that everything is all set up, we can take a minute to make sure that it’s all happy and visible in Tableau.

 

Open up Tableau 2018.1 or later and connect to your SQL Server database.

 

 

Select your database, pick a data table that looks interesting to you, and drop the table in as the data source.  I’d recommend using one other than the Dissemination Blocks (it’s sort of big and takes the longest to render).

 

 

 

Then go to a new worksheet, drop the ogr_geography dimension onto the viz, and you should see your datasets rendered.

 

 

Hopefully all of that is working for you and we can move on to the next post in this series where we do some analytics!