
In the first part of this post series, we took some raw spatial data (as shapefiles), did a bit of prep and loaded it up in SQL Server so that we could access the spatial data - and use the built in spatial functionality of SQL Server - in Tableau.

 

Most of this post relies on dynamic queries run directly through SQL Server; however, to demonstrate the build-up of the Tableau workbook, I have attached a packaged workbook running from an extract that you can use as a reference for constructing the worksheets and dashboards. Note that this workbook will NOT be dynamically adjustable (i.e., you cannot change the parameter and have the buffer-driven analytics update...because it isn't connected to SQL Server, it's just running from a Tableau extract).  But...it should provide assistance if you are having trouble with the dual axis maps or setting up filters.

 

Now that all of our datasets are in Tableau, we can explore options for using the spatial analysis functions in SQL Server to drive some interesting analyses in Tableau.

 

There are two ways that you can integrate the SQL Server spatial functionality in Tableau to augment your analytics: Custom SQL and Raw SQL.

 

Custom SQL: Allows the user to use table-level operators that Tableau doesn't generate.  This means that you can create entirely new data source tables directly from queries run in SQL Server.

 

RawSQL: Allows you to use column-level functions that Tableau doesn’t generate (e.g., a calculated field that uses a RawSQL query to calculate the area for a polygon).
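For example, a minimal sketch of that area calculation as a Tableau calculated field (assuming the GEOGRAPHY column from this post's tables shows up in Tableau as [Ogr Geography]; STArea() on a GEOGRAPHY returns square meters):

RAWSQL_REAL("%1.STArea()", [Ogr Geography])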

 

These two options make it so that if you can write the query in SQL Server, you can use it in Tableau and then use the results in your visualizations.

 

Note: if you want to dig into performance issues with Custom SQL, take a look at Alan Eldridge's "Designing Efficient Workbooks" whitepaper.  It's a few years old at this point, but still has a lot of relevant detail on Custom SQL.

 

For most of the spatial analytic problems addressed in this blog post, we'll want to use Custom SQL to alter the structure of the Tableau data source table, but I'll show you an example of RawSQL as well.


Let’s start with simple queries and work our way up…

 

1. Dynamic buffers

The core of the problem that we’re tackling is that we have point locations for city centres (the centroids that we calculated in Part I) and we want to be able to identify what is within a set (but adjustable) distance from the points.  To do that, we’ll make buffers around each city centre and use a parameter in Tableau to dynamically resize them.  For this simple starting point, we’ll look at how we can create the buffers with both Custom SQL and RawSQL.

 

RawSQL method - for column-level alteration of our data source

 

Connect to the SQL Server database that you made in the first part of the blog, and add the populationcentre_4326 table:

 

 

On a new worksheet, create a new calculated field:

 

 

And enter this calculation (this is the non-dynamic version that we’ll start with – we’ll add a parameter to control it next):

 

 

RAWSQL_Spatial("%1.STBuffer(1000)", [Centroid Geog])

 

If you add this newly created calculated field to the viz, you should see 1 km (or 1000 meter) buffer polygons.  The units for buffers created on GEOGRAPHY data types are in meters. You can manually change the size of the buffer by editing the calculated field and replacing “1000” with whatever distance you want.

 

 

To make the buffer dynamic, we just need to create a Parameter, and then drop that into the RawSQL calculation instead of the hard coded distance:

 

RAWSQL_Spatial("%1.STBuffer(%2 * 1000)", [Centroid Geog], [Buffer distance (KM)])

 

Alternatively, we could have written that calculation with the buffer distance being converted from kilometers to meters outside of the query passed to SQL Server, which may speed things up a tiny bit. This other way of writing the query looks like this (we just shift the * 1000 to the end):

 

RAWSQL_Spatial("%1.STBuffer(%2)", [Centroid Geog], [Buffer distance (KM)] * 1000)

 

If you add your parameter to the worksheet, you can now adjust the buffer size dynamically. Every time you change the parameter value a new polygon will be calculated and rendered.

 

[Animated GIF: cityCentreBuffer.gif]

 

Custom SQL to augment the data source directly

Our other option for running analytics through SQL Server is to just alter the data source table on creation (instead of just calculating a single new column as a calculated field).  This is just like writing a query in SQL Server to generate a new data table. Only the data that you request in the query will end up in the data source table.

 

Instead of adding a specific table directly from our database, we’ll add ‘New Custom SQL’ instead:

In the Custom SQL window that opens up, we’ll add the following query to generate a super-simple table that only contains a buffer (named “dynamic_buffer”):

 

select centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as dynamic_buffer
from populationcentre_4326

 

Notice that the query above uses the parameter we created earlier (Buffer distance (KM)).  The easy way to add that into a query is to use the Insert Parameter drop down in the Edit Custom SQL window:

 

If you open a new worksheet, you should see this one new field.  You can add it to a viz and see the buffers.  By adding the parameter to the worksheet as well, you can adjust them dynamically – just like we did with the RawSQL generated buffer.

 

 

Notice that there is only one measure in this data source – that is because we only included one field in our query!  We could have easily included any of the fields in the populationcentre_4326 table in our query.  We’ll see lots of examples of this in the following steps where we want more information about our data as a result of our Custom SQL queries.
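For instance, a slightly richer version of the same Custom SQL – just a sketch using a few of the populationcentre_4326 fields that we'll lean on later in this post – might look like:

select
pcname,
population,
centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as dynamic_buffer
from populationcentre_4326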

 

2. Buffers + polygon intersections

That last example was a fun demonstration of dynamic spatial data in Tableau, but now let's take it one step further and actually use those buffers to select our regions of interest.  This is where the power of SQL Server spatial really lets us go wild…

At this point I'm going to switch to using just Custom SQL, because I want to simplify the process and end up with a single table that consolidates all of the detail that I want from all of the relevant tables in my database.

 

Let's grab all of the Census dissemination blocks that intersect our dynamic buffers (just like in the image above).  To get this, it's just a matter of writing the proper SQL query and dropping it in as custom SQL.  Here is the query that I used – first I'll give you the copy/paste version and then explain what it's doing:


select
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

With this query we can dynamically select every polygon that overlaps with our buffer – and as we change the buffer size the selected polygons will update accordingly.

 

In case you were curious why the query is set up to only keep the intersecting polygons… we don't have to write the query that way.  We could easily just return all polygons along with a field that tells us whether any given polygon intersects a city centre buffer (I don't recommend this in most cases, but I'll show you how anyway).

 

To demonstrate how this would work, I’ll use a smaller file with just Census subdivisions – here is the result we would get to show all Census subdivisions in the entire dataset with just the ones within the buffer around the Saskatoon city centre highlighted:

 

If you want to try this, here is the query that I used to create the viz above – it's similar to the previous one, but instead of doing an inner join, we just reference both tables and compare ALL possible combinations of city centre buffers and Census subdivisions.  We also add in a new field that tells us true/false on whether any Census subdivision intersects any city center buffer (the within_buffer field at the end of the select list below):

 

select pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
subdiv.cduid as subdiv_cduid,
subdiv.cdname as subdiv_cdname,
subdiv.population as subdiv_population,
subdiv.popdensity as subdiv_popDensity,
subdiv.ogr_geography as subdiv_geog,
subdiv.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as within_buffer
from dbo.populationcentre_4326 as pc, dbo.censussubdivision_4326 as subdiv

 

Why would we choose NOT to write our query this way and just return all of the polygons?

 

Unless you really need this level of detail, you want to be careful with queries like this – it'll bloat up your data quickly.  This leads to a table with ALL combinations of every row in both tables in the query. So, we end up with a TON of records in our final table – this makes it slow to compute (a lot more spatial calculations) and slower to render (a lot more polygons).  This particular query results in 57,950 polygons (950 Census Subdivisions * 61 population centres).   Hopefully you can see why I didn't want to do this query with the Census dissemination blocks (54,118 dissemination blocks * 61 population centres = ~3.3 million records!)
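If you do need the within_buffer flag but want to keep the row count sane, one option (just a sketch – it assumes 'Saskatoon' is the exact pcname value in the table) is to filter the cross join down to a single population centre:

select
pc.pcname as pc_name,
subdiv.cdname as subdiv_cdname,
subdiv.ogr_geography as subdiv_geog,
subdiv.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as within_buffer
from dbo.populationcentre_4326 as pc, dbo.censussubdivision_4326 as subdiv
where pc.pcname = 'Saskatoon'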

 

The collected (aggregated) geography from just the Census Subdivisions + population centres is actually large enough and detailed enough to throw an error in Tableau until the polygons are disambiguated…

 

 

3. Buffers + line intersections / clipping

Now that we’ve looked at how to explore the intersection between buffers and Census subdivisions,  we’ll move on to looking at how to segment out the highways so that we have only the highway bits that fall inside the buffers.  This is a line and polygon intersection instead of the polygon and polygon intersection we used above.

 

To do this, we’ll use two different spatial operations – STIntersects and STIntersection.  They sort of sound the same, but they have a useful distinction.  The intersects operator just gives us a 0 or 1 if the spatial features intersect. The intersection operator will give us just the part of the spatial feature that intersects (thus, we can clip the roads to the extent of our polygon buffer).

 

STIntersects returns 1 if a geography instance intersects another geography instance; returns 0 if it does not.

 

STIntersection returns an object that represents the points where a geography instance intersects another geography instance.
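To make the distinction concrete, here's a small standalone sketch you could run in SSMS (the point coordinates are just an assumption for somewhere near Saskatoon, not values from the data):

declare @buffer geography = geography::Point(52.13, -106.67, 4326).STBuffer(5000);

select h.roadname,
       h.ogr_geography.STIntersects(@buffer) as crosses_buffer,              -- 1 or 0
       h.ogr_geography.STIntersection(@buffer).STLength() as clipped_length  -- meters of road inside the buffer
from dbo.highway_official_current_4326 as h
where h.ogr_geography.STIntersects(@buffer) = 1;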

 

Here is the query… one part (the IIF statement near the end of the select list) is worth talking about in a little more detail:

 

select
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
h.highway_id as highway_id,
h.roadname as highway_roadname,
h.placename as highway_placename,
h.rtnumber1 as highway_rtnumber1,
h.ogr_geography as highway_geog,
h.ogr_geography.STLength() as highway_length,
IIF(h.ogr_geography.STIntersection(h.ogr_geography).STGeometryType() = 'GeometryCollection',
    h.ogr_geography,
    h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000))) as highway_fragment,
h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)).STLength() as highway_fragment_length
from dbo.populationcentre_4326 pc
inner join dbo.highway_official_current_4326 h
on h.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

One part of the query uses some syntax that is different from the other queries we've seen in this post.  There is an IIF (inline IF) statement included that checks to see if the result of the intersection happens to be a GeometryCollection instead of just regular line segments.  This is important because Tableau doesn't yet support geometry collections, so we want to make sure that we don't have any of those in our result table.  The IIF statement just checks to see if a geometry collection was returned, and if so, replaces that record with the original ogr_geography value.  If the result isn't a geometry collection (i.e., it is a regular linestring), we simply populate the record with the result of the intersection.
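If you're curious which highway records are at risk, a quick check of the stored geometry type (a sketch – it assumes the troublesome rows are the ones whose geography is already a collection, which is effectively what the self-intersection in the IIF is testing) looks like:

select highway_id, ogr_geography.STGeometryType() as geom_type
from dbo.highway_official_current_4326
where ogr_geography.STGeometryType() = 'GeometryCollection';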

 

This is just a safety check to help avoid potential errors from the intersection (and it turns out that there are a few records in the highway dataset that lead to this particular problem…so we'll just avoid it).

 

4. Combining subdivisions and buffer to show visual extent of clip

In the long run, we're going to want to show three different datasets: the buffer extent, the clipped roads, and the Census dissemination blocks.   Tableau is good with creating dual axis maps, but this would need three axes or 'layers,' which requires some creative manipulation of the data.  We'll get around the dual axis restriction by splitting our dissemination blocks into two sets of polygons: the portion inside the buffer and the portion outside the buffer.  This will leave us with a nice visual of the buffer AND a way to explore the polygon areas inside and outside the buffer in case we want to get more granular with our analyses.

 

Note that these are just the dissemination blocks that intersect with the buffer, and we are simply splitting each of them into two parts: the portion of the block inside the buffer and the portion outside the buffer.

Dividing up the data like this is a two-part query – one part will select just the portion of the polygons inside the buffer, and one will select the portion outside the buffer. Then we will union them together into one single data table.  To identify the polygons inside and outside the buffer more easily when we do the union, let's go ahead and add in a field that will tell us which part each polygon is on (inside or outside).  We'll make a new 'type' field when we run each of our queries (it's the first field in each of the queries below).

 

To highlight the differences between the two queries I’ll walk through them one at a time, and then we’ll union them together to get both parts in the same table.

As you’re looking at the queries, there is really just one key difference: one uses STIntersection and one uses STDifference to select the polygon fragments.

 

STDifference returns an object that represents the point set from one geography instance that lies outside another geography instance.

 

STIntersection returns an object that represents the points where a geography instance intersects another geography instance.

 

Select just the part of the polygons INSIDE the buffer:

select
'Inner Subdivision Segment' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog,
db.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

 

Select just the part of the polygons OUTSIDE the buffer:

select
'Outer Subdivision Segment' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog,
db.ogr_geography.STDifference(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

 

To put these two tables of polygons together into one single dataset, we just union them by adding ‘UNION ALL’ in between the two queries – like this:

 

select
'Inner Subdivision Segment' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog,
db.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

UNION ALL

select
'Outer Subdivision Segment' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog,
db.ogr_geography.STDifference(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

5. Putting it all together:

 

Now let’s add the clipped roads in too so that we can see and interact with the buffer, the Census data, and the road network in one viz.  Just like how we unioned the two polygon tables together (the polygon parts inside the buffer and outside the buffer), we’ll do something similar with our clipped roads data, but with one additional bit in the queries….

 

In order to union the tables together we need the same fields in each table, so we'll augment each individual query in the union.  The roads use different field names than the polygons, so we'll add those fields in as NULL in the polygon queries, and we'll add the polygon fields in as NULL in the query for the highways. We'll also add in a 'type' field to the highways so that it matches what we added to the dissemination blocks earlier.

 

The whole query looks like this – our three select statements from earlier in the post unioned together, with the minor modifications discussed above (the 'type' field and the NULL placeholder columns) to make sure each resulting table has the same columns:

 

select
'Inner Subdivision Segment' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog,
db.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment,
NULL as highway_id,
NULL as highway_roadname,
NULL as highway_placename,
NULL as highway_rtnumber1,
NULL as highway_geog,
NULL as highway_length,
NULL as highway_fragment,
NULL as highway_fragment_length
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

UNION ALL

select
'Outer Subdivision Segment' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
db.dbuid as db_dbuid,
db.cdname as db_cdname,
db.ccsname as db_ccsname,
db.csdname as db_csdname,
db.pop as db_population,
db.ogr_geography as db_geog,
db.ogr_geography.STDifference(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment,
NULL as highway_id,
NULL as highway_roadname,
NULL as highway_placename,
NULL as highway_rtnumber1,
NULL as highway_geog,
NULL as highway_length,
NULL as highway_fragment,
NULL as highway_fragment_length
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

UNION ALL

select
'Highway segments' as type,
pc.ogr_fid as pc_ogr_fid,
pc.pcname as pc_name,
pc.pcuid as pc_pcuid,
pc.population as pc_population,
pc.ogr_geography as pc_geog,
pc.centroid_geog as pc_centroid_geog,
pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
NULL as db_dbuid,
NULL as db_cdname,
NULL as db_ccsname,
NULL as db_csdname,
NULL as db_population,
NULL as db_geog,
NULL as db_fragment,
h.highway_id as highway_id,
h.roadname as highway_roadname,
h.placename as highway_placename,
h.rtnumber1 as highway_rtnumber1,
h.ogr_geography as highway_geog,
h.ogr_geography.STLength() as highway_length,
IIF(h.ogr_geography.STIntersection(h.ogr_geography).STGeometryType() = 'GeometryCollection',
    h.ogr_geography,
    h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000))) as highway_fragment,
h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)).STLength() as highway_fragment_length
from dbo.populationcentre_4326 pc
inner join dbo.highway_official_current_4326 h
on h.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

6. Tableau-powered analytics

Now that all of the tables are joined together, we can go crazy with Tableau functionality to explore patterns in the data.   You can add filters, actions, join or blend in additional datasets for context, etc.

 

I’ll just give a quick example of some questions we can ask and answer with this new dataset and then leave it to you to dream up exciting dashboards to support your SQL Server-driven spatial analytics.

 

Put it all together with a parameter to drive the dynamic buffer size

 

Since our custom SQL relies on a parameter to define the buffer size, we can simply add that parameter to any of our worksheets or dashboards and adjust the buffer as needed.  Whenever we make that adjustment to the buffer, the query will re-run and we’ll see the results on any of the worksheets that rely on this data source.

 

This example shows the dissemination blocks, color encoded to show population count in each block, and the highway segments within the buffer. The data will update as the parameter is adjusted (the title for the worksheet will adjust too…it’s easy to add the parameter value to the worksheet title to emphasize the buffer size):

 

 

 

 

 

Note that as the buffer becomes larger the query time will increase, but with our spatial indexes, all of the queries that I've run locally have seemed quite reasonable.  For context, a 20 km buffer took about 5 seconds to run on my laptop.

 

 

Add a couple dashboards and highlight or filter based on population

We can also use standard Tableau functionality to link multiple worksheets together. For instance, we might want to look at the total population or kilometers of highway within the buffered distance for each Population Centre and then adjust the map based on the selected region:
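A couple of simple calculated fields get you most of the way there – for example, something roughly like the following (the field names are the aliases from the union query above as Tableau displays them, so treat them as assumptions):

// Highway km within buffer
SUM([Highway Fragment Length]) / 1000

// Population of intersecting blocks (inner segments only, to avoid double counting)
SUM(IIF([Type] = 'Inner Subdivision Segment', [Db Population], 0))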

 

 

 

 

Or you can let your imagination go wild with data joins, blends, calculated fields, or even additional spatial analyses to answer any number of complex questions!

 

In these posts, I've just touched on the setup, calculations, and basic analyses discussed in the original forums post on Help on mapping population census blocks using dynamic radius filter.  You can find more examples and fun questions in the post and can extend this type of analytic to your heart's delight.

From late November 2018 through early 2019 there was an interesting question being tackled in the Tableau forums: Help on mapping population census blocks using dynamic radius filter

 

This question turned into one of the longest (and most fun) forums threads that I've ever been involved in. At some point in the forums discussion between Wilfred Eze (who asked the original question), Richard Leeke (a Tableau forums 'Data Monarch' and all-around master at making Tableau do all sorts of awesome things), and myself, I said that I would summarize the process into something a little easier to digest than our 100+ back-and-forth posts on everything from data manipulation to spatial indexes to query optimization and finally making some actual vizzes.  So, here it is.

 

Disclaimer & giving credit where it is rightfully due: Most of the heavy lifting that I did on this forums question was on data prep / import to SQL Server, some lightweight SQL demos, and writing this up so we would have a clean archive –  I didn’t mastermind much of the optimized SQL that drives the analytics behind this. The mastermind behind the SQL was Richard Leeke, who is infinitely more skilled with SQL Server than I am. I’ve done some re-writing and cleaning in tailoring for this blog post, but the ideas behind how it works are pretty much all Richard.

 

The post on the Community Forums ended up having 100+ posts to get to a solution.  It is pretty long and a bit of a slog to go through.  To keep things a little simpler, and more organized, than the forums post, I’ve broken this post into two parts – feel free to skip ahead to whatever is right for your needs.

 

  1. Intro to the problem, tools, data, and basic setup to move the data into SQL Server (this post)
  2. SQL Server and Tableau playing nice for some analytic awesome (Part II)

 

Intro to the problem, tools, data, and basic setup to move the data into SQL Server

 

The spatial analysis challenges:

Here are the challenges that were presented in the Community Forums question (Help on mapping population census blocks using dynamic radius filter).  I’m using this as an example for the blog post, but the methods are relevant to many different spatial analysis questions, so you should be able to adapt as needed for your project.

 

This set of posts addresses the following challenges:

 

1. Create dynamic, parameter-driven buffers around point locations (city centres).

For every city centre point location, we need to be able to draw a circular buffer of any size and visualize that in Tableau.  We then want to be able to use these circles to identify overlapping locations in other spatial datasets…

[Animated GIF: cityCentreBuffer.gif]

 

2. Use the city center buffers to identify:

    • Population living within that radius, based on Census geographies
      • What Census subdivisions or dissemination blocks are inside the buffer? Update the polygons selected when the buffer is adjusted.
      • Show entire polygon, or split the polygons to only show the portion within the buffer

 

[Animated GIF: buffer_pop.gif]

 

  • Characteristics of road networks within the buffer
    • What roads are inside the buffer?  Update the roads when the buffer is adjusted.
    • Intersect road network lines with buffers, analyze length of segment within buffer or other characteristics based on segment length

 

[Animated GIF: buffer_highway.gif]

 

3. Visualize just the data in the buffer – or all data, but indicate which buffer it falls inside

Show it all in Tableau and make it useful for analytics!

 

For instance – here is a 20km buffer drawn around Saskatoon, Saskatchewan, with all roads clipped to the extent of the buffer, and detail from only the Census Subdivisions that intersect the buffer:

 

Of course, as happens with most every data analysis project – just getting to the analysis part required a good bit of data prep.  In this series of posts, as I discuss the process of solving the spatial analysis challenges, I’ll also provide some detail on what we needed to do to set everything up (from pre-processing the spatial data to loading it up into SQL Server to actual querying…)

 

Tools to do some Tableau spatial analytic magic:

All in all, the methods that I discuss in these blog posts involve:

 

  • Tableau 2018.1+ - In Tableau 2018.1 we introduced native support for GEOGRAPHY data types from SQL Server.  It’s awesome and opens the door for so many cool spatial analyses!

 

  • QGIS – free, open source GIS package (for some data pre-processing).

 

  • OSGeo4W – for importing data to SQL Server.  I’ve heard that some people like using a GUI-based tool called Shape2SQL, but I find that it’s so quick and easy to just use the command line that I go that route.  I also tried once to make Shape2SQL work, but I failed and never tried again.  Your mileage may vary.  My instructions will only cover OSGeo4w.

 

 

The data:

To make it easier to play along, I've attached all of the files that we were working with.  I've included both the original version of the files and the processed, ready-for-SQL-Server version.  There should be links to download the attachments at the end of the blog post.

 

  • Census Subdivisions (950 polygons)

  • Census dissemination blocks (54,118 polygons)

  • Highways (24,769 line segments)

  • City population centres (61 polygons)

 

 

Data as shapefiles -> tables in SQL Server

 

The first challenge with this project was to simply move the data into SQL Server so that we could write some spatial SQL queries to drive the analyses.  This felt like it was going to be easy, but it turns out that with SQL Server there were a few tricks when working with these particular datasets.

 

To be honest, I ran into more problems with importing these particular datasets into SQL Server than I have with any other spatial datasets, so it may be much easier for you when you're working with your own data.  But…I guess for a blog post on how to make things work, sometimes it's nice to have some tricky data to use in explaining how to bend everything to your will.

 

Note: I have found the process of importing these particular datasets to PostgreSQL to be a bit easier, because PostgreSQL is more flexible with the spatial reference information (SRID) and the data types…so there aren’t quite so many hoops to jump through.  And…as of late December 2018, Tableau was in the Alpha for PostgreSQL spatial (PostGIS) support!  Yay!

Perhaps when we’re in Beta or final release for PostGIS support I’ll re-create and write up the how-to with that database. It really will be a new challenge to tackle, because there are some interesting differences in the spatial functionality between the two databases.

 

Create a database in SQL Server

When I installed SQL Server Express, I also installed the SQL Server Management Studio (SSMS). This makes it easy to interact with your database using a nice GUI.

 

 

After you have opened SSMS and connected to your server (I’m running my databases using localhost, but you may be using a remote host – connect to wherever your server lives), you’ll want to right click on the Databases ‘folder’ and create a New Database…

 

 

Name your database and go ahead and use the default settings.

 

You should now have a database set up where you can store and access the data for your analysis in Tableau

 

 

Spatial reference challenges – re-project shapefiles to get ready for import to SQL Server

Now that you have your database set up, we need to import the shapefiles.  But…there is a quick step to do first.  If your data isn’t already in a latitude and longitude-based coordinate system (e.g., NAD83, WGS84, etc.), you will need to re-project so that we can easily make and populate a GEOGRAPHY column in SQL Server.

 

Tableau likes data from SQL Server to be of a GEOGRAPHY data type (latitude and longitude so that it is analyzed as if it’s on the surface of the earth, as opposed to data being in a GEOMETRY data type where the data is analyzed as if it’s projected onto a flat surface).  The original shapefiles for this project were in UTM Zone 13 North.  This is a great coordinate system if you are working with data in Saskatchewan, Canada, but it isn’t great if you want to use it in SQL Server and then drop it into Tableau.
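If you want to see the practical difference, here's a tiny standalone illustration (the coordinates are arbitrary points near Saskatoon, not from the project data): the same WKT stored both ways returns a distance in degrees for GEOMETRY and in meters for GEOGRAPHY.

declare @gm1 geometry  = geometry::STGeomFromText('POINT(-106.67 52.13)', 4326);
declare @gm2 geometry  = geometry::STGeomFromText('POINT(-106.60 52.13)', 4326);
declare @gg1 geography = geography::STGeomFromText('POINT(-106.67 52.13)', 4326);
declare @gg2 geography = geography::STGeomFromText('POINT(-106.60 52.13)', 4326);

select @gm1.STDistance(@gm2) as planar_distance,     -- ~0.07 (degrees, the coordinate units)
       @gg1.STDistance(@gg2) as geodesic_distance;   -- ~4,800 (meters on the ellipsoid)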

 

In case you’d like a tutorial or refresher on spatial data types in SQL Server, here you go: Overview of spatial data types in SQL Server

 

First thing - we needed to re-project the data so that it would play nicely with SQL Server and Tableau's desire for GEOGRAPHY.

 

To learn more about map projections and coordinate systems, check out Working with projected data in Tableau – Part I: Map projection basics

 

There isn't an easy method for transforming coordinate systems in SQL Server, so I opted to do this data prep stage in QGIS.  QGIS is a free, open source geographic information system (GIS) software.  Re-projecting the data is easy - just open the shapefile and 'Save As...' and select the new coordinate system that you want to use for the shapefile.

 

I changed all of the shapefile data sources for this project into the World Geodetic System 1984 (WGS84) coordinate system (SRID:4326).

 

Get all of the shapefiles into SQL Server

 

Normally this is a super-easy process.  It's just a quick command line import process (which should work for most datasets).  I use the OSGeo4W shell for importing.  There are apparently other options, including a GUI for importing shapefiles to SQL Server (e.g., Shape2SQL), but I have found this more difficult to set up and use.

 

Using the OSGeo4w shell, I normally just use a simple set of commands for OGR2OGR.  OGR2OGR allows for transformations between all sorts of spatial data formats, including from Shapefile to SQL Server.

 

There are about a billion options for the commands

 

 

I have generally had success with a simple import command for Shapefiles -> SQL Server using the OSGeo4W shell, like this:

 

ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLEXPRESS;Database=SqlServerTableauDemo;Trusted_Connection=yes" "c:\temp\myShapefile.shp" -a_srs "EPSG:4326"

 

What does that bunch of words mean?? Let’s break it down a bit:

 

ogr2ogr

This is the tool you’re going to use – it converts simple spatial features between file formats.  We’ll use it to convert between Shapefile and SQL Server

-f "MSSQLSpatial"

The file format that we are converting to – MS SQL Server format.  The -f is the flag that indicates the next bit of text is the file format

"MSSQL:server=blah blah blah”

Details on the SQL Server instance where we are putting the data

  • server=localhost\SQLEXPRESS --- this says ‘look for a SQL Server Express instance on localhost’  Put in whatever info you need to locate the server with your SQL Server instance (it might be localhost, it might be housed remotely somewhere…)
  • Database=SqlServerTableauDemo --- this says ‘look for this particular database called SqlServerTableauDemo’ You need to have already created the database before you try to import the data
  • Trusted_Connection=yes --- I have my database set up to use my windows login since I’m running on a localhost.  If you need to specify a user name and password, you would swap this statement out for something like
    User Id=<insert your user id>;Password=<insert your password>

"c:\temp\myShapefile.shp"

The location and name of the shapefile that you want to import

-a_srs "EPSG:4326"

The coordinate system for the shapefile that you want to import (using the EPSG code to specify).   How do you find this magical code?  You can look up the coordinate system on the EPSG web site, or if you’ve manipulated the coordinate system in QGIS, the EPSG code will often be listed with the coordinate system name.  The two that I use most frequently for US-based or world datasets:

  • 4326 – World Geodetic System 1984 (WGS84)
  • 4269 – North American Datum of 1983 (NAD83)

 

When working with shapefiles, I often just open the .prj file for the shapefile in a text editor, look at the coordinate system name, and then search for that on the EPSG web site.

 

That is what I normally use successfully with SQL Server. But, of course, these particular files threw a few errors that we had to sort out ☹

 

ERROR 1: INSERT command for new feature failed. [Microsoft][SQL Server Native Client 11.0][SQL Server]Arithmetic overflow error converting numeric to data type numeric.
ERROR 1: Unable to write feature 1 from layer CensusSubdivision_4326.
ERROR 1: Terminating translation prematurely after failed translation of layer CensusSubdivision_4326 (use -skipfailures to skip errors)

 

As the error noted, I could just use -skipfailures to skip all of the errors, but that ended up skipping almost all of the records in the dataset!  So, we need another solution.  The key to the solution is in the first error: there was an "arithmetic overflow error converting numeric to data type numeric."  That tells us that something is going on with the attributes being put into a numeric column.

 

The fix that we found for this problem was to just unset the field width so that it could handle the attributes correctly.

 

So if you want to try to follow along at home, you’ll use import commands with an extra flag on them to “unset the field width” – just add the “-unsetFieldWidth” flag at the end of the command for each file (see examples below).  In the course of getting the data into SQL Server, we discovered that these particular datasets had some values that fell outside of the range for the default data type set by SQL Server.  This just gets around that problem and ensures that all of the attributes in each field are imported correctly.

 

There may still be a handful of ring self-intersection errors that come up on import, but after looking at those individual intersections in QGIS, it seemed like they weren’t going to be much of an issue for these analyses, so I just ignored them.  That might not be the wisest solution in all cases, but it was the path that I took for this demonstration…
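If you'd rather check than ignore, a quick count of any geometries that are still invalid after an import (a sketch – run it against each imported table) looks like:

select count(*) as invalid_rows
from dbo.censussubdivision_4326
where ogr_geometry.STIsValid() = 0;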

 

In the OSGeo4W shell (NOT from a regular Windows or Mac command prompt), use these commands as a template.  Remember to update the information on the location / details for your SQL Server instance, and the location of your shapefiles.

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\populationcentre_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\highway_official_current_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\censussubdivision_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

  • ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLExpress;Database=<your database name>;Trusted_Connection=yes" "c:\temp\disseminationBlock_4326.shp" -a_srs "EPSG:4326" -unsetFieldWidth

 

 

After you’ve imported all of the shapefiles, you should see them listed as Tables in your database. By default they will all have the same name as the shapefile that you used as input:

 

Create the GEOGRAPHY in SQL Server

 

Now that the data is in SQL Server, we only have a few quick bits of processing left.  We're going to create a GEOGRAPHY column and a spatial index.  This will 1) convert the data into a format that Tableau likes, and 2) speed up the spatial analysis.

 

By default, only GEOMETRY is created when importing the shapefiles into SQL Server, so we’ll need to create a GEOGRAPHY column so that Tableau can read the data.  In the process of doing this, we’re going to add in a little trick where we reverse the order of the vertices in all of our polygons.  We need to do this because Shapefiles and SQL Server have different semantics on how polygons should be rendered – and if you don’t swap the ordering, your polygons may render “inside out” in Tableau or any of the analyses with the polygons may be inverted.  Just trust me, you don’t want that.

To create the GEOGRAPHY and swap the vertex ordering at the same time, you can do this in SQL Server (broken down below for each table that was imported – but you could just string them all together into one long query if you want):

 

Let’s break down the basics of the query and look at what this SQL is going to do (Some copy/paste-able queries will be right after this…don’t worry):

 

alter table dbo.populationCentre_4326
add ogr_geography geography;

Take one of the tables of data (the imported shapefile) and add a new column called "ogr_geography" that is a GEOGRAPHY data type.

update dbo.populationCentre_4326
set ogr_geography = ogr_geometry
    .MakeValid()
    .STUnion(ogr_geometry.MakeValid().STStartPoint())
    .MakeValid()
    .STAsText()

Now update the table and populate the ogr_geography column with the polygon that was imported from the shapefile.  The long string of extra stuff after "ogr_geometry" (the original polygon geometry imported from the shapefile) is a quick way of reversing the vertex order for the polygon from the shapefile and then converting it to a GEOGRAPHY data type.

More info on that process? Working with invalid data and the SQL Server 2008 geography data type

 

And here is the text that you can copy/paste for each table of data – just open up a new query and run each of these…

 

  • Population Centre

 

alter table dbo.populationCentre_4326
add ogr_geography geography;
GO
update dbo.populationCentre_4326
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()

 

While we’re at it with the population centres, we should go ahead and add a field with just the centroids (a point located roughly in the middle of the population centre polygon) – since that is what we are going to use for our final calculations (what is within a set distance of the centroid / approximate middle of the population centre polygon):

 

alter table dbo.populationcentre_4326
add centroid geometry;
GO
update dbo.populationcentre_4326
set centroid = ogr_geometry.STCentroid();
GO
alter table dbo.populationcentre_4326
add centroid_geog geography;
GO
update dbo.populationcentre_4326
set centroid_geog = geography::STGeomFromText(centroid.STAsText(), 4326)

 

For reference, when you run this query  you should see something like this:

 

 

  • Highway official current

 

alter table dbo.highway_official_current_4326
add ogr_geography geography;
GO
alter table dbo.highway_official_current_4326
add wkt varchar(max);
GO
update dbo.highway_official_current_4326
set wkt = ogr_geometry.STAsText();
GO
update dbo.highway_official_current_4326
set ogr_geography = geography::STGeomFromText(wkt, 4326).MakeValid();

 

  • Census subdivision

alter table dbo.censusSubdivision_4326
add ogr_geography geography;
GO
update dbo.censusSubdivision_4326
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()

 

  • Dissemination block

alter table dbo.disseminationBlock_4326
add ogr_geography geography;
GO
update dbo.disseminationBlock_4326
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()
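As a quick sanity check that nothing came through 'inside out' (an optional sketch – an inverted polygon covers most of the globe, so its area will be absurdly large):

select top 5 pcname, ogr_geography.STArea() / 1000000.0 as area_sq_km
from dbo.populationCentre_4326
order by ogr_geography.STArea() desc;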

 

Set up some spatial indexes (speed up the queries!)

 

While the datasets we are working with aren’t gigantic, the performance of our analyses will still benefit from setting up spatial indexes on the GEOGRAPHY for each of our tables. This will optimize the spatial queries.

 

A bit of reading on spatial indexes, if you’re curious: About spatial indexes

 

The easiest way to do this is just to expand the table in the object explorer in SQL Server Management Studio and create a new spatial index

 

 

Accept the default names in the dialog box and select Add… to add a column to the index.

 

 

You’ll use the ogr_geography (or the centroid_geog for the city centres, if that is the table you are currently working with) and click OK

 

 

Then you'll go into the Spatial index options and set the Tessellation Scheme to 'Geography Grid' (NOT the Geometry Grid). After that, just click OK and the spatial index on GEOGRAPHY should be built.
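If you'd rather script the index than click through the dialogs, the equivalent T-SQL looks roughly like this (a sketch – the index name is arbitrary, and spatial indexes require the table to have a clustered primary key, which the ogr2ogr import should have created on ogr_fid):

create spatial index six_populationcentre_centroid_geog
on dbo.populationcentre_4326 (centroid_geog)
using geography_grid;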

 

 

Now that everything is all set up, we can take a minute to make sure that it’s all happy and visible in Tableau.

 

Open up Tableau 2018.1 or later.  Connect to your SQL Server database.

 

 

Select your database, pick a data table that looks interesting to you, and drop the table in as the data source.  I'd recommend using one other than the Dissemination Blocks (it's sort of big and takes the longest to render).

 

 

 

Then go to a new worksheet, drop the ogr_geography dimension onto a viz and you should see your datasets rendered.

 

 

Hopefully all of that is working for you and we can move on to the next post in this series where we do some analytics!