In the first part of this post series, we took some raw spatial data (as shapefiles), did a bit of prep and loaded it up in SQL Server so that we could access the spatial data - and use the built-in spatial functionality of SQL Server - in Tableau.

 

Most of this post relies on dynamic queries run directly through SQL Server; however, to demonstrate the build-up of the Tableau workbook I have attached a packaged workbook running from an extract that you can use as a reference for constructing the worksheets and dashboards. Note that this workbook will NOT be dynamically adjustable (i.e., you cannot change the parameter and have the buffer-driven analytics update...because it isn't connected to SQL Server, it's just running from a Tableau extract).  But...it should provide assistance if you are having trouble with the dual axis maps or setting up filters.

 

Now that all of our datasets are in Tableau, we can explore options for using the spatial analysis functions in SQL Server to drive some interesting analyses in Tableau.

 

There are two ways that you can integrate the SQL Server spatial functionality in Tableau to augment your analytics: Custom SQL and Raw SQL.

 

Custom SQL: Allows the user to use table-level operators that Tableau doesn’t generate.  This means that you can create entirely new data source tables directly from queries written in SQL Server.

 

RawSQL: Allows you to use column-level functions that Tableau doesn’t generate (e.g., a calculated field that uses a RawSQL query to calculate the area for a polygon).
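As a concrete sketch of the column-level idea (the field name [Ogr Geography] is an assumption - it refers to the ogr_geography GEOGRAPHY column from Part I as Tableau would display it), a calculated field like this would return the area of each polygon in square meters:

```
// Hypothetical calculated field; %1 is replaced by the referenced column
RAWSQL_REAL("%1.STArea()", [Ogr Geography])
```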

 

These two options make it so that if you can write the query in SQL Server, you can use it in Tableau and then use the results in your visualizations.

 

Note: if you want to dig into performance issues with Custom SQL, take a look at Alan Eldridge’s “Designing Efficient Workbooks” whitepaper.  It’s a few years old at this point, but still has a lot of relevant detail on Custom SQL.

 

For most of the spatial analytic problems addressed in this blog post, we’ll want to use Custom SQL to alter the structure of the Tableau data source table, but I’ll show you an example of RawSQL as well.


Let’s start with simple queries and work our way up…

 

1. Dynamic buffers

The core of the problem that we’re tackling is that we have point locations for city centres (the centroids that we calculated in Part I) and we want to be able to identify what is within a set (but adjustable) distance from the points.  To do that, we’ll make buffers around each city centre and use a parameter in Tableau to dynamically resize them.  For this simple starting point, we’ll look at how we can create the buffers with both Custom SQL and RawSQL.

 

RawSQL method - for column-level alteration of our data source

 

Connect to the SQL Server database that you made in the first part of the blog, and add the populationcentre_4326 table:

 

 

On a new worksheet, create a new calculated field:

 

 

And enter this calculation (this is the non-dynamic version that we’ll start with – we’ll add a parameter to control it next):

 

 

RAWSQL_Spatial("%1.STBuffer(1000)", [Centroid Geog])

 

If you add this newly created calculated field to the viz, you should see 1 km (or 1000 meter) buffer polygons.  The units for buffers created on GEOGRAPHY data types are in meters. You can manually change the size of the buffer by editing the calculated field and replacing “1000” with whatever distance you want.
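If you want to sanity-check the units, a quick query in SQL Server Management Studio shows that geography buffers work in meters (this snippet is just a sketch; the coordinates are an arbitrary point near Saskatoon, not from our dataset):

```sql
-- geography::Point takes (Lat, Long, SRID); STBuffer distance is in meters
DECLARE @pt geography = geography::Point(52.13, -106.67, 4326);

-- The area of a 1000 m buffer should come out close to pi * 1000^2
-- (~3.14 million square meters, slightly less because the buffer is a many-sided polygon)
SELECT @pt.STBuffer(1000).STArea() as buffer_area_m2;
```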

 

 

To make the buffer dynamic, we just need to create a Parameter, and then drop that into the RawSQL calculation instead of the hard coded distance:

 

RAWSQL_Spatial("%1.STBuffer(%2 * 1000)", [Centroid Geog], [Buffer distance (KM)])

 

Alternatively, we could have written that calculation with the buffer distance being converted from kilometers to meters outside of the query passed to SQL Server, which may speed things up a tiny bit... This other way of writing the query looks like this (we just shift the * 1000 to the end):

 

RAWSQL_Spatial("%1.STBuffer(%2)", [Centroid Geog], [Buffer distance (KM)] * 1000)

 

If you add your parameter to the worksheet, you can now adjust the buffer size dynamically. Every time you change the parameter value a new polygon will be calculated and rendered.

 

cityCentreBuffer.gif

 

Custom SQL to augment the data source directly

Our other option for running analytics through SQL Server is to alter the data source table on creation (instead of calculating a single new column as a calculated field).  This is just like writing a query in SQL Server to generate a new data table. Only the data that you request in the query will end up in the data source table.

 

Instead of adding a specific table directly from our database, we’ll add ‘New Custom SQL’ instead:

In the Custom SQL window that opens up, we’ll add the following query to generate a super-simple table that only contains a buffer (named “dynamic_buffer”):

 

select centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as dynamic_buffer
from populationcentre_4326

 

Notice that the query above uses the parameter we created earlier (Buffer distance (KM)).  The easy way to add that into a query is to use the Insert Parameter drop down in the Edit Custom SQL window:

 

If you open a new worksheet, you should see this one new field.  You can add it to a viz and see the buffers.  By adding the parameter to the worksheet as well, you can adjust them dynamically – just like we did with the RawSQL generated buffer.

 

 

Notice that there is only one measure in this data source – that is because we only included one field in our query!  We could have easily included any of the fields in the populationcentre_4326 table in our query.  We’ll see lots of examples of this in the following steps where we want more information about our data as a result of our Custom SQL queries.

 

2. Buffers + polygon intersections

That last example was a fun demonstration of dynamic spatial data in Tableau, but now let’s take it one step further and actually use those buffers to select our regions of interest.  This is where the power of SQL Server spatial really lets us go wild…

At this point I’m going to switch to using only Custom SQL, because I want to simplify the process and end up with a single table that consolidates all of the detail that I want from all of the relevant tables in my database.

 

Let’s grab all of the Census dissemination blocks that intersect our dynamic buffers (just like in the image above).  To get this, it’s just a matter of writing the proper SQL query and dropping it in as Custom SQL.  Here is the query that I used – first I’ll give you the copy/paste version and then explain what it’s doing:


select
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

With this query we can dynamically select every polygon that overlaps with our buffer – and as we change the buffer size the selected polygons will update accordingly.

 

In case you were curious why the query is set up to only keep the intersecting polygons… we don’t have to write the query that way.  We could easily just return all polygons and a field to let us know if any given polygon intersects a city centre polygon (I don’t recommend this in most cases, but I’ll show you how anyway).

 

To demonstrate how this would work, I’ll use a smaller file with just Census subdivisions – here is the result we would get to show all Census subdivisions in the entire dataset with just the ones within the buffer around the Saskatoon city centre highlighted:

 

If you want to try this, here is the query that I used to create the viz above – it’s similar to the previous one, but instead of doing an inner join, we just reference both tables and compare ALL possible combinations of city centre buffers and Census subdivisions.  We also return a new field that tells us true/false on whether any Census subdivision intersects any city centre buffer (these parts are the cross join in the FROM clause and the within_buffer field below):

 

select
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    subdiv.cduid as subdiv_cduid,
    subdiv.cdname as subdiv_cdname,
    subdiv.population as subdiv_population,
    subdiv.popdensity as subdiv_popDensity,
    subdiv.ogr_geography as subdiv_geog,
    subdiv.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as within_buffer
from dbo.populationcentre_4326 as pc, dbo.censussubdivision_4326 as subdiv

 

Why would we choose NOT to write our query this way and just return all of the polygons?

 

Unless you really need this level of detail, you want to be careful with queries like this – it’ll bloat up your data quickly.  It produces a table with ALL combinations of every row in both tables in the query, so we end up with a TON of records in our final table – this makes it slow to compute (a lot more spatial calculations) and slower to render (a lot more polygons).  This particular query results in 57,950 records (950 Census subdivisions * 61 population centres).   Hopefully you can see why I didn’t want to do this query with the Census dissemination blocks (54,118 dissemination blocks * 61 population centres = ~3.3 million records!)
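You can estimate the row explosion before running the spatial version of the query - a plain count of the cross join (no spatial predicates, so it returns almost instantly) gives the same number:

```sql
-- 950 subdivisions * 61 population centres = 57,950 rows
select count(*) as cross_join_rows
from dbo.censussubdivision_4326, dbo.populationcentre_4326
```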

 

The collected (aggregated) geography from just the Census Subdivisions + population centres is actually large enough and detailed enough to throw an error in Tableau until the polygons are disambiguated…

 

 

3. Buffers + line intersections / clipping

Now that we’ve looked at how to explore the intersection between buffers and Census subdivisions, we’ll move on to segmenting out the highways so that we have only the highway bits that fall inside the buffers.  This is a line-and-polygon intersection instead of the polygon-and-polygon intersection we used above.

 

To do this, we’ll use two different spatial operations – STIntersects and STIntersection.  They sort of sound the same, but they have a useful distinction.  The intersects operator just gives us a 0 or 1 if the spatial features intersect. The intersection operator will give us just the part of the spatial feature that intersects (thus, we can clip the roads to the extent of our polygon buffer).

 

STIntersects: returns 1 if a geography instance intersects another geography instance; returns 0 if it does not.

 

STIntersection: returns an object that represents the points where a geography instance intersects another geography instance.
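A toy query makes the distinction concrete (the line and the buffer centre below are made-up coordinates, not from our data - note that geography WKT is in longitude/latitude order, while geography::Point takes latitude first):

```sql
-- A ~27 km east-west line passing through the buffer centre
DECLARE @line geography = geography::STGeomFromText('LINESTRING(-106.8 52.1, -106.4 52.1)', 4326);
DECLARE @buf  geography = geography::Point(52.1, -106.6, 4326).STBuffer(5000);

-- STIntersects gives a yes/no flag; STIntersection gives the clipped piece itself
SELECT @line.STIntersects(@buf)              as intersects_flag,   -- 1, since the line crosses the buffer
       @line.STIntersection(@buf).STAsText() as clipped_segment;   -- just the ~10 km inside the buffer
```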

 

Here is the query…one part (the IIF statement) is worth talking about in a little more detail:

 

select
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    h.highway_id as highway_id,
    h.roadname as highway_roadname,
    h.placename as highway_placename,
    h.rtnumber1 as highway_rtnumber1,
    h.ogr_geography as highway_geog,
    h.ogr_geography.STLength() as highway_length,
    IIF(h.ogr_geography.STIntersection(h.ogr_geography).STGeometryType() = 'GeometryCollection',
        h.ogr_geography,
        h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000))) as highway_fragment,
    h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)).STLength() as highway_fragment_length
from dbo.populationcentre_4326 pc
inner join dbo.highway_official_current_4326 h
    on h.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

The IIF statement in the query above uses some syntax that is different from the other queries we’ve seen in this post.  It checks whether the result of the intersection happens to be a Geometry Collection instead of regular line segments.  This is important because Tableau doesn’t yet support geometry collections, so we want to make sure that we don’t have any of those in our result table.  The IIF (immediate IF) statement checks whether a geometry collection was returned as the result of the intersection, and if so, replaces that record with the ogr_geography value.  If the result isn’t a geometry collection (i.e., it is a regular linestring), we simply populate the record with the result of the intersection.

 

This is just a safety check to help avoid potential errors from the intersection (and it turns out that there are a few records in the highway dataset that lead to this particular problem…so we’ll just avoid it).

 

4. Combining subdivisions and buffer to show visual extent of clip

In the long run, we’re going to want to show three different datasets: the buffer extent, the clipped roads, and the Census dissemination blocks.   Tableau is good with creating dual axis maps, but this would require three axes or ‘layers,’ which requires some creative manipulation of the data.  We’ll get around the dual axis restriction by splitting our dissemination blocks into two sets of polygons: the portion inside the buffer and the portion outside the buffer.  This will leave us with a nice visual of the buffer AND a way to explore the polygon areas inside and outside the buffer in case we want to get more granular with our analyses.

 

Note that these are just the dissemination blocks that intersect with the buffer and we are simply splitting them into two parts to show the portion of the subdivision inside the buffer and the portion outside the buffer.

Dividing up the data like this is a two part query – one part will select just the portion of the polygons inside the buffer, and one will select the portion outside the buffer. Then we will union them together into one single data table.  To identify the polygons inside and outside the buffer more easily when we do the union, let's go ahead and add in a field that will tell us which part each polygon is on (inside or outside).  We'll make a new 'type' field when we run each of our queries (the first column in each query below).

 

To highlight the differences between the two queries I’ll walk through them one at a time, and then we’ll union them together to get both parts in the same table.

As you’re looking at the queries, there is really just one key difference: one uses STIntersection and one uses STDifference to select the polygon fragments.

 

STDifference: returns an object that represents the point set from one geography instance that lies outside another geography instance.

 

STIntersection: returns an object that represents the points where a geography instance intersects another geography instance.
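The same toy-data approach shows how the two operators split a shape (an arbitrary point again; the areas are approximate because buffers are many-sided polygons, not perfect circles):

```sql
DECLARE @big   geography = geography::Point(52.1, -106.6, 4326).STBuffer(2000);
DECLARE @small geography = geography::Point(52.1, -106.6, 4326).STBuffer(1000);

-- STDifference keeps the ring of @big outside @small; STIntersection keeps the overlap
SELECT @big.STDifference(@small).STArea()   as ring_area_m2,     -- ~ pi * (2000^2 - 1000^2)
       @big.STIntersection(@small).STArea() as overlap_area_m2;  -- ~ pi * 1000^2
```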

 

Select just the part of the polygons INSIDE the buffer:

select
    'Inner Subdivision Segment' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog,
    db.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

 

Select just the part of the polygons OUTSIDE the buffer:

select
    'Outer Subdivision Segment' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog,
    db.ogr_geography.STDifference(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

 

To put these two tables of polygons together into one single dataset, we just union them by adding ‘UNION ALL’ in between the two queries – like this:

 

select
    'Inner Subdivision Segment' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog,
    db.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

UNION ALL

select
    'Outer Subdivision Segment' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog,
    db.ogr_geography.STDifference(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

5. Putting it all together

 

Now let’s add the clipped roads in too so that we can see and interact with the buffer, the Census data, and the road network in one viz.  Just like how we unioned the two polygon tables together (the polygon parts inside the buffer and outside the buffer), we’ll do something similar with our clipped roads data, but with one additional bit in the queries….

 

In order to union the tables together we need the same fields in each table, so we’ll augment each individual query in the union.  Because the roads use different field names than the polygons, we’ll add those fields in as NULL in the polygon queries, and we’ll add the polygon fields in as NULL in the query for the highways. We'll also add in a 'type' field to the highways so that it matches what we added to the dissemination blocks earlier.

 

The whole query looks like this – our three select statements from earlier in the post unioned together, with the minor modifications discussed above to make sure each resulting table has the same columns.  The NULL placeholder columns and the new 'type' value in the highway query are the changes from the queries that we used earlier:

 

select
    'Inner Subdivision Segment' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog,
    db.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment,
    NULL as highway_id,
    NULL as highway_roadname,
    NULL as highway_placename,
    NULL as highway_rtnumber1,
    NULL as highway_geog,
    NULL as highway_length,
    NULL as highway_fragment,
    NULL as highway_fragment_length
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

UNION ALL

select
    'Outer Subdivision Segment' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    db.dbuid as db_dbuid,
    db.cdname as db_cdname,
    db.ccsname as db_ccsname,
    db.csdname as db_csdname,
    db.pop as db_population,
    db.ogr_geography as db_geog,
    db.ogr_geography.STDifference(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) as db_fragment,
    NULL as highway_id,
    NULL as highway_roadname,
    NULL as highway_placename,
    NULL as highway_rtnumber1,
    NULL as highway_geog,
    NULL as highway_length,
    NULL as highway_fragment,
    NULL as highway_fragment_length
from dbo.populationcentre_4326 as pc
inner join dbo.disseminationblock_4326 as db
    on db.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

UNION ALL

select
    'Highway segments' as type,
    pc.ogr_fid as pc_ogr_fid,
    pc.pcname as pc_name,
    pc.pcuid as pc_pcuid,
    pc.population as pc_population,
    pc.ogr_geography as pc_geog,
    pc.centroid_geog as pc_centroid_geog,
    pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000) as pc_buffer_geog,
    NULL as db_dbuid,
    NULL as db_cdname,
    NULL as db_ccsname,
    NULL as db_csdname,
    NULL as db_population,
    NULL as db_geog,
    NULL as db_fragment,
    h.highway_id as highway_id,
    h.roadname as highway_roadname,
    h.placename as highway_placename,
    h.rtnumber1 as highway_rtnumber1,
    h.ogr_geography as highway_geog,
    h.ogr_geography.STLength() as highway_length,
    IIF(h.ogr_geography.STIntersection(h.ogr_geography).STGeometryType() = 'GeometryCollection',
        h.ogr_geography,
        h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000))) as highway_fragment,
    h.ogr_geography.STIntersection(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)).STLength() as highway_fragment_length
from dbo.populationcentre_4326 pc
inner join dbo.highway_official_current_4326 h
    on h.ogr_geography.STIntersects(pc.centroid_geog.STBuffer(<Parameters.Buffer distance (KM)> * 1000)) = 1

 

 

6. Tableau-powered analytics

Now that all of the tables are joined together, we can go crazy with Tableau functionality to explore patterns in the data.   You can add filters, actions, join or blend in additional datasets for context, etc.

 

I’ll just give a quick example of some questions we can ask and answer with this new dataset and then leave it to you to dream up exciting dashboards to support your SQL Server-driven spatial analytics.

 

Put it all together with a parameter to drive the dynamic buffer size

 

Since our custom SQL relies on a parameter to define the buffer size, we can simply add that parameter to any of our worksheets or dashboards and adjust the buffer as needed.  Whenever we make that adjustment to the buffer, the query will re-run and we’ll see the results on any of the worksheets that rely on this data source.

 

This example shows the dissemination blocks, color encoded to show population count in each block, and the highway segments within the buffer. The data will update as the parameter is adjusted (the title for the worksheet will adjust too…it’s easy to add the parameter value to the worksheet title to emphasize the buffer size):

 

 

 

 

 

Know that as the buffer becomes larger the query time will increase, but with our spatial indexes, all of the queries that I’ve run locally have seemed quite reasonable.  For context on the time, a 20km buffer took about 5 seconds to run on my laptop.

 

 

Add a couple dashboards and highlight or filter based on population

We can also use standard Tableau functionality to link multiple worksheets together. For instance, we might want to look at the total population or kilometers of highway within the buffered distance for each Population Centre and then adjust the map based on the selected region:

 

 

 

 

Or you can let your imagination go wild with data joins, blends, calculated fields, or even additional spatial analyses to answer any number of complex questions!

 

In these posts, I’ve just touched on the setup, calculations, and basic analyses discussed in the original forum post, Help on mapping population census blocks using dynamic radius filter.  You can find more examples and fun questions in the post and can extend this type of analytic to your heart’s delight.