
Spatial SQL Overview!

Posted by Sarah Battersby, Nov 13, 2019

At TC19 I gave a presentation on "Hit the analytic jackpot with spatial SQL" (video / slides on YouTube). When I started going through all of the material that I had planned, I realized that I had about 90 minutes, or maybe more, of extra content.  I also realized that it was going to be a PAIN for anyone trying to take notes and / or remember anything that I went through.  To try to alleviate these problems, I started writing...and kept writing...and wrote some more...and dug up old things that I'd already written...and then I threw them all onto my blog here.

 

This post is just an easy way to keep track of all of the different links.

 

If you want to know more about spatial SQL and want some examples of things that you can do, here is a collection:

 

Background


 

 

Fun stuff related to the TC talk


 

 

Older posts


In this post I’ll walk through the process of pushing spatial data into PostgreSQL and SQL Server Express for use in Tableau.  The process should be similar for other spatial file formats, but, since shapefiles are probably the most common source, I’m just going to focus on that process for now.

 

Tools that I use:

  • OSGeo4W – a set of open source geospatial tools. Includes QGIS, GDAL/OGR, GRASS, etc. I use QGIS and OGR quite a bit when manipulating data and pushing into databases
  • SQL Server Express – I use SQL Server Management Studio to interact with the database
  • PostgreSQL - I currently use OmniDB to interact with the database, but have used pgAdmin as well (I don’t like the new browser-based version – just a personal preference, nothing wrong with it otherwise)
  • PostGIS & PostGIS 2.0 Shapefile and DBF Loader Exporter – the shapefile importing tool comes with the OSGEO installation of PostGIS

 

Tips:

  • Tableau supports three coordinate systems for data in spatial databases.
    • NAD83 (EPSG: 4269)
    • ETRS89 (EPSG: 4258)
    • WGS84 (EPSG: 4326)

 

Transform your data into one of these before you put it into the database if you can. If you’re working with data in SQL Server you need to make sure that you’re starting with the data in a geographic coordinate system; with PostgreSQL you’ll have more flexibility to transform the data so you can always update later if needed.

 

More information on coordinate reference systems, identifying them, and updating them in a separate blog post here: Spatial SQL - Identify and Update Coordinate Reference Systems

 

  • Data in the database may be either GEOMETRY or GEOGRAPHY
    • For SQL Server – Tableau will visualize GEOGRAPHY
    • For PostgreSQL – Tableau will visualize GEOGRAPHY or GEOMETRY

Data that I use in this post:

  • Las Vegas Animal Shelters (a shapefile of point data).  I've attached two versions of this file to the post so that you can try it out.  Each uses a different coordinate reference system.
    • AnimalShelters_4326 - the shapefile in WGS84 coordinate reference system (EPSG 4326)
    • AnimalShelters_102707 - the shapefile in State Plane, Nevada East coordinate reference system (EPSG 102707)

 

There are two sections in this post - scroll to the one you want

 

  • PostgreSQL / PostGIS
  • SQL Server

 

 

Spatial data into PostgreSQL / PostGIS

Assumptions:

For reference, I am using:

PostgreSQL:  Latest version from EnterpriseDB

PostGIS: Install from OSGEO (postGIS, osmconvert, osm2pgrouting)

 

I’m also assuming that you already have a spatially enabled database to work with. If not, make one.

 

PostgreSQL (via pgAdmin) – the process will be similar with other database management tools, just make the database and then create the  PostGIS extension.

Create the database
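If you’d rather do it with SQL than the GUI, it’s one statement (the database name here is just a placeholder):

CREATE DATABASE my_spatial_db;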

 

 

Then create the postgis extension so that you can work with spatial data
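As plain SQL, run while connected to that new database:

CREATE EXTENSION postgis;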

 

 

 

Import data with either shapefile import/export GUI or OSGeo4W / OGR

PostgreSQL is a little more friendly when trying to deal with data in projected coordinate systems because it has nice tools for transforming the data.   In order to take advantage of those tools, you just need to make sure you have the right CRS defined to start.  It’s easiest to just define that when importing the data, but you can update it after the fact if you forget.

 

With Shapefile import/export manager – it’s easy and graphical!  Fun!

 

Open the PostGIS Shapefile Import/Export Manager, connect to the database, Add a File, set the SRID (spatial reference ID...this is the code for the coordinate reference system) and click Import.  If you don’t set the SRID, it defaults to 0 and you have to write a query to change it in the database.  If you don’t already know it, you can look up the SRID on the EPSG or Spatial Reference web sites.  More on identifying and updating coordinate reference systems on this supplemental post: Spatial SQL - Identify and Update Coordinate Reference Systems
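If you do end up with an SRID of 0, the fix is one quick query with PostGIS’s UpdateGeometrySRID function – a sketch, assuming a table named mytable with a geometry column named geom:

SELECT UpdateGeometrySRID('mytable', 'geom', 4326);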

With OSGeo4W and OGR

Importing data that is already in an appropriate coordinate system

All you need to do is type in a "simple" command on the command line.  Use ogr2ogr to connect to your database, enter the right credentials, reference your shapefile location and coordinate reference system...

 

 

ogr2ogr -f "PostgreSQL" PG:"dbname=<your database> host=localhost port=5432 user=<your user name> password=<your password>" "c:\temp\myShapefile.shp" -a_srs EPSG:4326

 

The geometry will be in a column named ‘wkb_geometry.’  This is different from the name of the column you get if you use the GUI PostGIS 2.0 Shapefile Import/Export Manager.  If that is important to you, you can rename it by adding:

-lco geometry_name=<name>

 

Importing data that needs to be re-projected first – OGR is great for this!

ogr2ogr -f "PostgreSQL" PG:"dbname=<your database> host=localhost port=5432 user=<your user name> password=<your password>" "c:\temp\myShapefile.shp" -s_srs EPSG:26911 -t_srs EPSG:4326

 

For example, if I were importing the Animal Shelters shapefile that was NOT in one of the coordinate systems that Tableau likes to read natively from PostgreSQL, I would type in this to import the shapefile and change the coordinate reference system at the same time (the -s_srs and -t_srs flags handle the CRS update):

ogr2ogr -f "PostgreSQL" PG:"dbname=<your database> host=localhost port=5432 user=<your user name> password=<your password>" "c:\temp\AnimalShelters_102707.shp" -s_srs EPSG:102707 -t_srs EPSG:4326

 

 

Adding a GEOGRAPHY column

You don’t really need to create a geography column, since Tableau will work with both GEOMETRY and GEOGRAPHY from PostgreSQL, but if you want GEOGRAPHY it’s a simple query in PostgreSQL:

 

Alter table myTable
Add column geog geography;

Update myTable
Set geog = ogr_geometry::geography;
-- I used the field name ‘ogr_geometry’ but use whatever your geometry field is actually named

 

Transforming data in PostgreSQL (if needed)

If you need to do a transformation once the data is in PostgreSQL, you can easily update the SRID to one of the supported CRS.  The important thing to note in the examples below is that we’re explicitly defining the geometry type (e.g., POINT) and the CRS for the new column.  This will help ensure that Tableau can properly recognize the geometry in the new column.

 

Alter table myTable
Alter column geom
Type Geometry(Point, 4326)
Using ST_Transform(geom, 4326);

 

If you don’t want to alter the original CRS, you can add a new column and use ST_Transform and just have multiple geometry columns!

 

Alter table myTable
Add column geom_4326 geometry(point, 4326);

Update myTable
Set geom_4326 = st_transform(geom, 4326);

 

You could even do this in Tableau with Initial SQL, Custom SQL, or RawSQL if you really need to (i.e., you can’t alter data in the database that you are accessing), but you’ll likely take a performance hit.  If you have to go this route, I recommend Initial SQL since that will just run once when you load the workbook.
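For example, a minimal Initial SQL sketch (table and column names here are just placeholders) that builds a temporary table with a transformed copy of the geometry – you’d then use transformed_table as your data source:

CREATE TEMPORARY TABLE transformed_table AS
SELECT id, ST_Transform(geom, 4326)::geometry(point, 4326) AS geom
FROM original_table;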


------------

 

Spatial data into SQL Server

Assumptions:

I’m assuming that you already have a spatially enabled database to work with. If not, make one.

 

SQL Server (via Microsoft SQL Server Management Studio)

 

 

Import data with OSGeo4W and OGR

Tableau likes data from SQL Server to be of a GEOGRAPHY data type (latitude and longitude so that it is analyzed as if it’s on the surface of the earth, as opposed to data being in a GEOMETRY data type where the data is analyzed as if it’s projected onto a flat surface). 

 

You can only convert GEOMETRY to GEOGRAPHY if the coordinate system is already latitude and longitude.

 

SQL Server doesn’t have functionality for easily transforming the projection of the data, so you’ll either want to do that in advance (see Spatial SQL - Identify and Update Coordinate Reference Systems to learn more about transforming data into an appropriate geographic coordinate system, e.g., NAD83, WGS84, ETRS89), or you can do the transformation at the same time as you import the data into SQL Server with OGR.

 

For fun, try this with a dataset in a projected coordinate system – I’ve attached one to this post (see the AnimalShelters_102707 shapefile)

 

When you try to create the GEOGRAPHY field you’ll get an error like this:

 

Msg 6522, Level 16, State 1, Line 2

A .NET Framework error occurred during execution of user-defined routine or aggregate "geography": System.FormatException: 24201: Latitude values must be between -90 and 90 degrees….

 

If you try to open the GEOMETRY field in Tableau, you’ll get an error like this:

 

[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Operand type clash: geometry is incompatible with geography

 

 

Importing data that is already in an appropriate coordinate system

 

ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLEXPRESS;Database=tc19;Trusted_Connection=yes" myShapefile.shp -a_srs EPSG:4326 -unsetFieldWidth

 

With SQL Server, I’ve had a reasonable number of datasets that give me arithmetic overflow errors related to converting numeric to data type numeric.  This seems to be an issue with long numeric identifiers, so I often add the -unsetFieldWidth flag.  You don’t always need that flag to import.

 

Importing data that needs to be re-projected first – OGR is great for this!

ogr2ogr -f "MSSQLSpatial" "MSSQL:server=localhost\SQLEXPRESS;Database=tc19;Trusted_Connection=yes" myShapefile.shp -s_srs EPSG:102707 -t_srs EPSG:4326 -unsetFieldWidth

 

 

Make a field of GEOGRAPHY type so that we can visualize in Tableau.

 

The query below simply takes a data table (from an imported shapefile) and adds a new column called “ogr_geography” that is a GEOGRAPHY data type

 

alter table tableName
add ogr_geography geography;

 

If you’re working with POLYGON data and your original data source was a shapefile or something that has similar vertex ordering semantics, you’ll need to re-order the vertices (more information on this problem here and information about the process of re-ordering the vertices here)

 

update tableName
set ogr_geography = ogr_geometry.MakeValid().STUnion(ogr_geometry.MakeValid().STStartPoint()).MakeValid().STAsText()

 

If you’re working with POINT, LINE, or data that already has a semantic that matches the vertex ordering in SQL Server, you can just create your GEOGRAPHY column like this:

alter table tableName
add ogr_geography geography
GO

update tableName
set ogr_geography = ogr_geometry.MakeValid().STAsText()

 

How would you know if you have a vertex ordering problem? (because it’s not like most people enjoy thinking about the winding order of the vertices in their geographic polygons!) I check this by viewing the GEOGRAPHY in Tableau and checking if 1) it’s crazy slow, 2) a spatial intersection shows inverted results (e.g., the points identified as ‘inside’ the polygons are outside), or 3) the bounding box for the data is ALWAYS global scale (i.e., if you filter to just one polygon you still default to a global-scale zoom)
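If you’d rather check (and fix) the ring orientation directly in SQL Server, note that an inverted polygon covers nearly the whole globe, so its EnvelopeAngle() comes back huge.  A sketch using ReorientObject() (SQL Server 2012+), assuming the ogr_geography column from the examples above:

update tableName
set ogr_geography = ogr_geography.ReorientObject()
where ogr_geography.EnvelopeAngle() > 90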

 

You should now have a GEOGRAPHY type column in SQL Server that is ready to work with in Tableau.

This discussion supports the blog post series on working with data in Tableau via a database that supports spatial analysis (e.g., PostgreSQL / PostGIS or SQL Server).  It provides background for identifying and updating the coordinate system in your spatial data.

 

These instructions are only going to be useful if you already have a CRS defined in your dataset.  If you don’t have a CRS defined and you don’t know what it is, that is an entirely different (and sometimes super challenging) problem.

 

Tableau currently supports three coordinate systems for data in spatial databases:

  • NAD83 (EPSG 4269)
  • ETRS89 (EPSG 4258)
  • WGS84 (EPSG 4326)

 

To work with your data directly in Tableau, you will need to transform your spatial data into one of these three coordinate systems.  You can do this in advance (before your data is loaded into the database), during the process of loading the data into the database (using ogr2ogr), or afterwards by altering the coordinate system of the data in the database (in PostgreSQL only).

 

When you add spatial data to a database, you should always define the coordinate reference system (CRS) on import.  You can do it afterwards, but there is high likelihood of forgetting and then losing a bunch of time trying to figure out why none of your queries are working or why Tableau won’t show your data.  So – just make it a practice to define the CRS when you initially drop the data into the database.

 

You might already know the CRS because all of the data from your source is exactly the same.  In that case, just move forward with importing your data.  But…if you don’t already know the CRS, or you want to check and be sure, here is what I do:

  • Check if there is a .prj file with your shapefile. This defines the projection, or CRS, for the file.  Even though the extension is .prj, the file is just a plain text file.  Open it with your favorite text editor (Notepad, Sublime, etc.)
  • Take a look at the first part of the text in the .prj – that should be your coordinate system.  For instance, the GEOGCS name at the start of this text indicates the dataset uses WGS84:

 

GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]

 

  • Figure out the spatial reference ID (SRID) for the CRS.  I generally use the search tool in spatialreference.org  or epsg.io for this.  You’ll want to use the EPSG code for defining your SRID.   Here are some common SRIDs so you don’t have to look them up:
    • WGS84 (World geodetic survey of 1984): SRID 4326
    • NAD83 (North American Datum of 1983): SRID 4269
    • ETRS89 (European Terrestrial Reference System of 1989): SRID 4258

 

How do you change the coordinate reference system for your data?

As mentioned above, Tableau supports three coordinate systems for data in spatial databases (yes, I'm saying this same thing over and over again...write it down on a sticky note and put it somewhere visible...if you work with spatial databases and Tableau you'll want to burn these numbers into your brain):

NAD83 (North American Datum 1983)

ETRS89 (European Terrestrial Reference System 1989)

WGS84 (World Geodetic Survey 1984)

 

If you want your data to just work without having to do any serious data transformations in the database, transform your data into one of these coordinate systems before importing the data.

 

If you need to change your CRS, there are two ways that I generally do this:

 

QGIS (slower, but easy and uses a nice GUI)

Open your shapefile in QGIS.

Right click on the shapefile name in the Layers list, and select Export -> Save Features As…

 

Save the file with a new name, and select the CRS that you want to use for the output file. You can pick a new CRS using the little globe icon next to the CRS drop down in the Save Vector Layer as… dialog:

 

 

OGR in OSGeo4W

This is super fast and is done in OSGeo4W so you can quickly import the data into your database in one step (update CRS and import at the same time), or you can just do the transformation using the commands below and then import later.

 

Update CRS only:

 

ogr2ogr -f "ESRI Shapefile" destination_name.shp source_name.shp -s_srs EPSG:<input crs> -t_srs EPSG:<output crs>

 

for instance:

ogr2ogr -f "ESRI Shapefile" myShapefile_4326.shp myShapefile.shp -s_srs EPSG:26913 -t_srs EPSG:4326

 

Update CRS and put into PostgreSQL at the same time:

 

ogr2ogr -f "PostgreSQL" PG:"dbname=yourDBName host=localhost port=5432 user=userName password=*******" source_name.shp -s_srs EPSG:<input crs> -t_srs EPSG:<output crs>

 

for instance:

ogr2ogr -f "PostgreSQL" PG:"dbname=<your database> host=localhost port=5432 user=<your user name> password=<your password>" "c:\temp\myShapefile.shp" -s_srs EPSG:26911 -t_srs EPSG:4326

A common spatial question relates to finding the ‘nearest’ {customer, factory, patient, etc.}.  This type of distance-based question is often solved with a k-nearest neighbors (KNN) approach.  KNN allows us to look for an arbitrary number of points (K) near to any other given point location.

 

While there are a number of more elegant ways to tackle the narrowing of result sets for this type of problem with SQL, I'm going to take the approach of calculating lots of results and then using the functionality of Tableau to help streamline for visualization and analysis.

 

Data:

  • Las Vegas Fire Stations (22 points)
  • Las Vegas Fire Incidents (17,605 points)
    This is based on an incident dataset from the Las Vegas open data portal.  Since the raw incident data from the open Las Vegas data portal was already aggregated to the block level, I’ve further simplified the individual rows in the original dataset that I downloaded so that it just has ONE point location for each block with data and a COUNT for how many points were recorded at that location.

 

In Tableau, you’ll insert queries to calculate relationships like KNN into Custom SQL or Initial SQL and use the result from that query as your data source, or as a data source that you join to another one to create your visualizations.

 

In this post, we’ll explore queries in Tableau for going from one or more points of origin to ALL possible targets and then filtering in Tableau (e.g., from this one location where there was an incident to all fire stations).

 

There are numerous ways to tackle this problem.  In this post I’m going to explore BROAD queries that give us lots of data and allow for simplification and filtering in Tableau.  There are other ways to query that will return more targeted sets of results (e.g., the query only returns the top 3 closest neighbors), but for sake of simplicity, we’ll write queries that return distance to ALL of the possible locations and then we’ll filter to top 3 or top K nearest neighbors in Tableau.

 

Distance from one or more locations to the nearest K targets

An easy way to think about this problem is to calculate the distance from one point to every point in the target dataset.  In this case, that means finding the distance to every fire station from ONE point in the incidents dataset.

 

To make it easy to look at the results, we’ll order by distance so we can quickly identify the station with the shortest distance – like this table showing the distance from a selected incident to every fire station:

 

 

The query for this is easy:

 

-- select out ONE point to use for the demo
WITH oneIncident AS (
    SELECT *
    FROM las_vegas_fire_incidents
    LIMIT 1
)
-- now run the query to find the distance *from* that one incident location
-- to each fire station in that data table
SELECT
    ST_Distance(s.geom::geography, i.geom::geography) as distance,
    s.facilityname,
    i.rownum
FROM
    las_vegas_fire_stations as s, oneIncident as i
ORDER BY distance

 

We can put this in either Custom SQL or Initial SQL.

 

Now we can use filters in Tableau to limit to the top K closest fire stations.  In Tableau it would be easy to do this from our visualization:

 

 

Right click on the facility name pill and select Filter, then we’ll use a filter to only return the top K using a parameter to define the number of top results (or bottom results, if you want to just keep the smallest values / closest distances). In the example below, we’re simply returning ALL distances to the target fire stations, then filtering to just show the closest four.  We can update to show the closest one, two, twenty, whatever we want by just changing the value for the parameter, K.

 

 

BUT, this is just doing a calculation of nearest stations to ONE single incident.  What if we need the selected origin to be dynamic?  I can think of two options:

 

1) select the origin location with a parameter using a field in the origin location table.  By pairing this up with a map showing all of the possible origin locations, you should also be able to pair this up with a sweet parameter action and have a dynamic method for selecting origin location.

 

 

The query for this would look very similar to the one above, with one minor change:

  • Add a parameter to identify the ONE incident used (see the WHERE clause below)

 

-- select out ONE point to use for the demo
WITH oneIncident AS (
    SELECT *
    FROM las_vegas_fire_incidents
    WHERE target = <Parameters.origin target>
)
-- now run the query to find the distance *from* that one incident location
-- to each fire station in that data table
SELECT
    ST_Distance(s.geom::geography, i.geom::geography) as distance,
    s.facilityname,
    i.rownum
FROM
    las_vegas_fire_stations as s, oneIncident as i

 

 

2) Another option is to calculate the distances from all possible origins to all possible destinations.   If you choose to do this, it’s a good Initial SQL calculation so that it only runs once when you load the workbook.  Keep in mind that with large datasets you’re going to increase the number of calculations pretty quickly.  For the datasets that I used, the result was 17,605 origin locations times 22 destination locations =  387,310 calculations.  That’s not huge, and it runs pretty quickly…but my example files are relatively small datasets.

 

-- now run the query to find the distance from EVERY incident
-- to EVERY fire station in that data table
SELECT
    ST_Distance(s.geom::geography, i.geom::geography) as distance,
    s.facilityname,
    i.target
FROM
    las_vegas_fire_stations as s, las_vegas_fire_incidents as i

 

With the distance between every possible pair of origins and destinations you can filter quickly by origin, look at average distances from every origin to every destination, or do other calculations that seem interesting to you…
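For example, a sketch of a summary query (column names follow the query above) that collapses the all-pairs result down to the average and minimum station distance for each incident:

SELECT
    i.target,
    avg(ST_Distance(s.geom::geography, i.geom::geography)) as avg_distance,
    min(ST_Distance(s.geom::geography, i.geom::geography)) as min_distance
FROM
    las_vegas_fire_stations as s, las_vegas_fire_incidents as i
GROUP BY i.target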

 

It might help with querying / filtering to add in a rank of distance between each origin and destination.  You can add that with a small change to the query to calculate a rank for each origin location:

 

-- now run the query to find the distance from EVERY incident
-- to EVERY fire station, with a rank of the distances for each incident
SELECT
    ST_Distance(s.geom::geography, i.geom::geography) as distance,
    s.facilityname,
    i.id,
    i.geom as geom_incident,
    s.geom as geom_station,
    rank() over (partition by i.id order by st_distance(s.geom::geography, i.geom::geography)) as rank
FROM
    las_vegas_fire_stations as s, las_vegas_fire_incidents as i

It’s all well and good to do calculations on geography within one table of data, but a lot of the fun analysis is the result of combinations of data from multiple tables – like distance from customers to stores, or intersection of hurricane force wind polygons with Census tracts, etc.  So, let’s explore how we can use PostgreSQL to combine and compare geographic data across tables.

 

Good things to keep in mind:

 

  • You’re often combining tables. Think carefully about how many rows are being compared in your query.  Every row against every row?  That adds up (actually, multiplies!) quickly.

 

  • The more spatial calculations you do, the slower the analysis.  Spatial indexes will help minimize unnecessary spatial calculations.

 

    • For use in Tableau, if the calculation doesn’t have to be dynamic, do it as a pre-processing step.  That will speed things up when you want to introduce interactivity in Tableau.   If the calculation does have to be dynamic, know that you’ll get a performance hit by having to pass the query back and forth a bunch to PostgreSQL, so you’ll want to be cognizant of the size and complexity of your datasets.

     

When working with complex analyses that need to be done in Tableau, I suggest sticking with Initial SQL, and not relying on Custom SQL or RawSQL, if possible. That makes it so that the result becomes a new data source and the query isn’t re-run every time you update a viz. It’s a bit less ‘chatty’ than Custom SQL or RawSQL and you can use more complex queries more easily.  The query will update when you open the workbook, but otherwise will just result in a static data source.

     

    There are some limitations of Initial SQL, though – for instance, you can’t dynamically adjust the query with parameters on your worksheets.  You *can* do that with Custom SQL and RawSQL, however.

     

    What sort of things can we do?

    In this post we’ll walk through examples for a few common analysis functions and break them down based on spatial question...  To keep the examples simple I'm really going to just work from ONE table of data, but will create a second table using a subquery.  For instance, most of the examples will rely on a US Counties dataset and a subquery that contains a single county or a buffer around a county that we will use as our 'second table.'  If you want to try similar queries with a similar dataset, you can download some good cartographic boundary files from the US Census.

     

     

    Where are the nearest x things...

What are my neighbors?  A common question is ‘what are the K closest points to another point of interest?’ For instance – where is the closest coffee shop?  Or, what are the three closest distribution facilities? What are the closest fire stations to each reported incident?  We can answer this type of K Nearest Neighbor (KNN) problem easily with SQL.  But, this is a big enough type of question that I’ve written a few separate posts on just this topic!
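As a quick teaser, PostGIS also has an index-assisted distance operator (<->) that makes the simplest version of the question very compact.  A sketch using the fire station and incident tables from elsewhere in this series – the three stations closest to one incident:

SELECT facilityname
FROM las_vegas_fire_stations
ORDER BY geom <-> (SELECT geom FROM las_vegas_fire_incidents LIMIT 1)
LIMIT 3;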

     

     

    These posts mostly discuss finding the nearest neighbors with Euclidean distance.   If you want to find the nearest point along a road network, you can use PGRouting in PostgreSQL.  Here are some additional posts discussing those (and related) problems:

     

    What is here / near / overlapping / inside...

     

While questions about the distance between points are super useful, there are additional interesting proximity-based questions that we can explore with polygon data.  Since polygons are two-dimensional we can look at spatial questions that involve containment (like inside or outside), adjacency (touching), as well as questions related to distance.

     

    If you want to get super-geeky talking about spatial relationships, take a look at the DE-9IM (Dimensionally Extended 9-Intersection Model). The model can be used to classify possible spatial intersections.

     

    I’m not going to get super geeky and will try my best to just cover the basic interesting spatial relationships and how you can use them.

     

    Here are a number of good spatial operations you can consider for your analytic workflow.  All of these spatial operations will require multiple geographies as input, and most can work with points, lines, or polygons as input (though remember that for anything involving overlap, you need an AREA to overlap with, so you'll want a polygon to define the overlap area).  Most of the time the geographies in these queries are going to come from different tables in your database and you’ll be looking for relationships between those tables.

     

    To make things easy for these examples, I'm just going to use geography from one dataset, but I'm going to create a little temp table with a subset of the geography so that it acts like two tables being compared.

     

    Evaluating adjacency

    ST_Touches (https://postgis.net/docs/ST_Touches.html)

    (GEOMETRY)

Returns a BOOLEAN indicating whether the GEOMETRIES touch

     

    ST_Touches can be used to look at relationships between polygons, lines, and points except for point-to-point relationships.  For this to be true, the only points in common between the features must be on a boundary (e.g., polygons that share an edge, but do not overlap at all).

     

In this example, we'll use ST_Touches to find all of the neighbors for a specific polygon of interest – for instance, comparing attributes for a county to all of its directly adjacent neighbors. The example below compares the area of a selected county (Powder River County) to all of its neighbors.  The result tells me that it is smaller (land area) than all but two of its neighbors.

     

    To run this query, I'm creating a temporary selection of ONE county using a subquery that returns a single polygon based on the geographic ID (that is the 'With' statement in the query below)

     

A query to identify touching polygons (and to label the results as being the ‘selected polygon’ or a ‘neighboring polygon’) would look like this:

     

-- select a single polygon as our target location
-- This is particularly nice when paired with a parameter action in Tableau
-- to allow a user to just click on a polygon to update the parameter
with selected as (
    select wkb_geometry as selected_geom
    from tl_2017_us_county
    where geoid = <Parameters.selected county>
)
-- now find everything that touches it
select *,
-- case statement so we can label the counties as
-- selected, neighbor, or not neighbor
    case
        when geoid = <Parameters.selected county> then 'Selected'
        when ST_Touches(selected.selected_geom, county.wkb_geometry) then 'Neighbor'
        else 'Not Neighbor'
    end as neighbor_type
from tl_2017_us_county as county, selected

     

    That would let us build up a visualization like this in Tableau - where we have a selected county and its neighbors highlighted, and a bar chart showing the area of the selected county compared to the neighbors.  Color encoding and the selection of counties to include in the bar chart are based on the 'neighbor_type' data returned in the query above (that's what we get from the case statement)

     

    Evaluating overlap

    ST_Intersects (https://postgis.net/docs/ST_Intersects.html )

    (GEOMETRY or GEOGRAPHY)

    Returns a BOOLEAN indicating whether the GEOMETRY or GEOGRAPHY share any portion of space.

     

    ST_Contains (https://postgis.net/docs/ST_Contains.html )

    (GEOMETRY)

    Returns a BOOLEAN indicating whether the GEOMETRY of B is completely inside the GEOMETRY of A

     

ST_Intersects and ST_Contains give us two ways to look at whether or not two polygons – or a polygon and a point or line – share space.  With ST_Intersects, the geometries just have to overlap.  With ST_Contains, the geometry for one feature has to be completely inside the geometry of the reference polygon.

     

    Here is an example of the difference (ST_Intersects on the left, ST_Contains on the right).  These examples find all of the polygons that intersect or are completely contained by a 150km buffer drawn around a selected county.

     

     

The queries for these look very similar to the one that we used to find all of the neighbors that TOUCH - but our selected polygon is now the BUFFER around the selected county, and ST_Touches becomes ST_Intersects or ST_Contains:

     

INTERSECTS

-- select a single polygon as our target location
-- This is particularly nice when paired with a parameter action in Tableau
-- to allow a user to just click on a polygon to update the parameter
with selected as (
    select ST_Buffer(wkb_geometry, 150000) as buffer_geom
    from tl_2017_us_county
    where geoid = <Parameters.selected county>
)
-- now find everything that intersects with it
select *,
-- case statement so we can label the counties as
-- selected, neighbor, or not neighbor
    case
        when geoid = <Parameters.selected county> then 'Selected'
        when ST_Intersects(selected.buffer_geom, county.wkb_geometry) then 'Neighbor'
        else 'Not Neighbor'
    end as neighbor_type
from tl_2017_us_county as county, selected

     

     

CONTAINS

-- select a single polygon as our target location
-- This is particularly nice when paired with a parameter action in Tableau
-- to allow a user to just click on a polygon to update the parameter
with selected as (
    select ST_Buffer(wkb_geometry, 150000) as buffer_geom
    from tl_2017_us_county
    where geoid = <Parameters.selected county>
)
-- now find everything that is completely contained by it
select *,
-- case statement so we can label the counties as
-- selected, neighbor, or not neighbor
    case
        when geoid = <Parameters.selected county> then 'Selected'
        when ST_Contains(selected.buffer_geom, county.wkb_geometry) then 'Neighbor'
        else 'Not Neighbor'
    end as neighbor_type
from tl_2017_us_county as county, selected

     

     

    Creating new geography based on overlap

     

    What if we want to create new geography based on the overlap or intersection of geographic features?  In other words, what if we want to clip roads based on an outline or separate polygons based on an intersection?  We can use ST_Intersection for that...

     

    Here is an example in SQL Server that I wrote up a while back if you want more use cases:

     

     

    Here is what the intersection would look like in Tableau - notice that Park County is now split into two polygons...one red polygon that intersects our buffer, and one grey that is outside the buffer.

     

    And here is an example of what the query would look like in PostgreSQL with the counties dataset from other examples in this post - it looks a little more complicated because Tableau only allows use of a single geometry type at a time - and the intersection can return polygons and multipolygons (so we force everything to a multipolygon with ST_Multi), and because we have to reinforce the geometry type and coordinate system for the resulting geometry (that is the ::geometry(multipolygon, 4269) part of the query), but really it's not that difficult:

     

with selected as (
    select ST_Buffer(wkb_geometry, 150000) as buffer_geom
    from tl_2017_us_county
    where geoid = <Parameters.selected county>
)
select
    counties.*,
    ST_Multi(ST_Intersection(counties.wkb_geometry, selected.buffer_geom))::geometry(multipolygon, 4269) as Neighbor,
    selected.buffer_geom as geom_buffer
from
    tl_2017_us_county as counties, selected
where ST_Intersects(selected.buffer_geom, counties.wkb_geometry)

     

     

    By calculating out the intersecting areas, we can then do other fun calculations like proportionally allocating attributes to the two segments of the polygon.  As an example - if we were looking at the area impacted by hurricane-force winds and wanted to know how many people were impacted, we might over-estimate if we just used the entire population for an overlapping county. BUT, if we know that only 20% of the county intersects the hurricane force wind zone, then we can make a new calculation and just use 20% of the population for that county.  It's easy to do that by comparing the area of the intersection with the area of the entire county.
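A sketch of that proportional allocation, building on the ‘selected’ buffer subquery from the intersection query above (the population column is hypothetical – substitute whatever attribute you want to allocate):

select
    counties.geoid,
    counties.population *
        (ST_Area(ST_Intersection(counties.wkb_geometry, selected.buffer_geom)) /
         ST_Area(counties.wkb_geometry)) as population_in_buffer
from tl_2017_us_county as counties, selected
where ST_Intersects(selected.buffer_geom, counties.wkb_geometry)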

     

     

     

    That is just a few options of how you can combine geographies to make selections.  There are plenty of other interesting operators to look into and I encourage you to explore the options!

    When working with spatial databases, it often helps to create a spatial index for your datasets.  The spatial index makes it more efficient to process spatial queries by helping narrow down the number of features that have to be tested against.  For instance, if you want to know which polygon a point falls into you can check for intersection with EVERY possible polygon, or you can use a spatial index to limit the number of point in polygon operations you have to do.  A common spatial index is based on the bounding box of the polygon.  Checking if a point is within the bounding box is much quicker than testing for spatial intersection with a complex polygon.  By limiting to just the polygons where the point is inside the bounding box, we can limit our need to do the ‘more expensive’ calculation and just calculate against the smaller set of likely polygons.
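In PostGIS that bounding-box comparison even has its own operator (&&), and it’s the part of the test the index can answer directly.  A sketch of the shortcut on its own, using the test tables described below (ST_Contains and friends already fold this check in for you):

select count(*)
from tl_2018_us_state as poly, random1000 as point
where poly.geom && point.geom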

     

    More information on spatial indexes:

    https://en.wikipedia.org/wiki/Spatial_database#Spatial_index

    https://postgis.net/workshops/postgis-intro/indexing.html

     

    How do you create a spatial index?

    Easy.  Here is an example of creating an index on a table:

create index <table name>_geom_idx
on <table name>
using GIST(<geometry field>);

     

     

    Note that some of the PostGIS functionality will use a spatial index by default (e.g., ST_Contains).  That is super cool.  Also note that you can easily force it to ignore the spatial index by using the _ST_Contains function instead.

     

     

    Test calculations

    Now, let’s do a test to see how much of a difference the spatial index makes!

    Test data:

    • Points – a dataset of 1000 random point locations, all within the conterminous US

     

    (gratuitous ugly map of random point data!)

     

    • Polygons – three datasets of different sizes covering entire US (including AK and HI).  The datasets are from the US Census cartographic boundary files
    • US States (51 polygons) – includes DC
    • US Counties (3233 polygons)
    • US Census tracts (74002 polygons)

     

     

    How long does it take for a query to identify the polygon that each point falls inside using the following query? 

select point.*, poly.geoid
from random1000 as point
join tl_2018_us_state as poly
on st_contains(poly.geom, point.geom)

     

     

     

     

Or, if you want the actual numbers (query runtimes, in milliseconds):

                           State       County      Census Tract
Without spatial index      3250.772    33319.82    611127.95
With spatial index         859.462     788.091     1120.448

     

For reference, the ST_Contains() function uses a spatial index by default, but you can force it to ignore the index by using _ST_Contains() instead (note the underscore ( _ ) before the ST_Contains in the non-indexed query).  That is what I used in the tests above.

     

    Seems worthwhile to use a spatial index!

    In the ‘getting data into a spatial database’ post I included instructions for both PostgreSQL and SQL Server.  At this point, however, I’m just going to focus on queries in PostgreSQL / PostGIS to keep things simple.

     

    Note that much of the functionality that I discuss is also available in SQL Server, though the names of the functions, and the structure of the queries, are likely to be a bit different. Just Google it.
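For example, where PostGIS has a function like ST_Buffer(geom, 1000), SQL Server exposes the equivalent as a method on the spatial value itself.  A quick taste, assuming a geography column named ogr_geography like in the earlier import post:

select ogr_geography.STBuffer(1000) as geom_buffer
from tableName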

     

    This post looks at a handful of common calculations that can be done on individual geographic features. You can use these in Initial SQL, Custom SQL, or RAWSQL queries in Tableau as part of your analytic process.

     

    Helpful information about each of these queries in Tableau:

     

    For demonstration purposes I’m going to use these datasets (attached as a zip file...and the coordinate reference system ID is in the file name when you unzip the attachment.  Hopefully that will make it easier for you to get the data into your database to play with it in Tableau!):

    • Fire stations (points)
    • Nevada Department of Transportation districts (polygons)

     

    ST_Buffer

    (https://postgis.net/docs/ST_Buffer.html)

    (GEOMETRY or GEOGRAPHY)

     

    What if we want to identify the area within 1km of any fire station?  We can create a buffer:

     

    ST_Buffer(geometry or geography, distance)

     

    In Tableau as a RAWSQL calculation:

     

     

RAWSQL_SPATIAL("ST_Buffer(%1::geography, 1000)", [Geom])

     

     

    • ST_Buffer is the function in PostGIS
    • %1 is a place-holder for a field or parameter name used in the query.  We add in the actual field name at the end of the RAWSQL calculation
    • %1::geography is how we “cast” a geometry data type to a geography.  This makes it so that the buffer is calculated with a spherical coordinate system and uses meters for the input distance. Otherwise, the calculation is done in a planar coordinate system.  Since Tableau only works directly with spatial database data values that are already in a spherical coordinate system, if you are working with a geometry data type and you don’t cast it to a geography, your calculations will be done with units of decimal degrees.  Just don’t do that.  It’s a bad idea.
    • 1000 is the distance to use in the buffer.  ST_Buffer with geography data types uses meters as the unit of measure. If you want to use something other than meters, do a conversion as part of the query (e.g., for a buffer distance specified in feet, convert to meters: ST_Buffer(%1::geography, 1000 / 3.28084))
    • [Geom] is the name of the geometry field in our database as it appears in Tableau

     

    Here's how the query looks in the calculated field editor:

     

    In this case we’re casting our GEOMETRY type data (the geom field) into a GEOGRAPHY type so that we calculate the buffer using meters on the sphere.

     

    Because this is a calculated field, we can swap out the hard-coded buffer distance with a parameter and have an adjustable distance buffer:

     

     

    What happens if we work with GEOMETRY instead of GEOGRAPHY?  Queries with GEOMETRY are calculated using planar coordinates.  Recall that Tableau likes data in latitude and longitude, so the GEOMETRY calculations will be done using decimal degrees as the unit.  Not a good choice…especially when it comes to things like area calculations (how big is a square decimal degree???)

     

    Ugh.  And I won’t even get into the projection issues with this.  We can talk later.

     

     

     

    ST_Area

    (https://postgis.net/docs/ST_Area.html)

                    (GEOMETRY or GEOGRAPHY)

    What if we want the area of these (or any other) polygons?

     

    It’s another easy query, but since this will return a numeric value instead of a geometry, we’ll use a RAWSQL_REAL() calculation:

     

     

RAWSQL_REAL("ST_Area(%1)", [Geom_Buffer])

     

     

     

Since the buffer that we created is a GEOGRAPHY type (because we cast the GEOMETRY to GEOGRAPHY as part of the process), the results will be returned in square meters, so we might want to convert this to square kilometers to make it a nicer number to analyze…
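Since there are 1,000,000 square meters in a square kilometer, the conversion can ride along in the same calculated field – a sketch, using the same assumed buffer field:

RAWSQL_REAL("ST_Area(%1) / 1000000", [Geom_Buffer])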

     

     

     

    ST_Boundary

    (https://postgis.net/docs/ST_Boundary.html)

    (GEOMETRY)

    What if we want the outline of a feature?  ST_Boundary will return the outline.  Note that it only works with GEOMETRY type, and not GEOGRAPHY.   This could be a nice way to customize border color for polygons by using a dual axis map to combine it with polygon layer.

     

     

RAWSQL_SPATIAL("ST_Boundary(%1)", [Geom])

     

     

     

    ST_Centroid

    (https://postgis.net/docs/ST_Centroid.html)

                    (GEOMETRY or GEOGRAPHY)

    Maybe we only need the centroid for a polygon?  That’s easy with ST_Centroid:

     

     

RAWSQL_SPATIAL("ST_Centroid(%1)", [Geom])

     

     

     

    ST_Envelope

    (https://postgis.net/docs/ST_Envelope.html)

                    (GEOMETRY)

    Maybe you need the bounding box?  Here are the bounding boxes for the buffers around the fire stations:

     

     

RAWSQL_SPATIAL("ST_Envelope(%1)", [Geom])

     

     

     

     

    ST_Perimeter

    (https://postgis.net/docs/ST_Perimeter.html)

                    (GEOMETRY or GEOGRAPHY)

    Or maybe we want the perimeter of a polygon?  Use ST_Perimeter() on a polygon.  This will work with either GEOMETRY or GEOGRAPHY.  Calculations on GEOGRAPHY will be done on the sphere and return values in meters, while calculations on GEOMETRY will be in the coordinate system used for the GEOMETRY...in this case the units are decimal degrees.  The perimeter in decimal degrees makes no sense.  Don’t do that.  I only calculated it to show how silly it is.

     

     

RAWSQL_REAL("ST_Perimeter(%1)", [Geom_Buffer])

     

     

     

     

    ST_X & ST_Y

    (https://postgis.net/docs/ST_X.html and https://postgis.net/docs/ST_Y.html)

    (GEOMETRY)

    How about if we need the X and Y coordinates for a geographic feature? Easy:

     

     

RAWSQL_REAL("ST_X(%1)", [Geom])

     

     

     

     

    ST_ConvexHull

    (https://postgis.net/docs/ST_ConvexHull.html)

    (GEOMETRY)

    Minimum convex geometry to enclose all of the input geometries

    Most of the other calculations covered in this post have been row-level. For a Convex or Concave Hull you often want to work with collections of geometry (e.g., the convex hull around all of the middle schools in an area) as opposed to at the row level (a convex hull around each individual point...which isn’t interesting, because the convex hull around a point is a point).

    Because of this, unless we are already working with a multipoint, multipolygon, etc. we will need to use an aggregated RawSQL query and then define our level of detail / aggregation in the viz:

For instance, to generate the convex hull around ALL of the points (“collect” these together using ST_Collect first) we can use the aggregated RawSQL_Spatial function like this:

     

     

RAWSQL_SPATIAL("ST_ConvexHull(ST_Collect(%1))", [Geom])

     

     

     

    If we wanted to return multiple convex hulls around different groupings of points, we can just adjust the level of detail on the viz like this:

     

     

    ST_ConcaveHull

    (https://postgis.net/docs/ST_ConcaveHull.html)

    (GEOMETRY)

    Concave geometry enclosing all of the input geometries.  Can be  ‘shrink wrapped’ to fit the input geometries more closely by adjusting the ‘target_percent’ input, though note that tighter fits will be much slower to compute.

     

     

RAWSQL_SPATIAL("ST_ConcaveHull(ST_Collect(%1), 0.8)", [Geom])

     

     

    ST_VoronoiPolygons

    (https://postgis.net/docs/ST_VoronoiPolygons.html)

    (GEOMETRY)

    The Voronoi Polygon function returns a Geometry collection of polygons.  As a RawSQL calculation, Tableau sees that returned geometry as one single feature with multiple polygons – like this, which might not be super helpful:

     

     

RAWSQL_SPATIAL("ST_VoronoiPolygons(ST_Collect(%1))", [Geom])

     

     

     

    That’s okay if we just need a graphical representation of the polygon boundaries, but if we want to be able to interact with them individually we might want to either generate these as a pre-processing step (e.g., make the geometry and split out the individual polygons in a table in our database), or we can generate the polygons with custom SQL.  This is a bit more complex –

     

    • we have to group all of our points together (that’s the first part where we use ST_Collect to group all of the fire station points into a multipoint feature)
    • Then we make the Voronoi polygons using that multipoint feature and use ST_Dump to then split the multipolygon with all of the Voronoi polygons back into individual polygons
    • Then we make sure the polygons have appropriate attributes by joining the attributes from stations back in.  We know which polygon gets which attributes by using  ST_Intersects to match the polygon with one point location that the polygon was based on.

     

WITH points AS (
    SELECT ST_Collect(geom) as geom
    FROM las_vegas_fire_stations
),
voronoi AS (
    SELECT (ST_Dump(ST_VoronoiPolygons(points.geom))).geom::geometry(polygon, 4326) as geom_voronoi
    FROM points
)
SELECT voronoi.*, stations.*
FROM voronoi
JOIN las_vegas_fire_stations as stations
ON ST_Intersects(stations.geom, voronoi.geom_voronoi)

     

     

     

K-Nearest Neighbors (KNN) problems look for an arbitrary number of points (K) near to any other given point location.  In a prior post, I discussed identification and mapping of the linkages between a point and its K neighbors using straight line distances.  In this post I’m going to look at mapping the network path between the locations.

     

    Full disclaimer – I'm still using Euclidean distance to identify closest locations, but am adding in network paths to show the link between the locations.  I haven’t yet figured out an efficient way to do the KNN on network distances and have it be a reasonable process for anything other than small test datasets. 

     

Many spatial calculations are kinda expensive to calculate, and calculating hundreds, thousands, or bajillions of network paths takes a lot of time.  To find the closest on a network is much more computationally expensive, and I think you might need to route to each and then rank the distances.  I think that would suck from a performance perspective (or at least with my limited SQL knowledge).  So, I’m going for some approximations in my explanation... your mileage may vary.

     

    Data:

    I don't have datasets attached for you to work with, but the fire stations and incidents have been attached to other blog posts in this series and the OSM data can be downloaded for whatever your location of interest is using the instructions in another post: Dynamic Routing in Tableau (with a little PostgreSQL/PostGIS love)

    • Las Vegas Fire Stations
    • Las Vegas Fire Incidents
    • Road network from OSM

     

General process that I used:

Data prep SQL

    • Snap fire stations and incident locations to the OSM network (for easy use in pgr_dijkstra routing function)
    • Calculate K closest fire stations to each incident

     

    Tableau SQL

    • Use pgr_dijkstra to get route between one selected incident (selected using dynamic parameter) and the K nearest fire stations (as identified in data prep step above).  Return a table with K number of rows, each row having a multilinestring with the network route, and an appropriate set of IDs to join attributes from fire stations and incidents back in for visualization / analysis in Tableau

     

    Background:

     

     

    In my earlier post on dynamic routing, I discussed the basics of querying an OSM-based road network to return a single route between two locations.  In this post, we extend that to return multiple routes from a single location.

     

    The short story on what we need to do the routing:

    • A road network – I'm using one based on Open Street Map (OSM) data.  Directions on setting this up are in the Dynamic routing in Tableau with PostGIS and PGRouting blog post.  That post uses a different geographic location, but the process is the same to set up the routing data.
    • Origin location (point location) - we’ll take the lat/lon for the origin and snap it to our road network to use for routing.  The routing algorithm takes an ID on the road network for the starting and ending point.

     

    • Destination locations (point locations) - we’ll also snap these destination locations to the road network for use in the routing algorithm.
    • A way to bend the data and the queries to our will for use in Tableau.

     

    As a refresher (from my earlier blog posts on network routing using OSM data and PGRouting), here is a basic query to obtain a route between nodes 38251 and 576 on the OSM road network that I’m using (road network for Las Vegas, NV) - note that this is ONLY relying on the OSM road network and isn't adding in anything related to fire stations or fire incidents at this point:

     

select
    seq, name, length_m, cost_s, the_geom
from
    (select * from pgr_dijkstra(
        'select gid as id, source, target, length as cost from ways',
        38251, 576, false)
    ) as route, ways
-- join the geometry back into the query result
where route.edge = ways.gid

     

     

In the query above, the origin node on the road network is 38251 and the destination node is 576.

     

     

    The pgr_dijkstra() function returns a table with all of the individual segments in the path between two locations.  Like this:

     

    If we wanted to make the path calculation dynamic, we could replace the origin and destination nodes in the query with parameters, like this...with values for the parameters coming from the Target field in the OSM Ways table:

     

select
    seq, name, length_m, cost_s, the_geom
from
    (select * from pgr_dijkstra(
        'select gid as id, source, target, length as cost from ways',
        <Parameters.Origin>, <Parameters.Destination>, false)
    ) as route, ways
-- join the geometry back into the query result
where route.edge = ways.gid

     

    This is great detail for a single path, but for showing multiple paths in Tableau (and to make it easy to join the attributes for the origin and destination back into the result) I’d like to simplify to just return the collected geometry, the origin name/identifier, and the destination name/identifier.

     

    In order to streamline, we’ll use ST_Collect() to aggregate all of these linestrings into a single multilinestring for each of our paths.  When we do that, we can also aggregate up the ‘cost’ associated with the path (sum of the length in meters and the sum of the ‘cost’ in seconds):

     

select
    st_collect(the_geom)::geometry(multilinestring, 4326) as geom,
    sum(length_m) as length_m_sum,
    sum(cost_s) as cost_s_sum
from
    (select * from pgr_dijkstra(
        'select gid as id, source, target, length as cost from ways',
        38251, 576, false)
    ) as route, ways
-- join the geometry back into the query result
where route.edge = ways.gid

     

     

    This gives us a result like this:

     

But...in the queries above, I’m using a hard-coded value for the origin node and the destination node (whether from a parameter or by just typing in the value). I have to get those values for the origin and destination nodes in the OSM ways dataset from somewhere - because they aren't likely to already be in the point dataset that you want to use for routing!  Here we have choices - we can either pre-calculate them, or calculate them on the fly as we’re running the query in Tableau.

     

To calculate them on the fly, I just add in a subquery to snap the geometry for the selected origin or destination and then use that in the pgr_dijkstra query.  So, in this part of the query (we’re going to replace the hard-coded node IDs):

     

select * from pgr_dijkstra(
    'select gid as id, source, target, length as cost from ways',
    38251, 576, false)

     

I’d replace both the 38251 and 576 values with a subquery to retrieve the appropriate value for a node on the OSM road network.  For instance, that gives me a query like this, where I’ve replaced the hard-coded values with two selections based on an ID being passed from Tableau as a parameter (more on this in the Dynamic routing in Tableau with PostGIS and PGRouting blog post):

     

select the_geom as geom_route, name, osm_id
from (select * from pgr_dijkstra(
    'select gid as id, source, target, length as cost from ways',
    --source location (this was 38251 above)
    (select source
     from ways
     order by ways.the_geom <-> (
        -- select one node using parameter
        SELECT geom FROM las_vegas_fire_incidents WHERE id = <Parameters.Origin Name>
     ) limit 1),
    --destination location (this was 576 above)
    (select source
     from ways
     order by ways.the_geom <-> (
        -- select one node using parameter
        SELECT geom FROM las_vegas_fire_incidents WHERE id = <Parameters.Destination Name>
     ) limit 1),
    false)) as route, ways
-- join the geometry back into the query result
where route.edge = ways.gid

     

     

What is that doing?  Inside the subqueries for the origin and destination, we're replacing the hard-coded node ID with a query that returns the node ID based on the geographic location of a point of interest in our fire incidents table!  It's pretty cool - we're just dynamically identifying the origin and destination nodes on the road network based on proximity to our selected fire incidents.  And we're selecting our fire incident origin and destination locations using a parameter in Tableau.  That means we can use a parameter action to update the values just by clicking on our map!

     

But, this gets complicated when I want to calculate paths to multiple destinations.  pgr_dijkstra can take multiple inputs for origin or destination as an array, but that means that I would need to do the subquery for each of my input locations, squish those values into an array, and then run the routing. Maybe that is an easy query for someone who is better at SQL than I am, but it wasn’t obvious to me how to do it easily, so I opted to just pre-process my data and attach the OSM node information directly into my input tables.  It was my ‘calculate it once and then never have to do it again’ option (and it meant never having to think about what the SQL would look like to do it dynamically, because that was harder than I was willing to think about this problem).  When you have a good idea on how to do it, let me know and I’ll write the addendum to this post and give you full credit for your brilliant SQL!

     

    Once I had the node from the OSM road network hard coded in each of my data tables I didn't need to look up the nearest node anymore, which simplifies my query quite a bit.  Since these values are unlikely to change (turns out roads tend to stay where they are...), and I would only need to update them if I update the road network, there really isn’t a lot of reason to calculate them dynamically. It also makes my resulting query much simpler when I want to pass in an array of destination locations and get back multiple linestrings (e.g., connect one fire incident location to the three closest fire stations).
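A sketch of that one-time pre-processing step, using the same nearest-node trick from the subqueries above to stamp an OSM node ID onto each fire station (table and column names match what the later queries assume):

alter table fire_stations_nodes add column node_station bigint;

update fire_stations_nodes as stations
set node_station = (
    select source
    from ways
    order by ways.the_geom <-> stations.geom
    limit 1
);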

     

    Now how do we get the road network paths to our K-nearest neighbors?

     

    I did this as a two part query - using Initial SQL I calculated out the three nearest fire stations to each of my fire incidents (this could be changed depending on what you want to use for K)

     

create temporary table incidents3 as (
    SELECT
        s.geom as geom_station,
        i.geom as geom_incident,
        s.node_station as node_end,
        i.node_incident as node_start,
        s.facilityname,
        ST_Distance(s.geom::geography, i.geom::geography) AS distance,
        rank() over (partition by node_incident order by st_distance(s.geom::geography, i.geom::geography)) as ranked_distance
    FROM
        las_vegas_fire_incidents_nodes AS i
    CROSS JOIN LATERAL
        (SELECT facilityname, geom, node_station
         FROM fire_stations_nodes
         ORDER BY geom::geography <-> i.geom::geography
         LIMIT 3) AS s
)

     

    Then in custom SQL I used a parameter to update my selected incident node, and then created a set of routes starting at that origin location and going to each of the three nearest fire stations (or use a parameter to select fewer than 3 nearest neighbors...you can select any number for K, so long as it is LESS than what you calculated in your initial sql):

     

--- using a parameter to subset the table
--- get the origin node, and an array of all of the destination nodes of interest
with selection as (
    select min(node_start) as node_start, min(node_end) as node_end, array_agg(node_end) as dests
    from incidents3
    where node_start = <Parameters.incident_node>
)
--- now calculate the paths and collect all of the line segments for each one
select
    min(end_vid) as destnode,
    st_collect(the_geom)::geometry(multilinestring, 4326) as geom_route,
    (select node_start from selection) as source_node -- should just be one
from
    (select * from pgr_dijkstra(
        'select gid as id, source, target, length as cost from ways',
        --source location (find the source ID from ways (closest node to listed OSM ID))
        (select node_start from selection),
        (select dests from selection),
        false)) as route, ways
-- join the geometry back into the query result
where route.edge = ways.gid
group by end_vid

     

Now in Tableau I joined back in my fire stations and fire incidents so that I could see ALL of them, not just the one incident selected and the three fire stations closest to it.

     

     

    And with a parameter action to update my selected incident node, I can click on any point:

     

    And have the new routes calculated dynamically

     

     


    And that's it.  I know it's not easy, but it is powerful and worth spending a bit of time pondering if you're interested in looking at adding dynamic routing and KNN calculations into your visual analytics.