A common spatial question relates to finding the ‘nearest’ {customer, factory, patient, etc.}.  This type of distance-based question is often solved with a k-nearest neighbors (KNN) approach.   KNN allow us to look for an arbitrary number of points (K) near to any other given point location.

 

While there are a number of more elegant ways to tackle the narrowing of result sets for this type of problem with SQL, I'm going to take the approach of calculating lots of results and then using the functionality of Tableau to help streamline for visualization and analysis.

 

Data:

  • Las Vegas Fire Stations (22 points)
  • Las Vegas Fire Incidents (17,605 points)
    This is based on an incident dataset from the Las Vegas open data portal.  Since the raw incident data from the open Las Vegas data portal was already aggregated to the block level, I’ve further simplified the individual rows in the original dataset that I downloaded so that it just has ONE point location for each block with data and a COUNT for how many points were recorded at that location.

 

In Tableau, you’ll insert queries to calculate relationships like KNN into Custom SQL or Initial SQL and use the result from that query as your data source, or as a data source that you join to another one to create your visualizations.

 

In this post, we’ll explore queries in Tableau to going from one or more points of origin to ALL possible targets and then filtering in Tableau (e.g., from this one location where there was an incident to all fire stations).

 

There are numerous ways to tackle this problem.  In this post I’m going to explore BROAD queries that give us lots of data and allow for simplification and filtering in Tableau.  There are other ways to query that will return more targeted sets of results (e.g., the query only returns the top 3 closest neighbors), but for sake of simplicity, we’ll write queries that return distance to ALL of the possible locations and then we’ll filter to top 3 or top K nearest neighbors in Tableau.

 

Distance from one or more locations to the nearest K targets

An easy way to think about this problem is to calculate the distance from one point to every point in the target dataset.  In this case, that means finding the distance to every fire station from ONE point in the incidents dataset.

 

To make it easy to look at the results, we’ll order by distance so we can quickly identify the station with the shortest distance – like this table showing the distance from a selected incident to every fire station:

 

 

The query for this is easy:

 

-- select out ONE point to use for the demo    

WITH oneIncident AS(

    SELECT

        *

    FROM las_vegas_fire_incidents

    LIMIT 1

)

-- now run the query to find the distance *from* that one incident location

-- to each fire station in that data table

SELECT

    ST_Distance(s.geom::geography, i.geom::geography) as distance,

    s.facilityname,

    i.rownum

FROM

    las_vegas_fire_stations as s, oneIncident as i

 

We can put this in either Custom SQL or Initial SQL.

 

Now we can use filters in Tableau to limit to the top K closest fire stations.  In Tableau it would be easy to do this from our visualization:

 

 

Right click on the facility name pill and select Filter, then we’ll use a filter to only return the top K using a parameter to define the number of top results (or bottom results, if you want to just keep the smallest values / closest distances). In the example below, we’re simply returning ALL distances to the target fire stations, then filtering to just show the closest four.  We can update to show the closest one, two, twenty, whatever we want by just changing the value for the parameter, K.

 

 

BUT, this is just doing a calculation of nearest stations to ONE single incident.  What if we need the selected origin to be dynamic?  I can think of two options:

 

1) select the origin location with a parameter using a field in the origin location table.  By pairing this up with a map showing all of the possible origin locations, you should also be able to pair this up with a sweet parameter action and have a dynamic method for selecting origin location.

 

 

The query for this would look very similar to the one above, with one minor change:

  • Add a parameter to identify the ONE incident used (in bold below)

 

-- select out ONE point to use for the demo    

WITH oneIncident AS(

    SELECT

        *

    FROM las_vegas_fire_incidents

    WHERE target = <Parameters.origin target>

)

-- now run the query to find the distance *from* that one incident location

-- to each fire station in that data table

SELECT

    ST_Distance(s.geom::geography, i.geom::geography) as distance,

    s.facilityname,

    i.rownum

FROM

    las_vegas_fire_stations as s, oneIncident as i

 

 

2) Another option is to calculate the distances from all possible origins to all possible destinations.   If you choose to do this, it’s a good Initial SQL calculation so that it only runs once when you load the workbook.  Keep in mind that with large datasets you’re going to increase the number of calculations pretty quickly.  For the datasets that I used, the result was 17,605 origin locations times 22 destination locations =  387,310 calculations.  That’s not huge, and it runs pretty quickly…but my example files are relatively small datasets.

 

-- now run the query to find the distance from EVERY incident

-- to EVERY fire station in that data table

SELECT

ST_Distance(s.geom::geography, i.geom::geography) as distance,

    s.facilityname,

    i.target

FROM

    las_vegas_fire_stations as s, las_vegas_fire_incidents as i

 

With the distance between every possible pair of origins and destinations you can filter quickly by origin, look at average distances from every origin to every destination, or do other calculations that seem interesting to you…

 

It might help with querying / filtering to add in a rank of distance between each origin and destination.  You can add that with a small change to the query to calculate a rank for each origin location:

 

-- now run the query to find the distance *from* that one incident location

-- to each fire station in that data table

SELECT

ST_Distance(s.geom::geography, i.geom::geography) as distance,

    s.facilityname,

    i.id,

    i.geom as geom_incident,

    s.geom as geom_station,

    rank() over (partition by id order by st_distance(s.geom::geometry, i.geom::geography)) as rank

FROM

    las_vegas_fire_stations as s, las_vegas_fire_incidents as i