K-Nearest Neighbors (KNN) problems look for an arbitrary number of points (K) near to any other given point location.   In a prior post, I discussed identification and mapping of the linkages between a point and it’s K neighbors using straight line distances.  In this post I’m going to look at mapping the network path between the locations.

Full disclaimer – I'm still using Euclidean distance to identify closest locations, but am adding in network paths to show the link between the locations.  I haven’t yet figured out an efficient way to do the KNN on network distances and have it be a reasonable process for anything other than small test datasets.

Many spatial calculations are kinda expensive to calculate, and calculating hundreds, thousands, or bajillions of network paths takes a lot of time.  To find closest on network is much more computationally expensive, and I think you might need to route to each and then rank the distances.  I think that would suck from a performance perspective (or at least with my limited SQL knowledge).  So, I’m going for some approximates in my explanation... your mileage may vary.

Data:

I don't have datasets attached for you to work with, but the fire stations and incidents have been attached to other blog posts in this series and the OSM data can be downloaded for whatever your location of interest is using the instructions in another post: Dynamic Routing in Tableau (with a little PostgreSQL/PostGIS love)

• Las Vegas Fire Stations
• Las Vegas Fire Incidents

General process that I used: Data prep SQL

• Snap fire stations and incident locations to the OSM network (for easy use in pgr_dijkstra routing function)
• Calculate K closest fire stations to each incident

Tableau SQL

• Use pgr_dijkstra to get route between one selected incident (selected using dynamic parameter) and the K nearest fire stations (as identified in data prep step above).  Return a table with K number of rows, each row having a multilinestring with the network route, and an appropriate set of IDs to join attributes from fire stations and incidents back in for visualization / analysis in Tableau

Background:

In my earlier post on dynamic routing, I discussed the basics of querying an OSM-based road network to return a single route between two locations.  In this post, we extend that to return multiple routes from a single location.

The short story on what we need to do the routing:

• A road network – I'm using one based on Open Street Map (OSM) data.  Directions on setting this up are in the Dynamic routing in Tableau with PostGIS and PGRouting blog post.  That post uses a different geographic location, but the process is the same to set up the routing data.
• Origin location (point location) - we’ll take the lat/lon for the origin and snap it to our road network to use for routing.  The routing algorithm takes an ID on the road network for the starting and ending point.

• Destination locations (point locations) - we’ll also snap these destination locations to the road network for use in the routing algorithm.
• A way to bend the data and the queries to our will for use in Tableau.

As a refresher (from my earlier blog posts on network routing using OSM data and PGRouting), here is a basic query to obtain a route between nodes 38251 and 576 on the OSM road network that I’m using (road network for Las Vegas, NV) - note that this is ONLY relying on the OSM road network and isn't adding in anything related to fire stations or fire incidents at this point:

select

seq, name, length_m, cost_s, the_geom

from

(select * from pgr_dijkstra(

'select gid as id, source, target, length as cost from

ways',38251, 576, false)

) as route, ways

-- join the geometry back into the query result

where

route.edge = ways.gid

In the query above, the origin node on the road network is 38521 and the destination node is 576.

The pgr_dijkstra() function returns a table with all of the individual segments in the path between two locations.  Like this:

If we wanted to make the path calculation dynamic, we could replace the origin and destination nodes in the query with parameters, like this...with values for the parameters coming from the Target field in the OSM Ways table:

select

seq, name, length_m, cost_s, the_geom

from

(select * from pgr_dijkstra(

'select gid as id, source, target, length as cost from

ways',<Parameters.Origin>, <Parameters.Destination>, false)

) as route, ways

-- join the geometry back into the query result

where

route.edge = ways.gid

This is great detail for a single path, but for showing multiple paths in Tableau (and to make it easy to join the attributes for the origin and destination back into the result) I’d like to simplify to just return the collected geometry, the origin name/identifier, and the destination name/identifier.

In order to streamline, we’ll use ST_Collect() to aggregate all of these linestrings into a single multilinestring for each of our paths.  When we do that, we can also aggregate up the ‘cost’ associated with the path (sum of the length in meters and the sum of the ‘cost’ in seconds):

select

st_collect(the_geom)::geometry(multilinestring, 4326) as geom, sum(length_m) as length_m_sum, sum(cost_s) as cost_s_sum

from (select * from pgr_dijkstra('

select gid as id, source, target, length as cost from ways',38251, 576, false)

) as route, ways

-- join the geometry back into the query result

where route.edge = ways.gid

This gives us a result like this:

But...in the queries above, I’m using a hard coded value for the origin node and the destination node (whether from a parameter or by just typing in the value). I have to get those values for the origin and destination nodes in the OSM ways dataset from somewhere - because they aren't likely to already be in the point dataset that you want to use for routing!  Here we have choices - we can either pre-calculate them, or calculate them on the fly as were running the query in Tableau.

To calculate them on the fly, I just add in a subquery to snap the geometry for the selected origin or destination and then use that in the pgr_dijkstra query.  So, in this part of the query (we'er going to replace the bold parts):

select * from pgr_dijkstra('

select gid as id, source, target, length as cost from ways',38251, 576, false)

I’d replace both of the 38251 and  576 values with a subquery to retrieve the appropriate value for a node on the OSM road network.  For instance giving me a query like this where I’ve replaced the hard coded values with two selections based on an ID being passed from Tableau as a parameter (more on this in the (Dynamic routing in Tableau with PostGIS and PGRouting) blog post.

select the_geom as geom_route, name, osm_id

from (select * from pgr_dijkstra('

select gid as id, source, target, length as cost from ways',

--source location (this was 38251 above)

(select source

from ways

order by ways.the_geom <-> (

-- select one node using parameter

SELECT geom FROM las_vegas_fire_incidents WHERE id = <Parameters.Origin Name>

)limit 1),

--destination location (this was 576 above)

(select source

from ways

order by ways.the_geom <-> (

-- select one node using parameter

SELECT geom FROM las_vegas_fire_incidents WHERE id = <Parameters.Destination Name>

)limit 1), false)) as route, ways

-- join the geometry back into the query result

where route.edge = ways.gid

What is that doing?  Inside the sub queries for the origin and destinations we're replacing the hard-coded node ID with a query that returns the node ID based on the geographic location of a point of interest in our fire incidents table!  It's pretty cool - we're just dynamically identifying the origin and destination nodes on the road network based on proximity to our selected fire incidents.  And we're selecting our fire incident origin and destination locations using a parameter in Tableau.  That means we can use a parameter action to update the values just by clicking on our map!

But, this gets complicated when I want to calculate paths to multiple destinations.  pgr_dijkstra can take multiple inputs for origin or destination as an array, but that means that I would need to do the subquery for each of my input locations and then squish those values into an array, and then run do the routing. Maybe that is an easy query for someone who is better at SQL than I am, but it wasn’t obvious to me how to do it easily, so I opted to just pre-process my data and attach the OSM node information directly into my input tables.  It was my ‘calculate it once and then never have to do it again’ option (and never have to think about what the SQL would look like to do it dynamically, because that was harder than I was willing to think about this problem.  When you have a good idea on how to do it, let me know and I’ll write the addendum to this post and give you full credit for your brilliant SQL!

Once I had the node from the OSM road network hard coded in each of my data tables I didn't need to look up the nearest node anymore, which simplifies my query quite a bit.  Since these values are unlikely to change (turns out roads tend to stay where they are...), and I would only need to update them if I update the road network, there really isn’t a lot of reason to calculate them dynamically. It also makes my resulting query much simpler when I want to pass in an array of destination locations and get back multiple linestrings (e.g., connect one fire incident location to the three closest fire stations).

Now how do we get the road network paths to our K-nearest neighbors?

I did this as a two part query - using Initial SQL I calculated out the three nearest fire stations to each of my fire incidents (this could be changed depending on what you want to use for K)

create temporary table incidents3 as (

SELECT

s.geom as geom_station,

i.geom as geom_incident,

s.node_station as node_end,

i.node_incident as node_start,

s.facilityname,

ST_Distance(s.geom::geography, i.geom::geography) AS distance,

rank() over (partition by node_incident order by st_distance(s.geom::geography, i.geom::geography)) as ranked_distance

FROM

las_vegas_fire_incidents_nodes AS i

CROSS JOIN LATERAL

(SELECT

facilityname, geom, node_station

FROM

fire_stations_nodes

ORDER BY

geom::geography <-> i.geom::geography

LIMIT 3) AS s

)

Then in custom SQL I used a parameter to update my selected incident node, and then created a set of routes starting at that origin location and going to each of the three nearest fire stations (or use a parameter to select fewer than 3 nearest neighbors...you can select any number for K, so long as it is LESS than what you calculated in your initial sql):

--- using a parameter to subset the table

--- get the origin node, and an array of all of the destination nodes of interest

with selection as (

select min(node_start) as node_start, min(node_end) as node_end, array_agg(node_end) as dests

from incidents3

where node_start = <Parameters.incident_node>

)

--- now calculate the paths and collect all of the line segments for each one

select

min(end_vid) as destnode,

st_collect(the_geom)::geometry(multilinestring, 4326) as geom_route,

(select node_start from selection) as source_node -- should just be one

from

(select * from pgr_dijkstra('

select gid as id, source, target, length as cost from ways',

--source location (find the source ID from ways (closest node to listed OSM ID))

(select node_start

from selection),

(select dests

from selection), false)) as route, ways

-- join the geometry back into the query result

where route.edge = ways.gid

group by end_vid

Now in Tableau I joined back in my fire stations and fire incidents so that I could see ALL of them, not just the one incident selected and the three fire stations closest to it

And with a parameter action to update my selected incident node, I can click on any point:

And have the new routes calculated dynamically

And that's it.  I know it's not easy, but it is powerful and worth spending a bit of time pondering if you're interested in looking at adding dynamic routing and KNN calculations into your visual analytics.