Recently I’ve been thinking about network analyses in Tableau.  Mostly this has been on how we can dynamically generate routes between locations with a bit of help from PostGIS and custom SQL queries.   The problem that I explored in my post on dynamic routing was focused on a one-to-one relationship on the network…simply how to I go from one location to one other location.  This is interesting, but there are richer questions that we can explore if we expand and explore the network that radiates out to any location from one or more starting points.  This type of road network problem is often calculated as a drive time polygon or a series of polygons that approximate drive time ‘isochrones’ or lines of equal time.

 

While there are several ways to tackle this problem, in this post, I am going to explore this problem using PostgreSQL and PostGIS.  There are good and bad sides to all of the analytic options, and for network travel time / distance, ‘your mileage will vary.’  (yes, I make that joke a lot when talking about spatial analysis problems). Some good reading materials that I recommend to use when considering other methods for tackling your network analysis problems:

 

 

On the general problem of network analysis and drive time calculations

Drivetime calculation is a problem of figuring out the distance from one location to many locations along a network – we start with a single node on our road network then measure out one segment at a time until we reach our maximum distance of interest. By traversing the network we end up with all of the roads and nodes that are within the distance that we set.

While I was figuring out how to do this in PostgreSQL, there were a number of realizations that came to me when I was considering how someone would use these analyses – and, more importantly, how they would understand what was happening and how good / useful the results are.  The main points that I think are worth sharing:

 

  • You need to start with a good road network you start (and I think it helps to be able to dig into the data to do some quality control / assessment).  I used Open Street Map (OSM) data.  It seems fine.  I like that I have full access to peruse the attributes in the data and to do a sanity check on the attributes that I use in calculating ‘cost’ to travel segments on the network.  The OSM data includes several attributes that can be used as a cost, including distance and number of seconds to travel any given segment (based on speed limit and length). It’s great that OSM includes these attributes, because they are a good place to start.  But…you might need to use other metrics to tailor to your specific need or based on your local knowledge.  For instance, speed limit may not mean much if there tends to be a lot of traffic. Or, you may need to adjust for mode of transportation and individual (e.g., walking speed varies substantially between individuals).  The calculations for ‘cost’ on a road network all depends on the quality of the metric you use for each segment.

 

  • The quality of the analysis is based on a combination of road network and the mysteries of the algorithm to calculate ‘cost’ on the network.  The methods behind the PostGIS and pgRouting driving distance calculation are fairly well explained.  I like that. But…the standard implementation of the pgRouting methods on the OSM data still felt a bit unrealistic and overly generous with respect to traveling a network for Seattle.  Maybe my mental map of travel times around town are skewed because I tend to walk, bike, and bus more than I drive, but they seemed pretty generous with respect to how far I could travel within the max time that I set. My guess is that is because of the way that the costs are calculated for road segments.  Seattle has a lot of traffic.  And construction.  And traffic. And lots of intersections that feel like they have poorly timed lights or are in desperate need of a turn arrow for left turns.  But, I couldn’t figure out a way to incorporate the intersection challenges into the drive distance calculation – so the impact of that one spot where I sit for 5 minutes waiting to turn left?  Not going to be included when I look at ‘what is within 10 minutes’ of a location.

 

On the other hand, there are online services that you can use for drive time network calculations that may be more realistic with respect to using these additional driving impediments, but they’re just black box services.  It’s hard to know much about what is really going on under the hood.  They may be better.  Or worse (though in a different way).  Or it may just be something you can chalk up to ‘it’s just different.’  They may be underestimating drive distance, where the pgRouting-based calculations may over-estimate drive distance.  This is one of those cases where ‘your mileage may vary’ – literally.

 

For an example, this image shows the 5 and 10 minute drive time isochrones calculated with the Mapbox isochrone API and the 10 minute drive time using pgr_driving_distance (with cost based on seconds to traverse a segment) and a concave hull around the result.  Big difference… can I absolutely say that one is the right answer and the other is wrong?  No.  They are just different ways of approximating the same thing.  Depending on the quality of your input data and the algorithm used to do the calculation you will get a different result.  To me it’s a question of how much you need to understand what is happening under the hood and how much time you’re willing to spend curating the input data, attributes, and the model traversing the network.

 

 

How to calculate drive time road networks and polygons for use in Tableau using PostgreSQL

 

While there are different ways of calculating these drive time polygons, as I mentioned earlier, I’m just going to explore the PostgreSQL-based solution, since Nick Hara has already nicely documented the TabPy + Mapbox API solution.

 

Now, on with the calculating… Just like in my last post on network analysis, this work will be driven off of some key ingredients:

  • PostgreSQL – I installed the latest version from EnterpriseDB
  • PostGIS – I installed from OSGEO (this will get you postGIS, osmconvert, and osm2pgrouting)

 

And some topological network data.  I use Open Street Map data and recommend:

  • OSM HOT Export Tool – you’ll need to create an account, but then it’s super easy to set your area of interest and grab all sorts of data
  • osmconvert – use the Windows 32bit version (this will run on Win 64 as well).
  • osm2pgrouting – if you install PostGIS using the OSGEO bundle you should already have this ready to go
  • I also grabbed a point dataset (hospitals) from the Seattle open data portal

 

The basics - drivetime calculations

Drivetime calculation is a problem of figuring out the distance from one location to many locations along a network – we start with a single node on our road network then measure out one segment at a time until we reach our maximum distance of interest. By traversing the network we end up with all of the roads and nodes that are within the distance that we set.

 

With pgRouting, this is a simple process of querying the data with pgr_drivingDistance  This function can be used to identify all of the nodes or line segments in the road network that have costs that are less than, or equal to, the maximum ‘cost’ distance that is used in the query.

 

For instance, if we want to know all of the road segments that are within 3 units (miles, meters, minutes, whatever you are using as cost) from an origin point, the result would look like this on a super-simple theoretical road network:

 

While the calculation itself is simple, there are a few limitations to consider that make the problem more complex.  Some of the limitations that I’ve been thinking about lately include:

  • Is the measure you are using for ‘cost’ really reflective of cost for that segment?  For instance, if you have a travel time in seconds that is based on speed limit and segment length, is that a good assessment of travel time?  What if it’s common to have slow-downs based on traffic, or construction, or other common driving hindrances (e.g., the vehicle trying to turn left at an intersection without a green turn arrow…and they block all of the traffic behind them until a car in the oncoming traffic is nice enough to let them turn.  Sigh.)
  • Can you include ‘cost’ at intersections as well as at the road segments (e.g., average wait time for traffic lights, right turn vs. left turn cost, etc.)
  • Does the analysis need to reflect one-way roads (e.g., you can’t drive the wrong way, but you can walk the ‘wrong’ way on the segment…)
  • How do you convert the resulting road segments or nodes into polygonal ‘drive-sheds?’
  • And probably all sorts of other potential issues…

 

Before we bog down too far into limitations, let’s just build some data and see what we can do with it.

 

To run a query on a road network with pgr_drivingDistance, we need to specify a few key bits of information:

  • the network details – our line segments (ways)
  • the origin node on the network – the location from which we are measuring
  • what is the ‘cost’ for traversing each segment – this can be anything you want…distance, time, etc.  The example below was super simple and used the same cost for each segment, but any numeric value should be usable to determine cost.

 

Doing the calculation to visualize it in Tableau is just a simple bit of custom SQL – note that I’m using the same database that I set up in my last post on routing, so if you want to play along the instructions on setup are over in that post.

 

First we create our table using custom SQL:

And use a query like this:

select * from pgr_drivingDistance(

        'select gid as id, source, target, cost_s as cost

         from ways',

        -- origin node

           (select source

           from ways

           order by ways.the_geom <-> (

                -- select one node using parameter

                SELECT wkb_geometry

                FROM hospitals_4326

                WHERE ogc_fid = <Parameters.Select Hospital>

            )limit 1),

 

        -- distance (in seconds)

            <Parameters.Time (seconds)>,

 

        -- undirected (can travel in both directions on a road segment)

           false) as route, ways_vertices_pgr as vertices

where route.node = vertices.id

 

This query peruses the street network to find all of the segments within a set distance (defined in whatever units are appropriate…here were using number of seconds) of the origin location (defined with a subquery to identify the origin node based on the ID for a specific point of interest).  The result is then joined back to the table that has the geometry for the nodes in the road network.  That last join (where route.node = vertices.id) gives us a geography column that we can visualize in Tableau.  The original query would just return the list of IDs and not have any geometry that we could map in Tableau.

 

In this case I’m using ‘cost_s’ as the cost – this is just the cost in seconds to traverse the segment (based on segment length and speed limit), but the cost can be any value associated with a segment.  For instance, OSM data also comes with a ‘length_m’ field if you just want distance, or you can calculate out your own cost using other attributes of interest (e.g., average walking speed, etc.).

 

And gives us a result that we can map in Tableau like this:

 

 

Or, instead of returning the nodes on the road network, we could join in our road segments (ways) and see the results as lines instead.

select * from pgr_drivingDistance(

        'select gid as id, source, target, cost_s as cost

         from ways',

        -- origin node

           (select source

           from ways

           order by ways.the_geom <-> (

                -- select one node using parameter

                SELECT wkb_geometry

                FROM hospitals_4326

                WHERE ogc_fid = <Parameters.Select Hospital>

            )limit 1),

 

        -- distance (in seconds)

            <Parameters.Time (seconds)>,

 

        -- undirected (can travel in both directions on a road segment)

           false) as route, ways

where route.edge = ways.gid

 

Which gives us a result with road segments like this:

 

Creating polygon ‘drive-sheds’

 

While the result from these queries are pretty, they don’t necessarily help with some key analytical questions that we might want to answer, like how many customers (or insured properties or hospitals or delivery trucks or competitors, etc.) are within 5 minutes? 10 minutes? Etc.   To answer this type of question, we need to convert the results of our network analysis into polygons and use the built in Tableau spatial intersection join type.  The way that I’ve been doing this is to use a convex or concave hull around the lines or points returned from the query.

 

A convex hull is the minimum geography encompassing all points in the dataset

 

A concave hull is similar to a convex hull, but is more of the ‘shrink wrapped’ version that fits tighter to the dataset. You can specify a target percent for the area of the hull that PostGIS will try to fit.

 

If this sort of quick result is useful for exploration, you can just create a single convex hull around the result.  You can either do this as a RAWSQLAGG_SPATIAL calculated field, or as part of the original query.  Here is the example for a concave hull as a calculated field:

 

Or you can put it in the original custom SQL.  The advantage of having it as part of the custom SQL is that you could then use the resulting polygon easily as part of a spatial intersection.  This would let you identify all of the points that fall within the driving area with a quick spatial join.   The new parts of the query are in bold below:

 

select st_convexhull(st_collect(the_geom))::geography(polygon) as isochrone from (

select * from pgr_drivingDistance(

            'select gid as id, source, target, cost_s as cost

             from ways',

            -- origin node

               (select source

               from ways

               order by ways.the_geom <-> (

                    -- select one node using parameter

                    SELECT wkb_geometry

                    FROM hospitals_4326

                    WHERE ogc_fid = <Parameters.Select Hospital>

                )limit 1),

 

            -- distance (in seconds)

                <Parameters.Time (seconds)>,

 

            -- undirected (can travel in both directions on a road segment)

               false)) as route, ways_vertices_pgr as vertices

where route.node = vertices.id

 

Now that we have a polygon for the hull around the points / line segments, we can use a spatial intersection to answer some questions about our data – like:

  • How many {customers, patients, delivery vans, rock quarries, etc.} are within x distance from a location?
  • What areas aren’t being served by a set of facilities? For instance, what areas are inside and outside a five minute drive from each fire station?

 

To answer these types of question, you can just use the polygon in a spatial join and sum up points inside or outside the polygon:

or

 

All in all, this method of calculating ‘drive time’ areas using pgRouting driving distance calculations has some utility as an estimate of area covered within set distance, time, etc. of an origin location.  If we add in parameter actions we can then build up a workbook that allows click to select origin location, with analyses that update:

isochrone_animation.gif

 

This gives us a quick result, but it will have some limitations in terms of accuracy due to the convex or concave hull.  For instance, it tends to over-estimate the area within the ‘drive-shed’ in order to draw the polygon quickly.  This is especially true with irregular road networks where there are gaps in the network (like lakes):

 

There is potentially a fix for this sort of problem, but it wasn’t quickly obvious from my reading through various pgRouting and PostGIS files.  My guess is that whatever the solution is, it won’t be fast to do it with PostgreSQL….and that it would be much faster to do something cleaner with some other set of spatial tools (e.g., a nicely put together spatial library for Python, C++, etc.).

 

And, of course, there are the Mapbox APIs which you can use, though, even with Nick Hara’s blog post on the topic, I haven’t figured out the way to have both interactivity in generation of the drive time isochrones and the ability to use them in quick spatial joins in Tableau.  There is room for more thinking on this topic, and I’m going to keep running it around in the back of my head as time permits.