When working with spatial databases, it often helps to create a spatial index for your datasets.  The spatial index makes it more efficient to process spatial queries by helping narrow down the number of features that have to be tested against.  For instance, if you want to know which polygon a point falls into you can check for intersection with EVERY possible polygon, or you can use a spatial index to limit the number of point in polygon operations you have to do.  A common spatial index is based on the bounding box of the polygon.  Checking if a point is within the bounding box is much quicker than testing for spatial intersection with a complex polygon.  By limiting to just the polygons where the point is inside the bounding box, we can limit our need to do the ‘more expensive’ calculation and just calculate against the smaller set of likely polygons.

 

More information on spatial indexes:

https://en.wikipedia.org/wiki/Spatial_database#Spatial_index

https://postgis.net/workshops/postgis-intro/indexing.html

 

How do you create a spatial index?

Easy.  Here is an example of creating an index on a table:

create index <table name>_geom_idx

on <table name>

using GIST(<geometry field>)

 

 

Note that some of the PostGIS functionality will use a spatial index by default (e.g., ST_Contains).  That is super cool.  Also note that you can easily force it to ignore the spatial index by using the _ST_Contains function instead.

 

 

Test calculations

Now, let’s do a test to see how much of a difference the spatial index makes!

Test data:

  • Points – a dataset of 1000 random point locations, all within the conterminous US

 

(gratuitous ugly map of random point data!)

 

  • Polygons – three datasets of different sizes covering entire US (including AK and HI).  The datasets are from the US Census cartographic boundary files
  • US States (51 polygons) – includes DC
  • US Counties (3233 polygons)
  • US Census tracts (74002 polygons)

 

 

How long does it take for a query to identify the polygon that each point falls inside using the following query? 

select point.*, poly.geoid

from random1000 as point

join tl_2018_us_state as poly

on st_contains(poly.geom, point.geom)

 

 

 

 

Or, if you want the actual numbers:

 

 

 

 

State

 

County

 

Census Tract

 

Without spatial index

 

3250.772

 

33319.82

 

611127.95

 

With spatial index

 

859.462

 

788.091

 

1120.448

 

For reference, the ST_Contains() function uses a spatial index by default, but you can force it to ignore the index by using _ST_Contains() instead (note the underscore ( _ ) before the ST_Contains in the non-indexed query.  That is what I used in the tests above. 

 

Seems worthwhile to use a spatial index!