Introduction
H3 is a hierarchical grid system designed to efficiently index and organize geospatial data. It divides the Earth’s surface into hexagonal cells of various sizes, creating a hierarchical structure that allows for spatial indexing and querying.
In context to postgres, uber h3 can be used to represented your spatial data as h3 indexes which are 64bit integers which can be used to turn your spatial operations into relational operations.
Spoiler: Query time went from 543 seconds to 1 second
TLDR;
- Represent your spatial h3 index
- Use relational DB for what’s it’s good at
Settting up
Get the environment up and running.
Creating docker image
You can skip this section and use the docker image in the last step. I only included these steps as a reference if you need to build the h3 extension in your own environment.
# Pull any postgis docker image
docker pull postgis/postgis:15-master
# Let's run a demo container where we can install h3 extension
docker run -it --name postgis-h3 -e POSTGRES_PASSWORD=postgres postgis/postgis:15-master
# shell access into the docker
docker exec -it -u root postgis-h3 bash
# Install the dependencies
apt update
apt install -y pip libpq-dev postgresql-server-dev-15
pip install pgxnclient cmake
# Install the extension
pgxn install h3
# Remove the dev dependencies
pip uninstall pgxnclient cmake
apt purge -y libpq-dev postgresql-server-dev-15 pip
exit
docker commit postgis-h3 postgis-h3
docker stop postgis-h3
docker rm postgis-h3
docker run -it --name postgis-h3 -d -p 6432:5432 -e POSTGRES_PASSWORD=postgres postgis-h3
Create docker container from the pre-built image
docker run -it --name postgis-h3 -d -p 6432:5432 -e POSTGRES_PASSWORD=postgres jsonsingh/postgis-h3
Test the extension
PGPASSWORD=postgres psql -p 6432 -h localhost
-- Create a temporary database
Create database temp;
\c temp;
create extension h3;
create extension h3_postgis CASCADE;
SELECT h3_lat_lng_to_cell(POINT('37.3615593,-122.0553238'), 5);
-- Clean up
\c postgres
Drop database temp
Let’s add some data
We’ll be using the data from OSM but any spatial can be used for this exercise.
Get the OSM data from https://download.geofabrik.de/ Choose whichever ones you want. I’ll be using Australia and Oceania.
# Just to make our life easy
export PGPASSWORD=postgres
export PGPORT=6432
export PGHOST=localhost
# Needed to load osm data to postgres
apt install osm2pgsql
# our database and extension
psql -c "CREATE database osm_data;"
psql -d osm_data -c "CREATE extension postgis;"
# This will take some time
osm2pgsql -c -d osm_data -x -E 4326 <osm_file>
# Mulitple OSM files
osm2pgsql -a -d osm_data -x -E 4326 <osm_file_2>
Defining the Problem statement (Point in Polygon)
For this sample dataset. We’ll take admin boundary level 6 (polygons) and points (trees) and see which tree lies in which admin boundary. It’s a simple point in polygon problem.
Creating separate tables for polygons and points
Let’s get a subset of data
PGPASSWORD=postgres psql -p 6432 -h localhost -d osm_data
CREATE TABLE planet_osm_polygon_admin_6 AS SELECT DISTINCT ON (osm_id) * FROM planet_osm_polygon WHERE admin_level = '6';
CREATE TABLE planet_osm_trees AS SELECT * FROM planet_osm_point WHERE "natural" = 'tree';
SELECT COUNT(*) FROM planet_osm_polygon_admin_6;
-- 5518
SELECT COUNT(*) FROM planet_osm_trees;
-- 799130
Point in Polygon (PostGIS)
In postGIS we can simply use ST_Within
-- Not let's try to find all the trees in these countries
SELECT
b.osm_id,
COUNT(a.osm_id)
FROM
planet_osm_trees a
RIGHT JOIN planet_osm_polygon_admin_6 b ON ST_Within (a.way, b.way)
GROUP BY
b.osm_id
ORDER BY
COUNT;
-- took 343 seconds
Point in Polygon (Uber h3)
We’ll use h3 index of level 7 to represent our points and polygons and use a join using =
which you’ll see is considerably faster
CREATE extension h3;
CREATE extension h3_postgis CASCADE;
-- Let's try the h3 grid. Calculate all the h3 index for all trees
ALTER TABLE planet_osm_trees ADD COLUMN h3_index h3index GENERATED ALWAYS AS (h3_lat_lng_to_cell(way::POINT, 7)) STORED;
-- For polygons we need to do a bit of extra work
-- This function takes a polygon and gives all the h3 index inside that polygon
CREATE OR REPLACE FUNCTION get_h3_indexes(shape geometry, index integer)
RETURNS h3index[] AS $$
DECLARE
h3_indexes h3index[];
BEGIN
SELECT ARRAY(
SELECT h3_polygon_to_cells(shape, index)
) INTO h3_indexes;
RETURN h3_indexes;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- We calculate all the h3 index for all the shapes. This will take a lot of time but it's a one time process
ALTER TABLE planet_osm_polygon_admin_6 ADD COLUMN h3_indexes h3index[] GENERATED ALWAYS AS (get_h3_indexes(way,7)) STORED;
-- 30 seconds
-- Only for visualization purpose
ALTER TABLE planet_osm_trees ADD COLUMN h3_shape geometry GENERATED ALWAYS AS (h3_cell_to_boundary_geometry(h3_lat_lng_to_cell(way::POINT, 7))) STORED;
-- Create a flat table for faster joins and visualize this table in QGIS
CREATE TABLE
planet_osm_polygon_admin_6_flat AS
SELECT
h3_index,
osm_id,
-- This is just for visualization and not necessary
h3_cell_to_boundary_geometry (h3_index)
FROM
(
SELECT
UNNEST (h3_indexes) AS h3_index,
osm_id
FROM
planet_osm_polygon_admin_6
) AS a;
-- 8 seconds
-- We compare the h3 index and find all the shape ids
SELECT
b.osm_id,
COUNT(a.osm_id)
FROM
planet_osm_trees a
JOIN planet_osm_polygon_admin_6_flat b ON a.h3_index = b.h3_index
GROUP BY
b.osm_id
ORDER BY
COUNT;
-- takes 1 sec
The above query took 1 seconds compared to 543 seconds which is a significant improvement and currently there’s no indexing involved.
-
Trees represented as h3 index
-
Trees and admin shapes as h3 indexes
Compact Polygon
Polygon can be represented by different levels of h3 indexes. This can be useful when you need to partition the data.
-- create a function which returns compacted h3
CREATE OR REPLACE FUNCTION get_h3_indexes_compact(shape geometry, index integer) RETURNS h3index[] AS $$ DECLARE h3_indexes h3index[];
BEGIN WITH cells AS (
SELECT
h3_polygon_to_cells(shape, index) AS cell_array
)
SELECT
Array(
SELECT
h3_compact_cells(
array_agg(cell_array)
)
FROM
cells
) into h3_indexes;
RETURN h3_indexes;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- Let's create the compacted array indice
ALTER TABLE
planet_osm_polygon_admin_6
ADD
COLUMN h3_indexes_compacted h3index[] GENERATED ALWAYS AS (
get_h3_indexes_compact(way, 7)
) STORED;
-- Visualize the below table in QGIS
CREATE TABLE
planet_osm_polygon_admin_6_flat_compact AS
SELECT
h3_index,
osm_id,
h3_cell_to_boundary_geometry (h3_index)
FROM
(
SELECT
UNNEST (h3_indexes_compacted) AS h3_index,
osm_id
FROM
planet_osm_polygon_admin_6
) AS a;
Polygons represented as compact h3 index
More things to try
- Using h3 raster functions to work with rasters in postgis
- Nearest neighbors (kRing), helpful especially in ML analysis
- Edge function, moving from one cell to the next creating a path with h3index
- Optimizing queries using postgres techniques such as indexing and partitioning
Sources:
- https://www.youtube.com/watch?v=ay2uwtRO3QE&t=1044s&ab_channel=UberEngineering
- https://www.uber.com/en-IN/blog/h3/
- https://h3geo.org/docs/
- https://github.com/zachasme/h3-pg
- https://hub.docker.com/r/postgis/postgis
- https://h3geo.org/docs/community/tutorials
- https://github.com/jashanbhullar/foss4g-2023-spatial-analysis-in-postgres-using-uber-h3