Using Spatial Index Hints in dbt Materializations

This page shows you how to attach spatial index creation and clustering directives to the exact point in the dbt materialization lifecycle where a geometry table’s schema is finalized — so every rebuild of a table or incremental model ships with a working GiST index or bounding-box clustering key instead of an unindexed binary blob.

The failure this prevents is specific. When spatial joins or bounding-box filters start timing out in production, the cause is rarely the geometry functions themselves. dbt’s default materializations emit CREATE TABLE AS or INSERT INTO, and a table built by CREATE TABLE AS is a brand-new relation that carries over no indexes or planner statistics from the one it replaces. Without an index, the planner falls back to a sequential scan over GEOGRAPHY/GEOMETRY payloads, joins spill to disk, and downstream runs time out. Binding index hints to the materialization layer keeps proximity joins on the index path and keeps compute predictable.

When to use this approach

Use materialization-bound index hints when a model is rebuilt on a schedule (table or full-refresh incremental) and the planner keeps choosing Seq Scan over a geometry column. This is the right layer because a hint added by hand is lost on the next dbt run.
Prefer a planner-level hint instead when the index already exists and survives rebuilds, but the optimizer still ignores it — that is a query-shaping problem covered in the parent guide on index hints for spatial queries, which uses pg_hint_plan and predicate reshaping rather than DDL.
Reach for join-side tuning instead when the slow path is a nearest-neighbour lookup rather than a missing index; the <-> KNN operator and LATERAL rewrites in speeding up nearest-neighbor joins in PostGIS solve that class of problem at the query level.
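
To tell these three cases apart, it helps to first check whether a spatial index currently exists on the rebuilt table. The query below is a minimal sketch against the PostgreSQL catalog; the analytics schema and the spatial_proximity_joined table name are assumptions borrowed from the examples on this page.

```sql
-- List GiST indexes on the model's table (PostgreSQL).
-- Schema and table names are assumptions; substitute your own.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'analytics'
  AND tablename  = 'spatial_proximity_joined'
  AND indexdef ILIKE '%USING gist%';
```

An empty result right after a dbt run points to the materialization-level problem this page addresses. A row here combined with a Seq Scan in the plan points to the planner-level guide instead.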

Prerequisites

dbt-core ≥ 1.6 with one of dbt-postgres ≥ 1.6, dbt-snowflake ≥ 1.6, or dbt-bigquery ≥ 1.6.
PostGIS ≥ 3.3 reachable through a configured adapter — see PostGIS adapter configuration for the base setup this page assumes.
Grants to run CREATE INDEX, ANALYZE, and EXPLAIN ANALYZE as the dbt deployment role.
A single canonical SRID across every geometry that participates in the hinted join. PostGIS raises a mixed-SRID error rather than reprojecting implicitly, and wrapping the indexed column in ST_Transform inside the join predicate stops the planner from using a plain GiST index on that column; normalize upstream with geometry transformation pipelines first.
Extension toggles wired through env_var() rather than hardcoded, e.g. enabled: "{{ env_var('DBT_SPATIAL_HINTS', 'true') }}".
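
The single-SRID prerequisite can be checked before any hint is added. This is a sketch that assumes the two staging relations used later on this page and a geometry column named geom in each.

```sql
-- Count rows per SRID on each side of the join (PostGIS).
-- Each relation should appear exactly once, and both rows should show the same srid.
SELECT 'stg_customer_points' AS relation, ST_SRID(geom) AS srid, count(*) AS row_count
FROM analytics.stg_customer_points
GROUP BY ST_SRID(geom)
UNION ALL
SELECT 'stg_facility_polygons', ST_SRID(geom), count(*)
FROM analytics.stg_facility_polygons
GROUP BY ST_SRID(geom)
ORDER BY relation, srid;
```

More than one row for a relation, or different srid values across the two relations, means the geometries need reprojecting upstream first.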

Step-by-step instructions

The reliable pattern is to decouple ingestion from index creation: let dbt build the table, then apply the spatial directive in a hook or a custom materialization so it is reapplied deterministically on every run. The diagram below contrasts the two lifecycles — a default table build that ships an unindexed blob, and an index-bound build that re-attaches the GiST index and ANALYZE on the same run.

Step 1: Diagnose the missing directive

Confirm the planner is actually skipping the index before you change anything. Run EXPLAIN ANALYZE against the slow join.

EXPLAIN ANALYZE
SELECT a.id, b.id
FROM analytics.stg_customer_points a
JOIN analytics.stg_facility_polygons b
  ON ST_DWithin(a.geom, b.geom, 500);

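Seq Scan nodes on both geometry relations with ST_DWithin evaluated as a Join Filter confirm the index is absent or unused. A Sort Method: external merge Disk: line, or a Hash node reporting Batches greater than 1 elsewhere in the plan, shows the work spilling to disk. A Seq Scan on the outer relation alone is normal for an index nested loop. If you instead see Index Scan using idx_..._geom on the inner relation, the planner is already traversing the index and a hint will not help.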
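
A complementary check is to discourage sequential scans for one transaction and see whether any index path appears at all. This is a common PostgreSQL diagnostic rather than part of the dbt workflow, and it uses plain EXPLAIN so the slow join is not executed.

```sql
BEGIN;
-- Penalize sequential scans for this transaction only; the setting resets on ROLLBACK.
SET LOCAL enable_seqscan = off;
EXPLAIN
SELECT a.id, b.id
FROM analytics.stg_customer_points a
JOIN analytics.stg_facility_polygons b
  ON ST_DWithin(a.geom, b.geom, 500);
ROLLBACK;
```

If the plan still shows no Index Scan on either relation, no usable spatial index exists and the steps below apply. If an Index Scan appears only under this setting, the index exists and the problem is more likely statistics or cost estimates.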

Step 2: Apply a GiST post-hook for PostGIS

PostgreSQL cannot declare an index inside CREATE TABLE AS, so build the GiST index immediately after the table lands using a post-hook.

# models/marts/_marts.yml
models:
  - name: spatial_proximity_joined
    config:
      materialized: table
      post-hook:
        - "CREATE INDEX IF NOT EXISTS idx_{{ this.name }}_geom ON {{ this }} USING GIST (geom);"
        - "ANALYZE {{ this }};"

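The ANALYZE matters: a table created by CREATE TABLE AS has no column statistics until ANALYZE (or autovacuum) runs, and without them the planner works from default selectivity estimates and may keep choosing a sequential scan over the new index. Verify after the run: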

SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'spatial_proximity_joined';
-- idx_scan should climb above 0 once the model is queried
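
dbt-postgres also ships an indexes model config that creates indexes as part of the table build and generates its own index names. It may be a simpler alternative to a hand-written hook, for example if a fixed index name ever collides with the index on the table being replaced. The sketch below assumes a hypothetical upstream model and column list; it does not run ANALYZE, so the ANALYZE post-hook above is still worth keeping.

```sql
-- models/marts/spatial_proximity_joined.sql
-- Upstream model name and selected columns are hypothetical.
{{ config(
    materialized='table',
    indexes=[{'columns': ['geom'], 'type': 'gist'}]
) }}

SELECT id, geom
FROM {{ ref('stg_customer_points') }}
```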

Step 3: Declare clustering for cloud warehouses

Snowflake and BigQuery have no GiST equivalent; Snowflake prunes micro-partitions and BigQuery prunes partitions and clustered storage blocks, so the “index hint” is a clustering declaration set at table-create time.

On Snowflake, cluster on the bounding-box extents of the geometry. dbt-snowflake applies clustering through the cluster_by config, so declare it there rather than in a hand-written post-hook:

models:
  - name: spatial_proximity_joined
    config:
      materialized: table
      cluster_by: ["ST_XMIN(geom)", "ST_YMIN(geom)", "ST_XMAX(geom)", "ST_YMAX(geom)"]

Verify clustering health with SELECT SYSTEM$CLUSTERING_INFORMATION('spatial_proximity_joined', '(ST_XMIN(geom))'); — a low average_overlaps value confirms tight spatial locality.
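
SYSTEM$CLUSTERING_INFORMATION returns a JSON string, so the individual metrics can be pulled out for monitoring. A minimal Snowflake sketch, reusing the table name and clustering expression from the call above:

```sql
-- Extract the overlap and depth metrics from the clustering report (Snowflake).
SELECT
  info:average_overlaps::FLOAT AS average_overlaps,
  info:average_depth::FLOAT    AS average_depth
FROM (
  SELECT PARSE_JSON(
    SYSTEM$CLUSTERING_INFORMATION('spatial_proximity_joined', '(ST_XMIN(geom))')
  ) AS info
);
```

Tracking these two numbers across rebuilds gives an early signal that spatial locality is degrading.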

BigQuery clusters on real columns only; cluster_by rejects function expressions. Precompute the bounding-box columns during transformation in a clusterable type such as NUMERIC (FLOAT64 columns are not accepted as clustering keys), then cluster on them:

models:
  - name: spatial_proximity_joined
    config:
      materialized: table
      partition_by:
        field: event_date
        data_type: date
        granularity: day
      cluster_by: ["bbox_xmin", "bbox_ymin"]
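
The bounding-box columns named in cluster_by have to be produced by the model itself. One way to sketch that in BigQuery SQL is shown here, assuming a hypothetical upstream model that exposes id, event_date, and a GEOGRAPHY column named geom.

```sql
-- models/marts/spatial_proximity_joined.sql (BigQuery)
-- Upstream model and column names are hypothetical.
WITH boxed AS (
  SELECT
    id,
    event_date,
    geom,
    ST_BOUNDINGBOX(geom) AS bbox  -- STRUCT with xmin, ymin, xmax, ymax (FLOAT64)
  FROM {{ ref('stg_customer_points') }}
)
SELECT
  id,
  event_date,
  geom,
  -- Cast to NUMERIC so the columns are eligible as clustering keys.
  CAST(bbox.xmin AS NUMERIC) AS bbox_xmin,
  CAST(bbox.ymin AS NUMERIC) AS bbox_ymin,
  CAST(bbox.xmax AS NUMERIC) AS bbox_xmax,
  CAST(bbox.ymax AS NUMERIC) AS bbox_ymax
FROM boxed
```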

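Confirm pruning by running the same extent-bounded filter against the unclustered and the clustered table and comparing the bytes processed reported once each job finishes. The dry-run estimate is an upper bound that does not reflect cluster pruning, so only the post-run figure shows the saving.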
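
An extent-bounded predicate of the kind described above might look like the following. It is a sketch that assumes the model also stores bbox_xmax and bbox_ymax as NUMERIC columns, lives in a dataset named analytics, and keeps its GEOGRAPHY column as geom; the window coordinates and date are illustrative only.

```sql
-- Bounding-box overlap test against a query window (BigQuery).
-- Window: longitude -0.6 to 0.4, latitude 51.2 to 51.8 (illustrative values).
SELECT id
FROM analytics.spatial_proximity_joined
WHERE event_date = DATE '2024-01-15'                                  -- partition pruning
  AND bbox_xmin <= NUMERIC '0.4'  AND bbox_xmax >= NUMERIC '-0.6'     -- x ranges overlap
  AND bbox_ymin <= NUMERIC '51.8' AND bbox_ymax >= NUMERIC '51.2'     -- y ranges overlap
  AND ST_INTERSECTSBOX(geom, -0.6, 51.2, 0.4, 51.8);                  -- spatial test on the geography
```

The plain column comparisons are what give BigQuery a chance to skip clustered blocks; the ST_INTERSECTSBOX call then filters the surviving rows on the geography itself.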

Step 4: Encapsulate the hint in a custom materialization

Per-model YAML drifts at scale. Wrap the standard table logic in a custom materialization that injects the index DDL whenever a model sets spatial_index: true, so every spatial table inherits the same guarantee.

-- macros/materializations/spatial_table.sql
{% materialization spatial_table, adapter='postgres' %}
  {% set build = materialization_table_default() %}
  {% if config.get('spatial_index', false) %}
    {% set geom_col = config.get('geom_column', 'geom') %}
    {% do run_query(
      "CREATE INDEX IF NOT EXISTS idx_" ~ this.name ~ "_" ~ geom_col ~
      " ON " ~ this ~ " USING GIST (" ~ geom_col ~ ");"
    ) %}
    {% do run_query("ANALYZE " ~ this ~ ";") %}
  {% endif %}
  {{ return(build) }}
{% endmaterialization %}

Models then opt in with materialized: spatial_table and spatial_index: true, and the index is rebuilt deterministically on every run. For the dispatch patterns that keep this portable across engines, see writing reusable spatial macros in dbt.
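
A model opting in could look like this. The upstream model names and selected columns are hypothetical; spatial_index and geom_column are the config keys read by the materialization above.

```sql
-- models/marts/spatial_proximity_joined.sql
-- Upstream model names and output columns are hypothetical.
{{ config(
    materialized='spatial_table',
    spatial_index=true,
    geom_column='geom'
) }}

SELECT
  a.id   AS customer_id,
  b.id   AS facility_id,
  a.geom AS geom
FROM {{ ref('stg_customer_points') }} a
JOIN {{ ref('stg_facility_polygons') }} b
  ON ST_DWithin(a.geom, b.geom, 500)
```

After the first run, it is worth confirming in pg_indexes that the index was actually persisted on the new relation.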

Configuration reference

Parameter	Engine	Accepted values	Default	Spatial notes
`post-hook` (GiST)	PostGIS	SQL string array	none	Must include both `CREATE INDEX ... USING GIST` and `ANALYZE`; without `ANALYZE` the planner lacks statistics and may skip the index
`cluster_by`	Snowflake	array of expressions	none	Use the four bounds `ST_XMIN(geom)`, `ST_YMIN(geom)`, `ST_XMAX(geom)`, `ST_YMAX(geom)`; expressions are allowed
`cluster_by`	BigQuery	array of column names	none	Column names only; precompute bbox columns in a clusterable type such as NUMERIC; function expressions are rejected
`partition_by`	BigQuery	`{field, data_type, granularity}`	none	Partition on a time column, then cluster spatially within partitions
`spatial_index`	custom mat.	`true` / `false`	`false`	Flag that triggers GiST DDL injection in the custom materialization
`geom_column`	custom mat.	column name	`geom`	Names the geometry column the injected index targets

Gotchas & edge cases

ANALYZE omission is the silent killer. A freshly built table has no planner statistics, so even with a new GiST index the planner may still choose a Seq Scan. Always pair index creation with ANALYZE in the same hook.
cluster_by expressions fail on BigQuery. Passing ST_XMIN(geom) to BigQuery’s cluster_by errors at create time; of the two warehouses, only Snowflake’s cluster_by accepts expressions. Precompute columns for BigQuery.
SRID mismatch voids every hint. If two joined geometries carry different SRIDs, PostGIS raises a mixed-SRID error, and the usual workaround of wrapping one side in ST_Transform inside the predicate keeps the planner off that column’s index no matter how the table was built. Canonicalize SRID upstream.
Post-hooks on incremental models are no-ops after the first build. The hook still runs on every append, but CREATE INDEX IF NOT EXISTS does nothing because the index already exists, which is correct. A --full-refresh replaces the table, so confirm the index is present on the new relation afterwards rather than assuming it persisted.
Index bloat after heavy DML. Repeated incremental upserts fragment GiST indexes; schedule a periodic VACUUM ANALYZE (or REINDEX CONCURRENTLY) outside the dbt run rather than inside a hook.
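
The maintenance mentioned in the last item can be sketched as two statements run from a scheduler or a psql session. The index and table names follow this page's examples and are assumptions for any real project.

```sql
-- Run outside dbt: neither statement can execute inside a transaction block.
-- REINDEX ... CONCURRENTLY requires PostgreSQL 12 or later.
REINDEX INDEX CONCURRENTLY analytics.idx_spatial_proximity_joined_geom;
VACUUM (ANALYZE) analytics.spatial_proximity_joined;
```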

FAQ

Why does my GiST index exist but the planner still does a Seq Scan?

Almost always missing statistics. The PostgreSQL planner costs plans from pg_statistic, and a freshly rebuilt table has no entries there until ANALYZE (or autovacuum) runs. Add ANALYZE {{ this }}; as the post-hook step immediately after CREATE INDEX, then re-check pg_stat_user_indexes.idx_scan.
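
Whether statistics were refreshed after the last rebuild can be read from the statistics views. A minimal sketch, assuming the analytics schema and the model name used on this page:

```sql
-- When was the table last analyzed, manually or by autovacuum? (PostgreSQL)
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'analytics'
  AND relname = 'spatial_proximity_joined';
```

NULL in both timestamp columns after a dbt run suggests the ANALYZE step did not execute against the new relation.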

Can I put CREATE INDEX directly in a Snowflake post-hook?

No. Snowflake has no user-defined indexes; spatial pruning comes from micro-partition clustering. Use the cluster_by config with the four ST_XMIN/YMIN/XMAX/YMAX(geom) bounds instead of any post-hook DDL.

Why does BigQuery reject ST_XMIN(geom) in cluster_by?

BigQuery clusters on physical columns only and does not accept function expressions in cluster_by. Materialize the bounding-box bounds as real columns during transformation (ST_BoundingBox outputs, or precomputed bbox_xmin/bbox_ymin) and cluster on those.

Should index hints live in YAML or a custom materialization?

Use YAML post-hook/cluster_by for a handful of models. Once you maintain many spatial tables, move the logic into a custom materialization gated on a spatial_index: true flag so every model inherits identical optimizer guarantees and configuration cannot drift.

Do these hints help nearest-neighbour queries too?

They are necessary but not sufficient. A GiST index lets the planner answer an ORDER BY on the <-> KNN operator from the index, but the query must still be written as an index-friendly LATERAL join. See the proximity-join guide for that query shape.
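
For orientation, the general shape of such a query is sketched below using the staging relations from Step 1. It is an illustration of the pattern, not the proximity-join guide's exact recipe.

```sql
-- For each customer point, fetch the single nearest facility (PostGIS).
SELECT a.id AS customer_id, nearest.id AS facility_id, nearest.dist
FROM analytics.stg_customer_points a
CROSS JOIN LATERAL (
  SELECT b.id, a.geom <-> b.geom AS dist
  FROM analytics.stg_facility_polygons b
  ORDER BY a.geom <-> b.geom   -- ordering on <-> is what the GiST index can serve
  LIMIT 1
) AS nearest;
```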

Index Hints for Spatial Queries — the parent guide on steering the planner with pg_hint_plan and predicate reshaping.
Speeding up nearest-neighbor joins in PostGIS — index-friendly <-> KNN join shapes that pair with these hints.
Geometry Transformation Pipelines — canonicalize SRID upstream so hints stay valid.
Choosing the Right Spatial Adapter — how GiST, clustering, and partition pruning differ across PostGIS, Snowflake, and BigQuery.
