Writing reusable ST_DWithin macros in dbt

This page shows you how to package a single, index-safe ST_DWithin distance predicate as a dbt macro that compiles to correct syntax on PostGIS, BigQuery, and Snowflake without rewriting SQL per warehouse.

When to use this approach

Reach for a reusable distance macro — rather than hand-writing ST_DWithin in each model — when any of these hold:

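You run the same proximity logic on more than one engine. If models target PostGIS in production but the DuckDB spatial extension or BigQuery in CI, a single dispatched interface keeps the predicate identical across runs. Note that the macro on this page only branches for postgres, redshift, bigquery, and snowflake; a DuckDB target needs its own branch or it will hit the compiler error. This sits one level below the dispatch framework in building custom spatial macros.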
You filter on a fixed radius (e.g. “within 500 m”). ST_DWithin is the right primitive for radius membership. If you instead need the single closest row, use a <-> KNN lateral join — see speeding up nearest-neighbor joins in PostGIS.
Distance bugs keep recurring from units drift (degrees vs. metres) or silent spatial-index bypasses. Centralizing the cast and invocation pattern in one macro fixes the class of bug once. Planner-steering details live in index hints for spatial queries.

Prerequisites

dbt Core 1.5+ so adapter.dispatch and namespace search_order are available.
A verified spatial adapter: dbt-postgres against PostGIS 3.x, and/or dbt-bigquery / dbt-snowflake for those warehouses.
Database grants: CREATE on the target schema, plus USAGE on the PostGIS extension schema where it is isolated.
A canonical project SRID decided up front (this guide stores in EPSG:4326 and measures on geography). Set it once in dbt_project.yml:

# dbt_project.yml
vars:
  project_srid: 4326          # canonical storage CRS
  default_radius_meters: 500  # fallback radius when a model omits one

Connection secrets wired through dbt’s env_var() — never hardcode hosts or credentials in profiles.yml.

Step-by-step instructions

1. Create the dialect-aware macro

Place the macro at macros/spatial/st_dwithin.sql. It normalizes warehouse-specific syntax while enforcing the geography cast that gives true-metre distances, and raises a compiler error on unsupported targets instead of emitting silently wrong SQL.

-- macros/spatial/st_dwithin.sql
{% macro st_dwithin(geom_a, geom_b, distance_meters, use_geography=true) %}
  {%- set db_type = target.type -%}
  {%- if db_type in ['postgres', 'redshift'] -%}
    {%- if use_geography -%}
      ST_DWithin({{ geom_a }}::geography, {{ geom_b }}::geography, {{ distance_meters }})
    {%- else -%}
      ST_DWithin({{ geom_a }}, {{ geom_b }}, {{ distance_meters }})
    {%- endif -%}
  {%- elif db_type == 'bigquery' -%}
    ST_DWITHIN({{ geom_a }}, {{ geom_b }}, {{ distance_meters }})
  {%- elif db_type == 'snowflake' -%}
    ST_DWITHIN(TO_GEOGRAPHY({{ geom_a }}), TO_GEOGRAPHY({{ geom_b }}), {{ distance_meters }})
  {%- else -%}
    {{ exceptions.raise_compiler_error("Unsupported warehouse for st_dwithin macro: " ~ db_type) }}
  {%- endif -%}
{% endmacro %}

One macro call branches on target.type and emits a different distance predicate per warehouse, applying each engine’s geography handling and failing loudly on anything it does not recognise.

Verify it compiles to the dialect you expect without running a model:

dbt compile --select my_proximity_model
# Inspect target/compiled/.../my_proximity_model.sql and confirm the
# ::geography cast (PostGIS) or TO_GEOGRAPHY wrap (Snowflake) is present.

2. Invoke it from a model with validated, isolated inputs

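Spatial indexes (GiST in PostGIS, the search optimization service in Snowflake) are silently bypassed when the expression in the predicate no longer matches what was indexed, typically after an implicit cast or a mid-query geometry mutation such as ST_MakeValid. On PostGIS the macro’s own ::geography cast counts too: it is index-eligible only if the column is stored as geography or a GiST index exists on the geom::geography expression. Isolate validation in upstream CTEs so the macro invocation stays a clean, index-eligible predicate.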

-- models/marts/fct_point_polygon_proximity.sql
{{ config(materialized='table') }}

WITH validated_points AS (
  SELECT id, geom
  FROM {{ ref('stg_source_points') }}
  WHERE geom IS NOT NULL AND ST_IsValid(geom)
),
validated_polygons AS (
  SELECT id, geom
  FROM {{ ref('stg_target_polygons') }}
  WHERE geom IS NOT NULL AND ST_IsValid(geom)
)
SELECT
  p.id    AS point_id,
  poly.id AS polygon_id,
  ST_Distance(p.geom::geography, poly.geom::geography) AS exact_distance_meters
FROM validated_points p
JOIN validated_polygons poly
  ON {{ st_dwithin('p.geom', 'poly.geom', 1000) }}

Verify the index is used rather than a sequential scan:

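-- Explain the compiled join (see target/compiled/), not the finished table:
-- selecting from fct_point_polygon_proximity never touches a spatial index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id, poly.id FROM stg_source_points p JOIN stg_target_polygons poly
  ON ST_DWithin(p.geom::geography, poly.geom::geography, 1000);
-- Expect an Index Scan on a GiST index over (geom::geography) on one side of the join, not only Seq Scans.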
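Because the macro emits a ::geography cast on PostGIS, the planner can only choose an index scan if an index exists on that same expression. The following is a minimal sketch, assuming PostGIS, staging models materialized as tables under the relation names used above, and illustrative index names that are not part of the original project:

```sql
-- Assumption: geom is stored as geometry in EPSG:4326; index names are hypothetical.
CREATE INDEX IF NOT EXISTS stg_source_points_geog_gist
  ON stg_source_points USING GIST ((geom::geography));

CREATE INDEX IF NOT EXISTS stg_target_polygons_geog_gist
  ON stg_target_polygons USING GIST ((geom::geography));

-- Refresh planner statistics so the new indexes are costed correctly.
ANALYZE stg_source_points;
ANALYZE stg_target_polygons;
```

In a dbt project these statements would usually run as a post-hook on the staging models. Storing the column as geography in the first place avoids the need for an expression index.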

Never wrap {{ st_dwithin(...) }} inside CASE or COALESCE: the planner can then no longer use the predicate as an index condition, so every candidate pair is evaluated before filtering, which on large tables means long runtimes and can exhaust memory.

3. Add a bounding-box pre-filter for incremental runs

Proximity models rarely benefit from ephemeral or view materializations, because every downstream query repeats the spatial join and its index scans. Prefer table or incremental, and shrink the candidate set with a && bounding-box overlap before the precise predicate runs.

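{# Append to the model's WHERE clause (add WHERE TRUE if it has none).
   Needs materialized='incremental' and the incremental_bbox_* vars set in
   dbt_project.yml or via --vars. The && operator is PostGIS-specific. #}
{% if is_incremental() %}
  AND p.geom && ST_MakeEnvelope(
    {{ var('incremental_bbox_min_lon') }},
    {{ var('incremental_bbox_min_lat') }},
    {{ var('incremental_bbox_max_lon') }},
    {{ var('incremental_bbox_max_lat') }},
    {{ var('project_srid') }}
  )
{% endif %}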

Verify the reduction by comparing planned row counts with and without the envelope; in dense urban data the candidate surface typically drops 60–90%.
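One way to put a number on the reduction is to count candidates with and without the envelope in a single pass. This is a sketch for PostGIS that returns actual rather than planned counts; the envelope coordinates are placeholders standing in for the incremental_bbox_* vars, and the relation name assumes the staging model is queryable as stg_source_points:

```sql
-- Placeholder envelope (min_lon, min_lat, max_lon, max_lat, srid);
-- substitute the values you pass as incremental_bbox_* vars.
SELECT
  count(*) AS all_points,
  count(*) FILTER (
    WHERE p.geom && ST_MakeEnvelope(-74.05, 40.68, -73.90, 40.82, 4326)
  ) AS points_in_envelope
FROM stg_source_points AS p;
```

The ratio of points_in_envelope to all_points is the share of candidates that still reach the precise ST_DWithin predicate.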

4. Guard inputs with a dbt test

Distance joins fail silently when invalid geometries slip into production. Assert validity declaratively so a bad geometry fails the build instead of the map render.

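# models/staging/_staging.yml  (requires the dbt_utils package in packages.yml)
models:
  - name: stg_source_points
    tests:
      # Model-level on purpose: at column level dbt_utils prepends the column
      # name to the expression, which would compile to invalid SQL here.
      - dbt_utils.expression_is_true:
          expression: "ST_IsValid(geom)"
    columns:
      - name: geom
        tests:
          - not_null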

Verify with dbt build --select +fct_point_polygon_proximity, which runs upstream models and their tests together before the mart materializes.

Configuration reference

Parameter	Accepted values	Default	Spatial notes
`geom_a` / `geom_b`	column or expression yielding `geometry`/`geography`	—	Pass pre-validated columns; do not inline `ST_MakeValid` here or you lose the index
`distance_meters`	numeric literal or `var()`	—	Interpreted as metres under the `geography` path; in planar `geometry` it is CRS units
`use_geography`	`true` / `false`	`true`	`true` casts to `::geography` for true spheroidal metres (PostGIS/Redshift only)
`target.type`	`postgres`, `redshift`, `bigquery`, `snowflake`	—	Any other value raises a compiler error rather than emitting wrong SQL
`var('project_srid')`	EPSG code	`4326`	Used by the envelope filter and any upstream `ST_SetSRID` normalization

Gotchas & edge cases

Degrees instead of metres. With use_geography=false, ST_DWithin measures in the geometry’s own CRS — on raw EPSG:4326 that is degrees, so 500 means 500° and matches everything. Keep use_geography=true, or reproject to a metric SRID first.
SRID = 0 geometries. Imports frequently arrive tagged SRID 0; casting them to geography errors or mismeasures. Normalize the coordinate reference system with ST_SetSRID then ST_Transform in staging before this macro ever sees the column.
BigQuery implicit conversions. Cast inputs to GEOGRAPHY explicitly upstream; an implicit string-to-geography conversion bypasses clustering keys and forces a full scan.
Snowflake geometry vs geography. The macro’s TO_GEOGRAPHY wrap assumes spheroidal semantics; if a column is already GEOMETRY in a projected SRS, wrapping it re-interprets coordinates as lon/lat. Standardize the storage type per engine; engine selection is covered in choosing the right spatial adapter.
Predicate hidden behind a function. COALESCE(st_dwithin(...), false) or a CASE wrapper defeats index use and evaluates every pair — keep the macro call as a bare JOIN ... ON or WHERE predicate.
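The SRID 0 case above can be handled once in staging. This is a minimal sketch for PostGIS, assuming a hypothetical source named raw.raw_points and, importantly, that untagged coordinates are already lon/lat in the project SRID (ST_SetSRID only relabels a geometry, it does not reproject it):

```sql
-- models/staging/stg_source_points.sql
-- Hypothetical source: {{ source('raw', 'raw_points') }} is illustrative only.
SELECT
  id,
  CASE
    -- Untagged input: relabel, assuming the coordinates are already in the project SRID.
    WHEN ST_SRID(geom) = 0
      THEN ST_SetSRID(geom, {{ var('project_srid') }})
    -- Tagged with another CRS: reproject into the project SRID.
    WHEN ST_SRID(geom) <> {{ var('project_srid') }}
      THEN ST_Transform(geom, {{ var('project_srid') }})
    ELSE geom
  END AS geom
FROM {{ source('raw', 'raw_points') }}
WHERE geom IS NOT NULL
```

If the untagged rows could be in a projected CRS, the first branch is wrong for them; in that case tag them with their real SRID and let the ST_Transform branch do the work.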

FAQ

Why does my ST_DWithin filter match every row?

Almost always a units mismatch. With use_geography=false on EPSG:4326 data the distance is measured in degrees, so any small numeric threshold spans the whole globe. Set use_geography=true (the default) so the macro casts to ::geography and measures metres, or reproject the geometry to a metric SRID before filtering.
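The effect is easy to reproduce on PostGIS with two synthetic points one degree of longitude apart on the equator, which is roughly 111 km on the ground. The points are made up for illustration:

```sql
WITH pts AS (
  SELECT
    ST_SetSRID(ST_MakePoint(0, 0), 4326) AS a,
    ST_SetSRID(ST_MakePoint(1, 0), 4326) AS b
)
SELECT
  -- Planar geometry: the threshold is read as 500 degrees, so this is true.
  ST_DWithin(a, b, 500)                       AS within_500_degrees,
  -- Geography: the threshold is 500 metres, so this is false.
  ST_DWithin(a::geography, b::geography, 500) AS within_500_metres
FROM pts;
```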

Why is the spatial join still doing a sequential scan?

The planner cannot use a GiST or search-optimized index when the predicate sits inside CASE/COALESCE, when an implicit type cast happens at execution time, or when statistics are stale. Keep {{ st_dwithin(...) }} as a bare join predicate over pre-validated, correctly-typed columns, then run ANALYZE and confirm an Index Scan with EXPLAIN.

Should I use ST_DWithin or a KNN (<->) join?

Use ST_DWithin for fixed-radius membership (“everything within 500 m”). Use the <-> KNN operator in a CROSS JOIN LATERAL when you want the N nearest rows regardless of absolute distance. The two answer different questions; the nearest-neighbor pattern is covered in the proximity-joins guide.
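The difference shows up in the shape of the two queries. This is a sketch for PostGIS, reusing the staging relation names from this page as plain tables; the radius of 500 and the limit of 3 are arbitrary:

```sql
-- Radius membership: every polygon within 500 m of each point
-- (a point can match zero, one, or many polygons).
SELECT p.id AS point_id, poly.id AS polygon_id
FROM stg_source_points AS p
JOIN stg_target_polygons AS poly
  ON ST_DWithin(p.geom::geography, poly.geom::geography, 500);

-- KNN: the 3 nearest polygons per point, however far away they are.
SELECT p.id AS point_id, nearest.id AS polygon_id
FROM stg_source_points AS p
CROSS JOIN LATERAL (
  SELECT poly.id
  FROM stg_target_polygons AS poly
  ORDER BY poly.geom <-> p.geom  -- on geometry, ordering uses planar CRS units
  LIMIT 3
) AS nearest;
```

On EPSG:4326 geometry the <-> ordering is computed in degrees, so it is an approximation of true ground distance that degrades at high latitudes.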

How do I keep the macro working across PostGIS, BigQuery, and Snowflake?

The macro branches on target.type and applies each engine’s geography handling — ::geography for PostGIS/Redshift, a bare call for BigQuery’s spherical geography, and TO_GEOGRAPHY for Snowflake. Compile against each target in CI and diff the generated SQL so a new warehouse never silently emits the wrong dialect; unsupported targets raise a compiler error by design.

Building Custom Spatial Macros — the adapter.dispatch framework this distance predicate plugs into.
Optimizing Proximity Joins — bounding-box and partition strategies that complement the radius filter.
Speeding up nearest-neighbor joins in PostGIS — the <-> KNN alternative for closest-row queries.
