Versioning Spatial Schemas in dbt

Versioning a spatial schema is not the same problem as versioning a relational one. A GEOMETRY or GEOGRAPHY column carries constraints that live outside the column’s base type: an SRID that binds it to a coordinate reference system, a topology contract that downstream operators assume is valid, a bounding-box extent that GiST index pages and planner statistics summarize, and a spatial index whose structure is welded to the column definition. Standard migration tooling sees only the type name (on PostgreSQL, information_schema.columns reports a PostGIS column as USER-DEFINED with udt_name geometry, and no SRID), so a change from GEOMETRY(Point, 4326) to GEOMETRY(Point, 3857) passes through it silently while quietly corrupting every distance calculation, spatial join, and rendered tile downstream.
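A small Python sketch makes the blind spot concrete. It assumes the column types were read with PostgreSQL’s format_type(atttypid, atttypmod), which renders PostGIS typmods such as geometry(Point,4326); the helper names are hypothetical and exist only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Typmod-qualified PostGIS types as format_type() renders them, e.g. "geometry(Point,4326)".
_SPATIAL_TYPMOD = re.compile(r"^(geometry|geography)\((\w+),\s*(\d+)\)$", re.IGNORECASE)


class SpatialType(NamedTuple):
    base: str     # "geometry" or "geography"
    subtype: str  # e.g. "Point" or "Geometry"
    srid: int


def base_type(type_text: str) -> str:
    """What a name-only diff compares: the type name with any typmod stripped."""
    return type_text.split("(", 1)[0].strip().lower()


def parse_spatial_type(type_text: str) -> Optional[SpatialType]:
    """Parse a typmod-qualified spatial type; None for unconstrained or non-spatial types."""
    match = _SPATIAL_TYPMOD.match(type_text.strip())
    if match is None:
        return None
    base, subtype, srid = match.groups()
    return SpatialType(base.lower(), subtype, int(srid))


def srid_changed(old_text: str, new_text: str) -> bool:
    """True when both sides carry an SRID typmod and the SRIDs differ."""
    old, new = parse_spatial_type(old_text), parse_spatial_type(new_text)
    return old is not None and new is not None and old.srid != new.srid


before, after = "geometry(Point,4326)", "geometry(Point,3857)"
print(base_type(before) == base_type(after))  # True: a name-only diff sees nothing
print(srid_changed(before, after))            # True: the typmod-aware diff catches it
```

A real check would read both sides from pg_attribute and flag any column where srid_changed is true even though base_type matches.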

This guide shows how to make spatial schema evolution explicit, auditable, and safe inside a dbt project. It belongs to the broader Spatial Data Architecture & Governance practice, which treats geometry, coordinate systems, and lineage as first-class warehouse citizens. Here the focus narrows to one mechanism: how to add, alter, and drop spatial columns, and rebuild the indexes bound to them, without leaning on dbt’s built-in on_schema_change: sync_all_columns option, which knows nothing about SRIDs, validity, or GiST structures.

Prerequisites

The patterns below assume a PostGIS-backed warehouse but flag the cloud-native equivalents where they diverge.

dbt Core >= 1.6 (or dbt Cloud on the equivalent runtime) so that var() defaults, run_query, and adapter.get_columns_in_relation behave consistently across environments.
Adapter / extension: dbt-postgres >= 1.6 with PostGIS >= 3.1 (stable ST_IsValid and GiST cost estimates), or dbt-bigquery >= 1.6 for GEOGRAPHY-typed schemas. BigQuery GEOGRAPHY values are always WGS 84, so the SRID-binding steps below apply only to PostGIS. DuckDB users need the spatial extension; see DuckDB spatial extension integration for CI parity before promoting to PostGIS.
Database permissions: the dbt service role needs CREATE/USAGE on the target schema plus the right to run ALTER TABLE, CREATE INDEX, and DROP INDEX. DDL that mutates spatial columns should never run under an analyst role — gate it per Data Security & Scoping Rules.
Environment variables: never inline the canonical SRID or environment name. Resolve them through env_var() for connection-level context and var() for run-level overrides so the same models deploy unchanged from staging to production.
A canonical CRS already enforced at staging. Schema versioning built on un-normalized geometries is unsound — resolve mixed projections first via Spatial Reference System Management.
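The precedence implied by the environment-variable rule can be sketched in Python. The function below is hypothetical and only mirrors the layering (dbt_project.yml vars as defaults, --vars on top, env_var() with a fallback); it is not how dbt resolves variables internally.

```python
import os
from typing import Mapping, Optional


def resolve_spatial_context(
    project_vars: Mapping[str, object],
    cli_vars: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Resolve the canonical SRID and deploy environment with dbt-like precedence.

    Hypothetical helper: values passed with --vars (cli_vars) override the
    dbt_project.yml defaults, and the environment name falls back to DBT_ENV,
    then 'dev', mirroring env_var('DBT_ENV', 'dev').
    """
    env = os.environ if environ is None else environ
    merged = {**project_vars, **cli_vars}  # run-level overrides win
    return {
        "default_srid": int(merged.get("default_srid", 4326)),
        "deploy_environment": str(merged.get("deploy_environment") or env.get("DBT_ENV", "dev")),
    }


defaults = {"default_srid": 4326}
print(resolve_spatial_context(defaults, {}, {"DBT_ENV": "prod"}))
# {'default_srid': 4326, 'deploy_environment': 'prod'}
print(resolve_spatial_context(defaults, {"default_srid": 3857}, {}))
# {'default_srid': 3857, 'deploy_environment': 'dev'}
```

Keeping resolution in one place is what lets the same model deploy unchanged across environments: only the inputs differ.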

Architecture Context

Spatial schema versioning is a control loop wrapped around the incremental layer of the spatial graph. Raw geometry lands untrusted; staging normalizes its CRS and validity; the model materializes; and only then does a schema-aware post-hook reconcile the live column set against the source contract, applying SRID-bound DDL and rebuilding any index it invalidated. The reconciliation result — column added, SRID applied, index rebuilt — is logged to a metadata table that the spatial model dependency graph and the cross-environment tracker both read from.

The placement of the reconcile step matters. Run it before the incremental merge and it operates on a column set that does not yet reflect the new data; hand the job to an unscoped on_schema_change: sync_all_columns and dbt applies a geometry type change through its generic add/copy/drop/rename cycle, recreating the column with no SRID binding (SRID 0) and without the indexes the old column carried. The remainder of this page builds the loop from the inside out: configuration, the sync macro, validation tests, index management, and the cross-environment audit trail.

Configuration Walkthrough

Schema evolution is environment-driven, so the canonical SRID and the active environment live in dbt_project.yml as defaults and in connection context via env_var(). Define a single source of truth, then let every macro read from it rather than hard-coding 4326.

# dbt_project.yml
name: spatial_platform
version: "1.0.0"
profile: spatial_platform

vars:
  # Canonical storage CRS; overridden per-run with --vars '{default_srid: 3857}'
  default_srid: 4326
  # Drives audit-table partitioning and parity checks
  deploy_environment: "{{ env_var('DBT_ENV', 'dev') }}"

models:
  spatial_platform:
    +on_schema_change: append_new_columns   # never sync_all for spatial models
    marts:
      +materialized: incremental

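Because spatial DDL must run after the incremental merge has been applied, register the reconcile and index hooks at the model level rather than as project-wide on-run-end operations. A model post-hook runs right after the merge statement with the model’s own relation in scope as this (on PostgreSQL, inside the model’s transaction by default); on-run-end hooks fire once, after every model has finished, and have no per-model relation to diff against.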

# models/marts/_marts.yml
models:
  - name: spatial_fact_table
    config:
      materialized: incremental
      unique_key: feature_id
      post-hook:
        - "{{ spatial_sync_columns(this, source('raw', 'spatial_source')) }}"
        - "{{ manage_spatial_indexes(this, 'geom', 'idx_spatial_fact_geom') }}"

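Set on_schema_change to append_new_columns rather than sync_all_columns. append_new_columns never drops a column or rewrites its type, so dbt cannot run its own naive DROP COLUMN or add/copy/drop/rename cycle against an existing geometry; those mutations belong to the SRID-aware macro. It does still append any column that first appears in the model’s select list, without an SRID binding, so create new spatial columns through spatial_sync_columns before the model starts selecting them. For an end-to-end PostGIS profile, including the spatial extension bootstrap, see setting up PostGIS with dbt.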
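One way to keep that rule from eroding is a CI lint over resolved model configs. The Python sketch below assumes you have already extracted each model’s config (for instance from the config of each node in dbt’s target/manifest.json); parcels_legacy and the is_spatial flag are hypothetical.

```python
from typing import Iterable, List, Mapping

# on_schema_change values that let dbt drop or rewrite existing columns itself
UNSAFE_FOR_SPATIAL = {"sync_all_columns"}


def lint_on_schema_change(models: Iterable[Mapping]) -> List[str]:
    """Return one message per spatial incremental model with an unsafe setting.

    Each model is a dict with 'name', a hypothetical 'is_spatial' flag (derive it
    however suits your project, e.g. from tags), and a 'config' dict.
    """
    problems = []
    for model in models:
        config = model.get("config", {})
        if not model.get("is_spatial") or config.get("materialized") != "incremental":
            continue
        setting = config.get("on_schema_change", "ignore")  # dbt's own default is 'ignore'
        if setting in UNSAFE_FOR_SPATIAL:
            problems.append(
                f"{model['name']}: on_schema_change={setting} can recreate geometry columns without an SRID"
            )
    return problems


models = [
    {"name": "spatial_fact_table", "is_spatial": True,
     "config": {"materialized": "incremental", "on_schema_change": "append_new_columns"}},
    {"name": "parcels_legacy", "is_spatial": True,
     "config": {"materialized": "incremental", "on_schema_change": "sync_all_columns"}},
]
for message in lint_on_schema_change(models):
    print(message)  # flags parcels_legacy only
```

Failing the CI job on a non-empty result keeps a stray project-level override from silently re-enabling sync_all_columns for a spatial model.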

Core Implementation: SRID-Aware Column Sync

The reliable pattern for spatial schema evolution is a post-hook macro that diffs the live column set against the source contract and applies spatial-aware DDL. Plain ALTER TABLE ... ADD COLUMN works for scalars, but a geometry column needs an explicit SRID binding at creation time; otherwise it is created unconstrained, PostGIS registers it in geometry_columns with SRID 0, and it accepts geometries in any coordinate system.

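-- macros/spatial_sync_columns.sql
{% macro spatial_column_kinds(relation) %}
  {# Map each spatial column of a relation to 'geometry' or 'geography' via the PostGIS catalog views.
     On PostgreSQL, information_schema (and therefore Column.dtype) reports PostGIS types as USER-DEFINED. #}
  {% set probe %}
    SELECT f_geometry_column AS col, 'geometry' AS kind FROM geometry_columns
    WHERE f_table_schema = '{{ relation.schema }}' AND f_table_name = '{{ relation.identifier }}'
    UNION ALL
    SELECT f_geography_column, 'geography' FROM geography_columns
    WHERE f_table_schema = '{{ relation.schema }}' AND f_table_name = '{{ relation.identifier }}'
  {% endset %}
  {% set kinds = {} %}
  {% for row in run_query(probe) %}
    {% do kinds.update({row['col']: row['kind']}) %}
  {% endfor %}
  {{ return(kinds) }}
{% endmacro %}

{% macro spatial_sync_columns(target_relation, source_relation) %}
  {# Introspection needs a live connection; skip it while dbt parses the project #}
  {% if execute %}
    {% set existing_cols = adapter.get_columns_in_relation(target_relation) %}
    {% set source_cols = adapter.get_columns_in_relation(source_relation) %}

    {% set existing_names = existing_cols | map(attribute='name') | list %}
    {% set source_names = source_cols | map(attribute='name') | list %}

    {% set to_add = source_names | reject('in', existing_names) | list %}
    {% set to_drop = existing_names | reject('in', source_names) | list %}

    {% set default_srid = var('default_srid', 4326) | int %}
    {% set source_spatial = spatial_column_kinds(source_relation) %}
    {% set target_spatial = spatial_column_kinds(target_relation) %}

    {% for col_name in to_add %}
      {% if col_name in source_spatial %}
        {# Create the column and bind its SRID in one statement: no window without a CRS #}
        {% set sql %}
          ALTER TABLE {{ target_relation }}
          ADD COLUMN {{ col_name }} {{ source_spatial[col_name] }}(Geometry, {{ default_srid }})
        {% endset %}
        {{ log("Adding spatial column with explicit SRID binding: " ~ col_name, info=True) }}
        {% do run_query(sql) %}
      {% else %}
        {% set col_def = source_cols | selectattr("name", "equalto", col_name) | first %}
        {# data_type keeps length/precision, e.g. character varying(64) #}
        {% do run_query("ALTER TABLE " ~ target_relation ~ " ADD COLUMN " ~ col_name ~ " " ~ col_def.data_type) %}
      {% endif %}
    {% endfor %}

    {% for col_name in to_drop %}
      {% if col_name in target_spatial %}
        {{ log("Dropping spatial column; its dependent indexes are dropped with it: " ~ col_name, info=True) }}
      {% endif %}
      {% do run_query("ALTER TABLE " ~ target_relation ~ " DROP COLUMN IF EXISTS " ~ col_name) %}
    {% endfor %}
  {% endif %}
{% endmacro %}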

The macro intentionally treats geometry adds and drops as distinct, logged events. An added spatial column is created with its SRID in a single statement, so there is never a window in which the column exists without a coordinate system. A dropped spatial column logs an explicit warning because the drop cascades to any index built on it — the next section rebuilds those deterministically rather than letting the planner discover the loss mid-query. For the deeper macro-design principles this builds on — parameterization, idempotency, cross-engine portability — see building custom spatial macros.
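Because the diff itself is pure logic, it is easy to unit-test outside the warehouse. The Python sketch below mirrors the macro’s add-then-drop planning as a plain function; plan_column_sync, its dict inputs, and the sample columns are illustrative assumptions, not part of dbt.

```python
from typing import Dict, List


def plan_column_sync(
    table: str,
    target_cols: Dict[str, str],
    source_cols: Dict[str, str],
    spatial_kinds: Dict[str, str],
    default_srid: int = 4326,
) -> List[str]:
    """Return the DDL the sync would issue: adds in source order, then drops in target order.

    target_cols / source_cols map column name -> SQL type. spatial_kinds maps each
    spatial source column to 'geometry' or 'geography', as a catalog lookup would.
    """
    statements = []
    for name, sql_type in source_cols.items():
        if name in target_cols:
            continue
        if name in spatial_kinds:
            # Bind the SRID typmod at creation so the column never exists without a CRS
            sql_type = f"{spatial_kinds[name]}(Geometry, {default_srid})"
        statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
    for name in target_cols:
        if name not in source_cols:
            # Dropping a column also drops every index built on it
            statements.append(f"ALTER TABLE {table} DROP COLUMN IF EXISTS {name}")
    return statements


plan = plan_column_sync(
    "analytics.spatial_fact_table",
    target_cols={"feature_id": "bigint", "legacy_geom": "USER-DEFINED"},
    source_cols={"feature_id": "bigint", "geom": "USER-DEFINED", "label": "text"},
    spatial_kinds={"geom": "geometry"},
)
for statement in plan:
    print(statement)
```

Testing the plan rather than the live DDL lets CI assert that every spatial add carries an SRID before anything touches the warehouse.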

Handling SRID changes vs. column adds

An SRID change is not an add or a drop: the column name and base type are unchanged, so the diff above never sees it. Detect it by comparing the contracted SRID against ST_SRID on a sample of rows, then re-project in place with ST_Transform rather than merely relabelling the coordinates:

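-- macros/reconcile_srid.sql
{% macro reconcile_srid(relation, column_name, target_srid) %}
  {% if execute %}
    {% set target = target_srid | int %}
    {# Sample up to two distinct SRIDs: one is the normal case, two means mixed projections #}
    {% set probe %}
      SELECT DISTINCT ST_SRID({{ column_name }}) AS srid
      FROM {{ relation }}
      WHERE {{ column_name }} IS NOT NULL
      LIMIT 2
    {% endset %}
    {% set srids = run_query(probe).columns[0].values() | map('int') | list %}

    {% if srids | length > 1 %}
      {{ exceptions.raise_compiler_error("Mixed SRIDs " ~ srids ~ " in " ~ column_name ~ "; normalize the CRS in staging first") }}
    {% elif srids | length == 1 and srids[0] == 0 %}
      {# ST_Transform rejects SRID 0: with no known source CRS there is nothing to project from #}
      {{ exceptions.raise_compiler_error(column_name ~ " has SRID 0; assign its true CRS in staging before re-projecting") }}
    {% elif srids | length == 1 and srids[0] != target %}
      {{ log("Re-projecting " ~ column_name ~ " from SRID " ~ srids[0] ~ " to " ~ target, info=True) }}
      {% set ddl %}
        ALTER TABLE {{ relation }}
        ALTER COLUMN {{ column_name }}
        TYPE geometry(Geometry, {{ target }})
        USING ST_Transform({{ column_name }}, {{ target }})
      {% endset %}
      {% do run_query(ddl) %}
    {% endif %}
    {# An empty table yields no SRIDs and needs no re-projection #}
  {% endif %}
{% endmacro %}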

Using ST_Transform inside the ALTER COLUMN ... USING clause re-projects the stored geometry instead of merely re-stamping the SRID with ST_SetSRID, which would mislabel coordinates without moving them — the single most common cause of “the points are in the ocean” bug reports.
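The gap between re-projecting and re-stamping is easy to see numerically. The Python sketch below hand-codes the spherical Web Mercator forward formula used by EPSG:3857 (sphere radius 6378137 m) rather than calling a projection library, and contrasts it with what ST_SetSRID effectively does: keep the numbers, change the label. The London coordinate is only a sample.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius by EPSG:3857


def reproject_to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Spherical Mercator forward projection: the math ST_Transform(geom, 3857) applies to EPSG:4326 input."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def restamp(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """ST_SetSRID(geom, 3857): identical numbers, now read as metres."""
    return lon_deg, lat_deg


lon, lat = -0.1276, 51.5072  # central London, EPSG:4326 degrees
print(reproject_to_web_mercator(lon, lat))  # roughly (-1.42e4, 6.71e6) metres: still London
print(restamp(lon, lat))                    # (-0.1276, 51.5072) metres: about 51 m from 0,0 in the Gulf of Guinea
```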

Validation & Testing

Schema versioning is only half the contract; the data behind each column must satisfy the SRID and validity constraints the schema promises. Encode that as a dbt generic test so CI fails the build before a bad migration reaches production.

-- tests/generic/test_geometry_srid_consistency.sql
{% test geometry_srid_consistency(model, column_name, expected_srid) %}
  SELECT {{ column_name }}
  FROM {{ model }}
  WHERE {{ column_name }} IS NOT NULL
    AND (
      ST_SRID({{ column_name }}) != {{ expected_srid }}
      OR NOT ST_IsValid({{ column_name }})
    )
{% endtest %}

Attach it to the schema YAML so every run asserts both the coordinate system and topological validity of the column:

# models/marts/_marts.yml
models:
  - name: spatial_fact_table
    columns:
      - name: geom
        tests:
          - geometry_srid_consistency:
              expected_srid: "{{ var('default_srid', 4326) }}"

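A fast pre-merge sweep verifies the extension and catalog state before any DDL runs. Run it in the CI job that precedes dbt build, either directly through psql or wrapped in a macro and invoked with dbt run-operation (run-operation executes macros, not bare SQL):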

-- Verify PostGIS is present and the column is registered with the expected SRID
SELECT PostGIS_Version();

SELECT f_table_name, f_geometry_column, srid, type
FROM geometry_columns
WHERE f_table_name = 'spatial_fact_table';

Pairing a structural check (the geometry_columns catalog says SRID 4326) with a content check (ST_IsValid and ST_SRID over the rows) catches both classes of drift: a schema that lies about its data, and data that violates a correct schema.
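The two checks can be folded into one labelled result, which makes CI failures easier to triage. A hedged Python sketch, assuming the catalog SRID, the distinct row SRIDs, and the invalid-row count have already been fetched; classify_srid_drift is a hypothetical name.

```python
from typing import Iterable, List


def classify_srid_drift(
    expected_srid: int,
    catalog_srid: int,
    row_srids: Iterable[int],
    invalid_rows: int = 0,
) -> List[str]:
    """Combine the structural and content checks into labelled findings.

    catalog_srid is geometry_columns.srid for the column (0 when unconstrained);
    row_srids are the distinct ST_SRID values seen in the data; invalid_rows is
    the count of rows failing ST_IsValid. All inputs are assumed pre-fetched.
    """
    findings = []
    if catalog_srid != expected_srid:
        findings.append(f"structural: catalog declares SRID {catalog_srid}, contract expects {expected_srid}")
    stray = sorted(set(row_srids) - {expected_srid})
    if stray:
        findings.append(f"content: rows carry SRID(s) {stray}, contract expects {expected_srid}")
    if invalid_rows:
        findings.append(f"content: {invalid_rows} row(s) fail ST_IsValid")
    return findings


# An unconstrained column whose rows happen to be correct: the schema misdescribes the data
print(classify_srid_drift(4326, 0, [4326]))
# A correctly declared column with invalid rows: the data violates the schema
print(classify_srid_drift(4326, 4326, [4326], invalid_rows=3))
```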

Advanced Patterns: Index Lifecycle & Incremental Safety

Spatial indexes are expensive to rebuild and highly sensitive to schema mutation. Dropping a geometry column drops every GiST (or, on other engines, R-Tree) index built on it, so dbt’s generic add/copy/drop/rename cycle for type changes loses the index too; a native ALTER COLUMN ... TYPE does rebuild it, but under an ACCESS EXCLUSIVE lock. Once an index is gone, the planner only reveals the loss when the next spatial predicate degrades to a sequential scan. Decouple index management from the transformation itself and run it as its own post-hook, after the column sync has settled:

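-- macros/manage_spatial_indexes.sql
{% macro manage_spatial_indexes(relation, column_name, index_name) %}
  {% if execute %}
    {# Qualify the DROP: a bare index name resolves through search_path and can miss the model's schema #}
    {% set drop_sql = "DROP INDEX IF EXISTS " ~ relation.schema ~ "." ~ index_name %}
    {# CREATE INDEX takes a bare name; PostgreSQL always places the index in its table's schema #}
    {% set create_sql %}
      CREATE INDEX {{ index_name }}
      ON {{ relation }}
      USING GIST ({{ column_name }})
    {% endset %}

    {% do run_query(drop_sql) %}
    {% do run_query(create_sql) %}
  {% endif %}
{% endmacro %}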

On large tables, a plain CREATE INDEX holds a SHARE lock that blocks inserts, updates, and deletes for the whole build, and the DROP INDEX before it takes an ACCESS EXCLUSIVE lock that blocks reads as well while it is held. In PostgreSQL, prefer CREATE INDEX CONCURRENTLY (and DROP INDEX CONCURRENTLY). Neither can run inside a transaction block, so issue them from an on-run-end operation or from a model post-hook declared with transaction: false, which dbt runs outside the model transaction, and schedule rebuilds during low-concurrency windows. At warehouse scale, the index-rebuild cost interacts with partitioning and clustering choices covered in handling large geospatial datasets.
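The statement pair the macro issues, plus the concurrent variant for hooks that run outside a transaction, can be generated and unit-tested in Python. A minimal sketch; gist_index_ddl and its parameters are hypothetical, and the analytics schema is a placeholder.

```python
from typing import List


def gist_index_ddl(schema: str, table: str, column: str, index_name: str,
                   concurrently: bool = False) -> List[str]:
    """Return [drop, create] statements for rebuilding a GiST index.

    DROP is schema-qualified because a bare index name resolves through search_path.
    CREATE takes only the bare name: PostgreSQL always places an index in its
    table's schema. With concurrently=True neither statement locks out reads or
    writes, but both must run outside a transaction block.
    """
    mode = " CONCURRENTLY" if concurrently else ""
    return [
        f"DROP INDEX{mode} IF EXISTS {schema}.{index_name}",
        f"CREATE INDEX{mode} {index_name} ON {schema}.{table} USING GIST ({column})",
    ]


for statement in gist_index_ddl("analytics", "spatial_fact_table", "geom",
                                "idx_spatial_fact_geom", concurrently=True):
    print(statement)
```

Dropping before creating leaves a short window with no index; if that window matters, build the replacement under a temporary name first and drop the old index afterwards.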

Cross-environment schema tracking

Production pipelines need deterministic schema evolution across development, staging, and production. Relying on ad-hoc dbt run invocations invites drift, especially while geometry columns are still being refined. Wire three checks into the orchestration layer:

Pre-run validation: a lightweight dbt test selector that asserts SRID consistency and ST_IsValid before any incremental merge is permitted.
Environment parity checks: diff geometry_columns (or the warehouse equivalent) across environments to surface uncommitted spatial schema changes before they ship.
Audit trail generation: every spatial DDL operation logs the model name, column altered, SRID applied, index rebuilt, and execution timestamp to a centralized metadata table, keyed by deploy_environment.
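The environment parity check reduces to a keyed diff of two catalog snapshots. A Python sketch, assuming each environment’s geometry_columns rows (f_table_name, f_geometry_column, srid, type) have already been fetched; the function names and the sample footprint column are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# One geometry_columns row: (f_table_name, f_geometry_column, srid, type)
CatalogRow = Tuple[str, str, int, str]


def _index(rows: Iterable[CatalogRow]) -> Dict[Tuple[str, str], Tuple[int, str]]:
    return {(table, column): (srid, geom_type) for table, column, srid, geom_type in rows}


def parity_diff(left_env: str, left_rows: Iterable[CatalogRow],
                right_env: str, right_rows: Iterable[CatalogRow]) -> List[str]:
    """List spatial columns missing from one environment or differing in SRID/type."""
    left, right = _index(left_rows), _index(right_rows)
    findings = []
    for table, column in sorted(left.keys() | right.keys()):
        key = (table, column)
        if key not in right:
            findings.append(f"{table}.{column}: only in {left_env}")
        elif key not in left:
            findings.append(f"{table}.{column}: only in {right_env}")
        elif left[key] != right[key]:
            (l_srid, l_type), (r_srid, r_type) = left[key], right[key]
            findings.append(f"{table}.{column}: {left_env}={l_type}/{l_srid}, {right_env}={r_type}/{r_srid}")
    return findings


staging = [("spatial_fact_table", "geom", 3857, "POINT"), ("spatial_fact_table", "footprint", 4326, "POLYGON")]
prod = [("spatial_fact_table", "geom", 4326, "POINT")]
for finding in parity_diff("staging", staging, "prod", prod):
    print(finding)
```

An empty result is the promotion gate: any finding means one environment carries a spatial schema change the other has not seen.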

The full macro set, diffing protocol, and fast-recovery rollback are detailed in tracking spatial schema changes across environments. Treat that audit table as the system of record: when scope geometries or classification tiers change under Data Security & Scoping Rules, the same schema-event log keeps the policy history reconstructable.

Troubleshooting

Symptom	Root cause	Fix
New geometry column reports SRID `0`	Column added by dbt’s native `sync_all_columns` without an SRID binding	Set `on_schema_change: append_new_columns` and let `spatial_sync_columns` create the column with `GEOMETRY(Geometry, <srid>)`
Points render in the wrong location after a migration	SRID re-stamped with `ST_SetSRID` instead of re-projected	Re-project in place with `ST_Transform` inside `ALTER COLUMN ... USING`, as in `reconcile_srid`
Spatial query degrades to a full-table scan post-deploy	GiST index dropped along with the old column when the geometry column was dropped or recreated (for example by dbt’s add/copy/drop/rename type-change cycle)	Run `manage_spatial_indexes` as a post-hook so the index is rebuilt after every schema change
`geometry_srid_consistency` test passes locally, fails in prod	Environments diverged; one carries an uncommitted SRID change	Add the environment parity check that diffs `geometry_columns` across environments before promotion
`CREATE INDEX CONCURRENTLY` errors inside dbt	The statement cannot run within dbt’s wrapping transaction	Move the concurrent build to an `on-run-end` operation, or to a model post-hook declared with `transaction: false`, so it runs outside the model transaction

Capture, for every run, which columns changed, the SRID in force at the time, and the index rebuilt — that metadata is what lets a regulated platform roll back a faulty migration and prove why a downstream consumer saw a given geometry, turning spatial schema evolution from a silent failure surface into an auditable part of the DAG.
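Answering "which SRID was in force when this geometry was served?" is then a simple query over the audit log. A Python sketch: the SpatialDdlEvent fields follow the audit-trail list in the cross-environment section, but the dataclass, the srid_in_force helper, and the sample events are hypothetical, and a real implementation would run the equivalent query against the metadata table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpatialDdlEvent:
    """One row of the (hypothetical) spatial schema audit table."""
    deploy_environment: str
    model_name: str
    column_name: str
    srid_applied: Optional[int]   # None when the DDL did not touch the SRID
    index_rebuilt: Optional[str]
    executed_at: datetime


def srid_in_force(events: Iterable[SpatialDdlEvent], environment: str,
                  model: str, column: str, at: datetime) -> Optional[int]:
    """Latest SRID applied to model.column in `environment` at or before `at`."""
    applied = [
        e for e in events
        if (e.deploy_environment, e.model_name, e.column_name) == (environment, model, column)
        and e.srid_applied is not None and e.executed_at <= at
    ]
    return max(applied, key=lambda e: e.executed_at).srid_applied if applied else None


def utc(*parts: int) -> datetime:
    return datetime(*parts, tzinfo=timezone.utc)


log = [
    SpatialDdlEvent("prod", "spatial_fact_table", "geom", 4326, "idx_spatial_fact_geom", utc(2024, 1, 10)),
    SpatialDdlEvent("prod", "spatial_fact_table", "geom", 3857, "idx_spatial_fact_geom", utc(2024, 3, 2)),
]
print(srid_in_force(log, "prod", "spatial_fact_table", "geom", utc(2024, 2, 1)))  # 4326
```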

Spatial reference system management — enforce the canonical CRS that schema versioning assumes.
Tracking spatial schema changes across environments — the diffing protocol and rollback playbook for spatial drift.
Handling large geospatial datasets — keep index rebuilds affordable at warehouse scale.
Data Security & Scoping Rules — gate who can execute spatial DDL and audit classification changes.
Building custom spatial macros — generalize the sync and index macros into reusable, cross-engine patterns.
