Tracking spatial schema changes across environments

This page shows you how to capture a deterministic snapshot of every spatial column’s SRID, geometry type, and index configuration, then diff that snapshot across dev, staging, and production so projection or index drift fails CI instead of silently corrupting downstream maps and joins.

Geospatial transformations introduce schema volatility that standard column-type tracking routinely misses. When a GEOMETRY or GEOGRAPHY column shifts SRIDs, alters coordinate precision, or silently drops a spatial index between environments, downstream GIS services, routing engines, and BI dashboards fail with cryptic errors or return quietly corrupted spatial joins. This guide sits under versioning spatial schemas in dbt and supplies the cross-environment tracking layer: the metadata it watches lives outside the column’s base type, so you have to interrogate the database’s spatial catalogs directly to see it change.

When to use this approach

Reach for snapshot-based spatial schema tracking when:

You promote spatial models through more than one environment and need drift to fail a merge, not a dashboard. If your concern is evolving a single column’s definition safely within one project, the parent workflow on versioning spatial schemas in dbt owns the sync-macro mechanics; this page owns environment-to-environment comparison.
Your drift is specifically projection or index loss, not value-level changes. If the SRID itself is what keeps moving, pair this with automating CRS conversions in dbt pipelines, which enforces one canonical SRID at the source.
You run a lightweight engine in CI and a heavier one in production — for example validating in the DuckDB spatial extension before promoting to PostGIS — and need a single metadata contract that both environments must satisfy.

Prerequisites

dbt Core ≥ 1.7 (stable run-operation, --state for targeted reruns, --store-failures for test forensics).
PostGIS ≥ 3.3 behind dbt-postgres ≥ 1.7. The catalog queries below read PostGIS’s geometry_columns view and the pg_index / pg_am system catalogs; Snowflake and BigQuery expose equivalent metadata through INFORMATION_SCHEMA but lack a geometry_columns analogue.
Grants: SELECT on information_schema, the target schema, and the pg_catalog system views; CREATE on the schema that stores the committed baseline snapshot.
Environment variables through dbt’s env_var() — connection secrets plus the target name, so the snapshot is tagged with the environment it came from.
A target relation that already stamps each geometry with a real SRID via ST_SetSRID at the staging layer; see setting up PostGIS with dbt for the adapter baseline this assumes.

Why standard schema tracking misses spatial drift

Traditional dbt schema tests (unique, not_null, accepted_values) operate on scalar types and ignore spatial topology. A change from GEOMETRY(Point, 4326) to GEOMETRY(Point, 3857) preserves the column name and base data type in information_schema.columns, but fundamentally breaks distance calculations, spatial indexing, and coordinate transformations. Spatial indexes (GiST, BRIN, SP-GiST) and table partitioning strategies are rarely version-controlled in schema.yml either. Without explicit tracking, teams hit cascading failures in query planners, memory exhaustion during full-table rewrites, and untraceable drift that violates established Spatial Data Architecture & Governance baselines. The root cause is that relational metadata catalogs treat spatial objects as opaque binary blobs unless you interrogate the spatial system views directly (PostGIS geometry_columns reference).
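
To make the blind spot concrete, here is a minimal Python sketch. It is not part of the dbt project: the `ColumnView` record and the sample values are illustrative assumptions. It models what a generic column check can see next to what the spatial catalog reports for the same column before and after a reprojection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnView:
    """Hypothetical record: one column as two different catalogs describe it."""
    column_name: str
    data_type: str   # as reported by information_schema.columns
    udt_name: str    # underlying type name
    srid: int        # as reported by the spatial catalog
    geom_type: str


def base_signature(col: ColumnView) -> tuple:
    """The part a generic column-type check compares."""
    return (col.column_name, col.data_type, col.udt_name)


def spatial_signature(col: ColumnView) -> tuple:
    """The part only the spatial catalog exposes."""
    return (col.column_name, col.srid, col.geom_type)


# Same column before and after a reprojection from 4326 to 3857.
before = ColumnView("geom", "USER-DEFINED", "geometry", 4326, "POINT")
after = ColumnView("geom", "USER-DEFINED", "geometry", 3857, "POINT")

generic_check_sees_change = base_signature(before) != base_signature(after)
spatial_check_sees_change = spatial_signature(before) != spatial_signature(after)
```

In this sketch the generic comparison reports no change while the spatial comparison does, which is the gap the rest of this page closes.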

Step-by-step instructions

Step 1: Extract deterministic spatial metadata

To track changes accurately, query the database-specific spatial catalogs alongside the standard information schema. This macro extracts SRID, geometry type, index presence, and storage metadata for any target relation. It runs as a dbt run-operation and returns a structured result suitable for programmatic diffing.

-- macros/get_spatial_metadata.sql
{% macro get_spatial_metadata(target_relation) %}
    {% set query %}
    WITH base_cols AS (
        SELECT
            column_name,
            data_type,
            udt_name,
            is_nullable
        FROM information_schema.columns
        WHERE table_schema = '{{ target_relation.schema }}'
          AND table_name = '{{ target_relation.identifier }}'
          AND udt_name IN ('geometry', 'geography')
    ),
    spatial_meta AS (
        SELECT
            f_geometry_column AS column_name,
            srid,
            type AS geom_type,
            coord_dimension
        FROM geometry_columns
        WHERE f_table_schema = '{{ target_relation.schema }}'
          AND f_table_name = '{{ target_relation.identifier }}'
    ),
    idx_meta AS (
        SELECT
            i.relname AS index_name,
            a.attname AS column_name,
            am.amname AS index_type,
            pg_get_indexdef(i.oid) AS index_ddl
        FROM pg_index x
        JOIN pg_class i ON i.oid = x.indexrelid
        JOIN pg_class t ON t.oid = x.indrelid
        JOIN pg_namespace n ON n.oid = t.relnamespace
        JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = ANY(x.indkey)
        JOIN pg_am am ON i.relam = am.oid
        WHERE n.nspname = '{{ target_relation.schema }}'
          AND t.relname = '{{ target_relation.identifier }}'
          AND am.amname IN ('gist', 'spgist', 'brin')
    )
    SELECT
        bc.column_name,
        bc.data_type,
        sm.srid,
        sm.geom_type,
        sm.coord_dimension,
        json_agg(
            json_build_object(
                'index_name', im.index_name,
                'index_type', im.index_type,
                'index_ddl', im.index_ddl
            )
        ) FILTER (WHERE im.index_name IS NOT NULL) AS spatial_indexes
    FROM base_cols bc
    LEFT JOIN spatial_meta sm ON bc.column_name = sm.column_name
    LEFT JOIN idx_meta im ON bc.column_name = im.column_name
    GROUP BY bc.column_name, bc.data_type, sm.srid, sm.geom_type, sm.coord_dimension;
    {% endset %}

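    {% set results = run_query(query) %}
    {# run-operation does not echo a macro's return value, so print the rows for the CLI #}
    {% do results.print_table() %}
    {% do return(results) %}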
{% endmacro %}

Run it against a relation and confirm it returns one row per spatial column with a non-null SRID:

dbt run-operation get_spatial_metadata \
  --args '{"target_relation": {"schema": "analytics", "identifier": "fact_store_locations"}}'
# expect: column_name | data_type | srid | geom_type | coord_dimension | spatial_indexes
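
For programmatic diffing, the rows need a stable serialized form. The sketch below is one possible way to do that in Python, assuming the rows have already been fetched as dictionaries with the column names the macro selects; the `canonical_snapshot` helper is hypothetical and not part of dbt.

```python
import json


def canonical_snapshot(rows: list[dict]) -> str:
    """Serialize metadata rows so identical schemas yield identical text."""
    normalized = []
    for row in rows:
        indexes = row.get("spatial_indexes")
        if isinstance(indexes, str):      # some drivers return JSON columns as text
            indexes = json.loads(indexes)
        indexes = indexes or []           # SQL NULL (no indexes) becomes an empty list
        normalized.append({
            "column_name": row["column_name"],
            "srid": row["srid"],
            "geom_type": row["geom_type"],
            "coord_dimension": row["coord_dimension"],
            # keep only the access method and sort it, so json_agg ordering
            # and environment-specific index names cannot change the output
            "index_types": sorted(ix["index_type"] for ix in indexes),
        })
    normalized.sort(key=lambda r: r["column_name"])
    return json.dumps(normalized, sort_keys=True, indent=2)


rows_a = [
    {"column_name": "geom", "srid": 4326, "geom_type": "POINT", "coord_dimension": 2,
     "spatial_indexes": [{"index_name": "idx_a", "index_type": "gist"}]},
    {"column_name": "boundary", "srid": 4326, "geom_type": "POLYGON", "coord_dimension": 2,
     "spatial_indexes": None},
]
snapshot_text = canonical_snapshot(rows_a)
```

Sorting both the rows and the keys means two environments with the same realized schema produce the same text regardless of row order or index naming.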

Step 2: Serialize a per-environment snapshot

Metadata is only useful when it is captured deterministically and tagged with the environment it came from. Materialize the macro output as a model so each dbt run writes a fresh snapshot keyed by target.name, and so the production run can be promoted to the committed baseline.

-- models/governance/spatial_metadata_snapshot.sql
{{ config(materialized='table', tags=['spatial_governance']) }}

SELECT
    '{{ target.name }}'::text AS environment,
    column_name,
    srid,
    geom_type,
    coord_dimension,
    COALESCE(json_array_length(spatial_indexes), 0) AS index_count,
    spatial_indexes
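-- assumes the Step 1 output is loaded as a seed or table named stg_store_locations_spatial_meta;
-- appending a suffix after ref() would render an invalid relation name
FROM {{ ref('stg_store_locations_spatial_meta') }}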

Build it and export the snapshot to a version-controlled directory so the diff has something to compare against:

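dbt run --select spatial_metadata_snapshot --target prod
# run-operation does not echo a macro's return value: the redirect captures only what
# the macro prints, so the macro has to print its result as JSON itself (agate tables
# expose print_json()); --quiet keeps dbt's own log lines out of the file
dbt --quiet run-operation get_spatial_metadata \
  --args '{"target_relation": {"schema": "analytics", "identifier": "fact_store_locations"}}' \
  > spatial_metadata/baseline_fact_store_locations.json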
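
Tagging each snapshot with its environment can also be done outside the database. The following is a rough sketch under that assumption; `tag_snapshot` and its field names are hypothetical. It wraps the column metadata in an envelope carrying the environment name and a content fingerprint, so two environments can be compared with a single string equality before any row-level diff.

```python
import hashlib
import json


def tag_snapshot(environment: str, columns: list[dict]) -> dict:
    """Wrap column metadata with its environment and a content fingerprint."""
    ordered = sorted(columns, key=lambda c: c["column_name"])
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return {
        "environment": environment,
        # the fingerprint covers the columns only, never the environment name,
        # so matching schemas in dev and prod hash to the same value
        "fingerprint": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "columns": ordered,
    }


columns = [
    {"column_name": "geom", "srid": 4326, "geom_type": "POINT", "index_count": 1},
    {"column_name": "boundary", "srid": 4326, "geom_type": "POLYGON", "index_count": 0},
]
prod_snapshot = tag_snapshot("prod", columns)
dev_snapshot = tag_snapshot("dev", list(reversed(columns)))
```

Leaving the environment name out of the hashed payload is a deliberate choice in this sketch: the tag records where the snapshot came from, while the fingerprint answers only whether the schemas match.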

Step 3: Diff the current environment against the baseline

A custom singular test queries the snapshot and asserts SRID consistency, geometry-type stability, and index presence against the committed baseline. It returns rows only on drift, so dbt blocks the run when something moved. dbt’s state comparison lets you scope this to exactly the models that changed.

-- tests/assert_spatial_schema_integrity.sql
{{ config(severity='error', store_failures=true) }}

WITH current_state AS (
    SELECT
        column_name,
        srid,
        geom_type,
        COALESCE(json_array_length(spatial_indexes), 0) AS index_count
    FROM {{ ref('spatial_metadata_snapshot') }}
),
baseline_state AS (
    SELECT
        column_name,
        srid,
        geom_type,
        index_count
    FROM {{ ref('baseline_spatial_metadata') }}
)
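-- no trailing semicolon below: dbt wraps a singular test in a subquery
SELECT
    b.column_name,
    c.srid       AS current_srid,
    b.srid       AS baseline_srid,
    c.geom_type  AS current_type,
    b.geom_type  AS baseline_type,
    c.index_count AS current_indexes,
    b.index_count AS baseline_indexes
FROM baseline_state b
-- LEFT JOIN from the baseline so a column that disappeared still produces a row
LEFT JOIN current_state c ON c.column_name = b.column_name
-- IS DISTINCT FROM treats NULL as comparable, so a NULL SRID counts as drift
WHERE c.column_name IS NULL
   OR c.srid IS DISTINCT FROM b.srid
   OR c.geom_type IS DISTINCT FROM b.geom_type
   OR c.index_count < b.index_count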

Run it in CI; the build fails fast if a developer reprojects coordinates or drops a GiST index during a dbt run --full-refresh:

dbt build --select assert_spatial_schema_integrity
# inspect dbt's store-failures table to see exactly which column drifted
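
If you also want to run the comparison outside the warehouse, for example against exported snapshot files, the same rules can be sketched in Python. This is an illustration, not the test dbt runs: `find_drift` is a hypothetical helper, it assumes each row is a dictionary with `column_name`, `srid`, `geom_type`, and `index_count`, and it additionally reports baseline columns that are missing from the current environment.

```python
def find_drift(current: list[dict], baseline: list[dict],
               strict_indexes: bool = False) -> list[dict]:
    """Return one entry per baseline column whose spatial metadata moved."""
    current_by_name = {c["column_name"]: c for c in current}
    drift = []
    for base in baseline:
        name = base["column_name"]
        cur = current_by_name.get(name)
        if cur is None:
            drift.append({"column_name": name,
                          "reason": "column missing from current environment"})
            continue
        reasons = []
        # Python's != treats None as a value, so a missing SRID counts as drift
        if cur["srid"] != base["srid"]:
            reasons.append(f"srid {base['srid']} -> {cur['srid']}")
        if cur["geom_type"] != base["geom_type"]:
            reasons.append(f"geom_type {base['geom_type']} -> {cur['geom_type']}")
        lost = cur["index_count"] < base["index_count"]
        changed = cur["index_count"] != base["index_count"]
        # default: only lost indexes fail; strict mode also flags added ones
        if lost or (strict_indexes and changed):
            reasons.append(
                f"index_count {base['index_count']} -> {cur['index_count']}")
        if reasons:
            drift.append({"column_name": name, "reason": "; ".join(reasons)})
    return drift


baseline_rows = [
    {"column_name": "geom", "srid": 4326, "geom_type": "POINT", "index_count": 1},
]
reprojected_rows = [
    {"column_name": "geom", "srid": 3857, "geom_type": "POINT", "index_count": 1},
]
drift_rows = find_drift(reprojected_rows, baseline_rows)
```

The `strict_indexes` flag corresponds to choosing `!=` over `<` for the index comparison: off, only lost indexes count; on, added indexes are reported too.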

Step 4: Automate recovery and prevent re-drift

Detection without remediation just creates operational toil. When the test fails, your pipeline should trigger a recovery sequence rather than waiting on a human to hand-patch production:

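SRID enforcement. Apply ST_Transform in the model’s SELECT, or in the staging model that feeds it, so incoming data is normalized to the canonical EPSG code before materialization; a pre-hook runs before the relation is built and cannot rewrite the rows the model is about to produce. The registry pattern in automating CRS conversions in dbt pipelines supplies the canonical target.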
Index recreation. Add a post-hook that conditionally rebuilds any spatial index present in the baseline but missing from the current relation, then runs ANALYZE so the planner picks it up.
Audit trail integration. Log every detected mutation to a centralized governance table, capturing the dbt invocation ID, environment, and diff summary. This aligns with the access-scoping requirements in data security scoping rules.

-- in the model config: rebuild a missing GiST index defensively
{{ config(
    post_hook=[
      "CREATE INDEX IF NOT EXISTS {{ this.identifier }}_geom_gist
         ON {{ this }} USING gist (geom)",
      "ANALYZE {{ this }}"
    ]
) }}
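
The index-recreation item above can also be planned from the snapshot itself rather than hard-coded per model. The sketch below is one way to do that, assuming each index entry carries the `index_type` and `index_ddl` fields the Step 1 macro emits; `plan_index_rebuild` is a hypothetical helper, and replaying baseline DDL is only safe when both environments use the same schema and table names.

```python
from collections import Counter


def plan_index_rebuild(baseline_indexes: list[dict],
                       current_indexes: list[dict],
                       relation: str) -> list[str]:
    """List the statements needed to restore indexes the baseline has."""
    # match on access method, not on name, because names can differ per environment
    available = Counter(ix["index_type"] for ix in current_indexes)
    statements = []
    for ix in sorted(baseline_indexes, key=lambda i: i["index_name"]):
        if available[ix["index_type"]] > 0:
            available[ix["index_type"]] -= 1   # an equivalent index already exists
            continue
        # make the replayed DDL idempotent
        statements.append(
            ix["index_ddl"].replace("CREATE INDEX ", "CREATE INDEX IF NOT EXISTS ", 1))
    if statements:
        statements.append(f"ANALYZE {relation}")   # refresh planner statistics
    return statements


baseline_indexes = [{
    "index_name": "fact_store_locations_geom_gist",
    "index_type": "gist",
    "index_ddl": ("CREATE INDEX fact_store_locations_geom_gist "
                  "ON analytics.fact_store_locations USING gist (geom)"),
}]
rebuild_plan = plan_index_rebuild(baseline_indexes, [], "analytics.fact_store_locations")
```

Review the plan before executing it: the function only produces statements and never touches the database.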

Wire the metadata extraction and diff test into GitHub Actions or GitLab CI: run them against a staging clone before merging to main, and if the diff exceeds your threshold, block the merge and require explicit approval. This keeps silent corruption from propagating to production dashboards and routing engines.
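
The merge gate itself can be a few lines in whatever script your CI job runs. Below is a minimal sketch of that decision, assuming the drift rows have already been collected (for example from the store-failures table); the `gate` function, its `threshold` parameter, and the default list of warn-only environments are hypothetical choices, not dbt settings.

```python
def gate(drift_rows: list[dict], environment: str, threshold: int = 0,
         warn_only_envs: tuple = ("dev",)) -> tuple[int, str]:
    """Return (exit_code, message) for a CI step guarding spatial drift."""
    drifted = len(drift_rows)
    if drifted <= threshold:
        return 0, f"ok: {drifted} drifted column(s), threshold {threshold}"
    columns = ", ".join(sorted(row["column_name"] for row in drift_rows))
    if environment in warn_only_envs:
        # surface the drift without blocking, like severity='warn' in dev
        return 0, f"warn: drift in {columns}"
    return 1, f"error: drift in {columns}; merge blocked pending approval"


sample_drift = [{"column_name": "geom", "reason": "srid 4326 -> 3857"}]
staging_code, staging_message = gate(sample_drift, "staging")
dev_code, dev_message = gate(sample_drift, "dev")
```

A CI job would pass the first element of the result to `sys.exit` and log the second, so a non-zero code fails the pipeline step.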

Configuration reference

Parameter	Accepted values	Default	Spatial notes
`target_relation.schema`	any schema name	—	Must match the `f_table_schema` PostGIS recorded; a view whose geometry column is not cast to a typed geometry is listed in `geometry_columns` with SRID 0
`target_relation.identifier`	any table name	—	Pass the physical relation, not an alias; the catalog join is on `relname`
`am.amname` filter	`gist`, `spgist`, `brin`	all three	Add `hnsw` only if you track approximate-nearest-neighbour vector indexes alongside geometry
`severity` (test)	`error`, `warn`	`error`	Use `warn` in dev to surface drift without blocking; keep `error` in CI and on promotion
`store_failures`	`true`, `false`	`true`	Persists drifted rows to a table so you can diff which column and which environment moved
`index_count` comparison	`<` vs `!=`	`<`	`<` only fails on lost indexes; switch to `!=` if added indexes must also be reviewed

Gotchas and edge cases

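Views and function-derived columns report SRID 0. geometry_columns is itself a view over the system catalogs and reads each column’s declared type modifier, so a view, materialized view, or CREATE TABLE AS model whose geometry comes out of a function call is listed with SRID 0 and a generic GEOMETRY type even when every value carries a real SRID. Cast the column in the model, for example geom::geometry(Point, 4326), so the catalog reports the real values.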
coord_dimension drift hides under a matching SRID. A column can keep SRID 4326 while silently dropping from XYZ to XY. Add coord_dimension to the test’s WHERE clause if Z values matter downstream.
json_array_length(NULL) is NULL, not 0. The COALESCE(..., 0) wrapper is load-bearing — without it a table with zero spatial indexes compares as NULL and the inequality never trips.
Adapter divergence. Snowflake and BigQuery have no geometry_columns view; port Step 1 to their INFORMATION_SCHEMA and skip the pg_index block, since their search-optimized indexes are not user-managed the way GiST is.
Index name churn. CREATE INDEX without an explicit name produces a generated identifier that differs per environment. Compare on index_type and index_count, not index_name, or pin names so the baseline stays stable.
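
The NULL gotcha is easier to see when SQL's three-valued logic is spelled out. The sketch below imitates it in Python with `None` standing in for NULL; the helper names are made up for illustration and only model the comparison, they do not talk to a database.

```python
def sql_less_than(a, b):
    """SQL semantics: a comparison involving NULL yields NULL, not False."""
    if a is None or b is None:
        return None
    return a < b


def coalesce(*values):
    """Return the first non-NULL argument, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def where_keeps_row(predicate) -> bool:
    """A WHERE clause keeps a row only when the predicate is exactly TRUE."""
    return predicate is True


baseline_index_count = 1
current_index_count = None   # json_array_length(NULL) for a table with no indexes

flagged_without_coalesce = where_keeps_row(
    sql_less_than(current_index_count, baseline_index_count))
flagged_with_coalesce = where_keeps_row(
    sql_less_than(coalesce(current_index_count, 0), baseline_index_count))
```

Without the wrapper the predicate evaluates to NULL and the drifted row is filtered out; with it the comparison becomes `0 < 1` and the row is returned.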

Frequently asked questions

Why does the snapshot show a NULL SRID for a column that clearly has geometries?

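A NULL means the join to geometry_columns found no matching row: the column is a geography type (listed in geography_columns instead), or the schema and table name passed to the macro do not match what the catalog records. An SRID of 0 is a different symptom: the column has no type modifier, which is typical of a view or CTAS model whose geometry comes from a function call. Confirm staging applied ST_SetSRID with the real source projection and cast the result to a typed geometry such as geometry(Point, 4326).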

How is this different from versioning the schema in the first place?

Versioning controls how a column’s definition evolves inside one project — see versioning spatial schemas in dbt. Tracking compares the realized schema between environments, catching the case where the same dbt code produces a different SRID or loses an index because dev, staging, and prod ran against different adapter versions or migration states.

Should the baseline snapshot be the production schema or a hand-written contract?

Promote the production snapshot to the baseline after a reviewed release, so the contract reflects what consumers actually depend on. A hand-written contract drifts from reality; an auto-promoted one stays accurate as long as you gate the promotion behind the same diff test.

Can I catch a dropped GiST index before it tanks query performance?

Yes — that is what the index_count comparison is for. The test fails when the current relation has fewer spatial indexes than the baseline, so a --full-refresh that forgot to recreate the index blocks CI instead of surfacing as a slow dashboard in production.

Versioning spatial schemas in dbt — the parent workflow for SRID-aware sync macros, validation tests, and index lifecycle hooks.
Automating CRS conversions in dbt pipelines — enforce one canonical SRID at the source so projection drift never starts.
Spatial model dependency graphs — scope drift checks to exactly the models a change touches.
Data security scoping rules — where the mutation audit trail and access controls live.
