Tracking spatial schema changes across environments

In modern analytics stacks, geospatial transformations introduce unique schema volatility that standard column-type tracking routinely misses. When GEOMETRY or GEOGRAPHY columns shift SRIDs, alter coordinate precision, or silently drop spatial indexes between development, staging, and production, downstream GIS services, routing engines, and BI dashboards fail with cryptic errors or return silently corrupted spatial joins. Tracking spatial schema changes across environments requires a deterministic approach that bridges dbt’s state management with database-specific spatial metadata catalogs. This guide delivers exact macros, index-aware validation tests, and a fast-recovery protocol to eliminate spatial drift in dbt + Geospatial: Transforming Spatial Data in the Modern Stack workflows.

Why Standard Schema Tracking Fails for Geospatial Data

Traditional dbt schema tests (unique, not_null, accepted_values) operate on scalar types and ignore spatial topology. A change from GEOMETRY(Point, 4326) to GEOMETRY(Point, 3857) preserves the column name and base data type in information_schema.columns, but fundamentally breaks distance calculations, spatial indexing, and coordinate transformations. Furthermore, spatial indexes (GIST, BRIN, SP-GiST) and table partitioning strategies are rarely version-controlled in schema.yml. Without explicit tracking, teams encounter cascading failures in query planners, memory exhaustion during full-table rewrites, and untraceable drift that violates established Spatial Data Architecture & Governance baselines. The core issue is that relational metadata catalogs treat spatial objects as opaque binary blobs unless explicitly interrogated through spatial system views like the PostGIS geometry_columns catalog (PostGIS geometry_columns Reference).

Step 1: Extract Deterministic Spatial Metadata

To track changes accurately, you must query database-specific spatial catalogs alongside standard information schemas. The following dbt macro extracts SRID, geometry type, index presence, and storage metrics for any target relation. It is designed to run as a dbt run-operation and outputs a structured dictionary suitable for programmatic diffing.

sql
-- macros/get_spatial_metadata.sql
{% macro get_spatial_metadata(target_relation) %}
    {% set query %}
    WITH base_cols AS (
        SELECT
            column_name,
            data_type,
            udt_name,
            is_nullable
        FROM information_schema.columns
        WHERE table_schema = '{{ target_relation.schema }}'
          AND table_name = '{{ target_relation.identifier }}'
          AND udt_name IN ('geometry', 'geography')
    ),
    spatial_meta AS (
        SELECT
            f_geometry_column AS column_name,
            srid,
            type AS geom_type,
            coord_dimension
        FROM geometry_columns
        WHERE f_table_schema = '{{ target_relation.schema }}'
          AND f_table_name = '{{ target_relation.identifier }}'
    ),
    idx_meta AS (
        SELECT
            i.relname AS index_name,
            a.attname AS column_name,
            am.amname AS index_type,
            pg_get_indexdef(i.oid) AS index_ddl
        FROM pg_index x
        JOIN pg_class i ON i.oid = x.indexrelid
        JOIN pg_class t ON t.oid = x.indrelid
        JOIN pg_namespace n ON n.oid = t.relnamespace
        JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = ANY(x.indkey)
        JOIN pg_am am ON i.relam = am.oid
        WHERE n.nspname = '{{ target_relation.schema }}'
          AND t.relname = '{{ target_relation.identifier }}'
          AND am.amname IN ('gist', 'spgist', 'brin')
    )
    SELECT
        bc.column_name,
        bc.data_type,
        sm.srid,
        sm.geom_type,
        sm.coord_dimension,
        json_agg(
            json_build_object(
                'index_name', im.index_name,
                'index_type', im.index_type,
                'index_ddl', im.index_ddl
            )
        ) FILTER (WHERE im.index_name IS NOT NULL) AS spatial_indexes
    FROM base_cols bc
    LEFT JOIN spatial_meta sm ON bc.column_name = sm.column_name
    LEFT JOIN idx_meta im ON bc.column_name = im.column_name
    GROUP BY bc.column_name, bc.data_type, sm.srid, sm.geom_type, sm.coord_dimension;
    {% endset %}

    {% set results = run_query(query) %}
    {% do return(results) %}
{% endmacro %}

Execute this via dbt run-operation get_spatial_metadata --args '{"target_relation": {"schema": "analytics", "identifier": "fact_store_locations"}}'. The output provides a deterministic snapshot of spatial topology, coordinate reference systems, and index configurations. For comprehensive implementation patterns, consult the Versioning Spatial Schemas in dbt cluster documentation.

Step 2: Implement Cross-Environment Diffing & Validation

Raw metadata extraction is only valuable when compared against a known baseline. In CI/CD pipelines, you should serialize the macro output to JSON, commit it to a spatial_metadata/ directory, and run a pre-merge validation that diffs the current environment against the production baseline. Leveraging dbt’s state comparison capabilities (dbt State Comparison) allows you to isolate exactly which models triggered spatial mutations.

Create a custom dbt test that queries the extracted metadata and asserts SRID consistency, geometry type stability, and index presence:

sql
-- tests/assert_spatial_schema_integrity.sql
{{ config(severity='error') }}

WITH current_state AS (
    SELECT
        column_name,
        srid,
        geom_type,
        COALESCE(json_array_length(spatial_indexes), 0) AS index_count
    FROM {{ ref('spatial_metadata_snapshot') }}
),
baseline_state AS (
    SELECT
        column_name,
        srid,
        geom_type,
        index_count
    FROM {{ ref('baseline_spatial_metadata') }}
)
SELECT
    c.column_name,
    c.srid AS current_srid,
    b.srid AS baseline_srid,
    c.geom_type AS current_type,
    b.geom_type AS baseline_type,
    c.index_count AS current_indexes,
    b.index_count AS baseline_indexes
FROM current_state c
JOIN baseline_state b ON c.column_name = b.column_name
WHERE c.srid != b.srid
   OR c.geom_type != b.geom_type
   OR c.index_count < b.index_count;

This test fails fast if a developer accidentally casts coordinates to a different projection or drops a GIST index during a dbt run --full-refresh. By anchoring validation to explicit metadata snapshots, you eliminate the guesswork that typically plagues spatial data scientists during environment promotions.

Step 3: Automate Recovery & Drift Prevention

Detection without automated remediation creates operational overhead. When schema drift is identified, your pipeline should trigger a recovery sequence:

  1. SRID Enforcement: Use ST_Transform() in a pre-hook to normalize incoming data to the canonical EPSG code before materialization.
  2. Index Recreation: Implement post-hook SQL that conditionally creates missing spatial indexes based on the baseline metadata.
  3. Audit Trail Integration: Log all schema mutations to a centralized metadata table, capturing the dbt invocation ID, environment, and diff summary. This aligns with broader audit trail requirements for spatial pipelines and ensures compliance with data security scoping rules.

Integrate these checks into your GitHub Actions or GitLab CI workflows. Run the metadata extraction macro against a staging clone before merging to main. If the diff exceeds acceptable thresholds, block the merge and require explicit approval from a GIS backend developer. This proactive stance prevents silent corruption from propagating to production BI dashboards and routing engines.

Conclusion

Tracking spatial schema changes across environments is not an optional enhancement; it is a foundational requirement for any analytics stack handling geospatial workloads. By combining deterministic metadata extraction, cross-environment diffing, and automated validation tests, teams can maintain strict spatial integrity across the development lifecycle. Implementing these practices ensures that coordinate systems remain consistent, spatial indexes persist, and downstream applications receive reliable, topologically sound data.