Data Security & Scoping Rules

Modern geospatial pipelines demand more than coordinate transformations and spatial joins. As organizations scale location intelligence, the rules that decide who can see which geometries become a control layer in their own right — one that dictates which spatial entities a query returns, under what jurisdictional constraints, and through which analytical interfaces. The failure mode is subtle: a single misaligned projection or an un-scoped intermediate model leaks restricted coordinates into a BI dashboard, and nobody notices until an audit. Treating access control as a database grant applied after transformation is too late, because by then the sensitive geometry has already flowed through dozens of unsecured models.

This guide shows how to push spatial scoping into the transformation graph itself, where it can be version-controlled, parameterized, and tested alongside your SQL. It belongs to the broader Spatial Data Architecture & Governance practice, which treats geometry, coordinate systems, and lineage as first-class warehouse citizens. Here the focus narrows to one question: how do you filter, mask, and restrict customer footprints, critical-infrastructure points, and sensor traces according to compliance zones, user roles, and classification tiers — without watering down the analytical value of the data?

Prerequisites

Before encoding scoping rules, confirm the platform baseline. The patterns below assume a PostGIS-backed warehouse but call out the cloud-native equivalents where they diverge.

dbt Core >= 1.6 (or dbt Cloud on the equivalent runtime) so that var() defaults and project-level vars behave predictably across environments.
Adapter / extension: dbt-postgres >= 1.6 with PostGIS >= 3.1 (for stable ST_Intersects GiST cost estimates), or dbt-bigquery >= 1.6 for GEOGRAPHY-typed scoping. DuckDB users need the spatial extension — see the DuckDB spatial extension integration guide for CI parity.
Database permissions: the dbt service role needs SELECT on the scope reference tables and CREATE/USAGE on the target schema. If you also enforce warehouse-side policies (covered later), the role that creates the policy must be distinct from the role that queries through it. In PostgreSQL, a table's owner bypasses row-level security unless the table is set to FORCE ROW LEVEL SECURITY.
Environment variables: never inline region codes or tenant identifiers. Resolve them through env_var() (for connection-level context) and var() (for run-level scope selection) so the same models deploy unchanged from staging to production.

A canonical coordinate reference system must already be enforced at staging. If your raw feeds arrive in mixed projections, resolve that first via spatial reference system management — scoping logic built on un-normalized geometries is unsound from the first commit.
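To see why mixed projections break scoping, compare one location stored in EPSG:4326 (degrees) and in Web Mercator (EPSG:3857, meters) against the same degree-based extent. This minimal Python sketch assumes the spherical Web Mercator forward formulas and a hypothetical region box. It models the arithmetic and is not PostGIS code. Relabeling a geometry's SRID without transforming it leaves the numbers in meters, so the test answers for the wrong place.

```python
import math

# WGS84 semi-major axis, the sphere radius EPSG:3857 uses
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Spherical Web Mercator forward projection: degrees in, meters out."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def in_bbox(x: float, y: float, bbox: tuple[float, float, float, float]) -> bool:
    """Axis-aligned box test; bbox is (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax


# Hypothetical region extent, expressed in EPSG:4326 degrees
REGION_BBOX_4326 = (-10.0, 35.0, 30.0, 60.0)

paris_4326 = (2.35, 48.85)                 # lon, lat in degrees
paris_3857 = to_web_mercator(*paris_4326)  # same place, in meters

print(in_bbox(*paris_4326, REGION_BBOX_4326))  # True: both sides in degrees
print(in_bbox(*paris_3857, REGION_BBOX_4326))  # False: meters read as degrees
```

The same mistake can just as easily admit a point that should be excluded. A location a few dozen meters from (0, 0), with its meter values read as degrees, lands in the Middle East.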

Architecture Context

Scoping is not a single model; it is a thin predicate woven through every layer of the spatial graph. Raw geometry lands untrusted, staging normalizes its CRS and validity, and only then does the scope predicate gate what reaches the intermediate and mart layers. The reference geometries that define the boundaries — jurisdictions, service territories, compliance buffers — travel as their own clustered, indexed tables so the access join never triggers a full scan.

The position of the gate matters. Place it too late — in the mart — and every intermediate model holds unrestricted geometry that any analyst with table access can read. Place the canonical-CRS normalization too late and the predicate compares geometries in mismatched projections, silently passing points that should be excluded. This page sits between the engine-level setup in setting up PostGIS with dbt and the warehouse-side enforcement in implementing row-level security for geospatial data.

Configuration Walkthrough

Scope selection is environment-driven, so it lives in dbt_project.yml as a default var and in connection context via env_var(). Define a single source of truth for the active region and the canonical SRID, then let macros read from it.

# dbt_project.yml
name: spatial_platform
version: "1.0.0"
profile: spatial_platform

vars:
  # Default scope; overridden per-run with --vars '{active_region: EU}'
  active_region: "GLOBAL"
  canonical_srid: 4326

models:
  spatial_platform:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table

Bind the database role and any tenant context to environment variables instead of hardcoding them in the profile, so secrets stay out of version control:

# profiles.yml
spatial_platform:
  target: prod
  outputs:
    prod:
      type: postgres
      host: "{{ env_var('DBT_PG_HOST') }}"
      user: "{{ env_var('DBT_PG_USER') }}"
      password: "{{ env_var('DBT_PG_PASSWORD') }}"
      port: "{{ env_var('DBT_PG_PORT', '5432') | as_number }}"
      dbname: "{{ env_var('DBT_PG_DBNAME') }}"
      schema: "{{ env_var('DBT_PG_SCHEMA', 'analytics') }}"
      threads: 4
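dbt's env_var() returns the variable's value if it is set, falls back to the second argument when one is given, and otherwise aborts compilation. The hypothetical Python stand-in below reproduces that contract, not dbt's code. It shows which profile fields fail fast (host, user, password) and which degrade to a default (schema).

```python
import os
from typing import Optional


def env_var(name: str, default: Optional[str] = None) -> str:
    """Stand-in for dbt's env_var(): the variable's value if set, otherwise
    the default, otherwise a hard failure (dbt aborts compilation)."""
    value = os.environ.get(name)
    if value is not None:
        return value
    if default is not None:
        return default
    raise RuntimeError(f"Env var required but not provided: '{name}'")


os.environ.pop("DBT_PG_SCHEMA", None)
print(env_var("DBT_PG_SCHEMA", "analytics"))  # analytics
```

Failing loudly on a missing host or password is the desired behavior. A silent fallback would point the run at the wrong database.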

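Finally, make sure the spatial extension exists at the start of every invocation with an on-run-start hook, so a fresh environment fails at the first statement instead of midway through the DAG. Creating PostGIS usually requires elevated privileges. Either grant them to the dbt role, or have a DBA install the extension once and keep the hook as an idempotent guard: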

# dbt_project.yml (continued)
on-run-start:
  - "CREATE EXTENSION IF NOT EXISTS postgis"

Core Implementation

Hardcoding ST_Intersects, ST_Within, or ST_DWithin across dozens of models creates maintenance debt and invites policy drift — the day someone forgets the predicate in one model is the day the boundary leaks. Encapsulate the scoping logic once, in a parameterized macro, and call it everywhere. This keeps the predicate DRY and environment-aware, and is the same abstraction discipline described in building custom spatial macros.

-- macros/apply_spatial_scope.sql
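{% macro apply_spatial_scope(source_relation, geom_col, scope_relation, scope_geom_col, region_var='active_region') %}
  {% set active_region = var(region_var, 'GLOBAL') %}
  {#- Region codes are rendered into SQL text: reject anything off the allowlist.
      allowed_regions is a var introduced for this guard; its default mirrors
      the region codes the mart tests accept. -#}
  {% set allowed_regions = var('allowed_regions', ['GLOBAL', 'EU', 'NA', 'APAC']) %}
  {% if active_region not in allowed_regions %}
    {{ exceptions.raise_compiler_error("Unknown scope region: " ~ active_region) }}
  {% endif %}

  {% if active_region == 'GLOBAL' %}
    SELECT * FROM {{ source_relation }}
  {% else %}
    SELECT s.*
    FROM {{ source_relation }} AS s
    INNER JOIN {{ scope_relation }} AS r
      ON r.region_code = {{ dbt.string_literal(active_region) }}
      -- Bounding-box pre-filter (&&) uses the GiST index before the exact test
      AND s.{{ geom_col }} && r.{{ scope_geom_col }}
      -- Both sides are already in the canonical SRID (normalized at staging and in
      -- scope_regions); relabeling with ST_SetSRID here would mask a projection
      -- mismatch instead of surfacing it
      AND ST_Intersects(s.{{ geom_col }}, r.{{ scope_geom_col }})
  {% endif %}
{% endmacro %}

Two details make this production-safe. First, the && bounding-box operator lets the GiST index prune candidates, so the expensive topological test in ST_Intersects only evaluates the survivors. PostGIS's ST_Intersects already applies an index-assisted bounding-box check to its own arguments; the explicit && keeps that intent visible in review. Second, the region code is validated against an allowlist before it reaches the SQL. The default implementation of dbt.string_literal() only wraps the value in single quotes and does not escape quotes inside it. Quoting alone is therefore no defense when region codes ultimately derive from request context; failing compilation on an unknown code is.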
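The two-stage idea behind && followed by ST_Intersects can be modeled in a few lines. This Python sketch uses hypothetical helpers, not PostGIS internals. A cheap envelope test prunes candidates, and only survivors pay for an exact point-in-polygon test (even-odd ray casting, exterior ring only, no holes). Unlike ST_Intersects, this ray casting does not treat points lying exactly on an edge consistently.

```python
from typing import Iterable, Sequence

Point = tuple[float, float]
Ring = Sequence[Point]  # exterior ring, first vertex not repeated, no holes


def envelope(ring: Ring) -> tuple[float, float, float, float]:
    """Bounding box (xmin, ymin, xmax, ymax), the analogue of what && compares."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def in_envelope(pt: Point, box: tuple[float, float, float, float]) -> bool:
    xmin, ymin, xmax, ymax = box
    return xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Even-odd ray casting toward +x."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def scope_points(points: Iterable[Point], region: Ring) -> tuple[list[Point], int]:
    """Envelope test first, exact test only for survivors.
    Returns the kept points and how many exact tests were needed."""
    box = envelope(region)
    kept: list[Point] = []
    exact_tests = 0
    for pt in points:
        if not in_envelope(pt, box):
            continue  # pruned without running the expensive test
        exact_tests += 1
        if point_in_ring(pt, region):
            kept.append(pt)
    return kept, exact_tests


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
candidates = [(1.0, 1.0), (9.0, 9.0), (20.0, 20.0), (-5.0, 3.0)]
print(scope_points(candidates, triangle))  # ([(1.0, 1.0)], 2)
```

Only two of the four candidates reach the exact test. (9, 9) survives the envelope but fails the exact test, which is why the envelope alone is never enough.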

A consuming model stays readable; the entire access decision is one macro call:

-- models/marts/customer_locations_scoped.sql
{{ config(materialized='table') }}

WITH scoped AS (
  {{ apply_spatial_scope(
        source_relation=ref('stg_customer_locations'),
        geom_col='geom',
        scope_relation=ref('scope_regions'),
        scope_geom_col='boundary_geom'
  ) }}
)

SELECT
    customer_id,
    -- Mask high-precision coordinates outside production
    {% if target.name != 'prod' %}
    ST_SnapToGrid(geom, 0.01) AS geom,
    {% else %}
    geom,
    {% endif %}
    region_code
FROM scoped
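In EPSG:4326, ST_SnapToGrid(geom, 0.01) rounds each coordinate to the nearest hundredth of a degree. The Python sketch below approximates that for a single point, using PostGIS's default grid origin of (0, 0). It estimates the worst-case north-south shift with an assumed figure of about 111.32 km per degree of latitude. East-west shifts shrink with the cosine of latitude.

```python
# Approximate meters per degree of latitude (varies by about 1% from equator to pole)
METERS_PER_DEGREE_LAT = 111_320.0


def snap_to_grid(x: float, y: float, size: float) -> tuple[float, float]:
    """Round each coordinate to the nearest multiple of `size` (grid origin 0,0),
    approximating ST_SnapToGrid(geom, size) for a single point."""
    return round(x / size) * size, round(y / size) * size


def max_lat_shift_m(size: float) -> float:
    """Largest north-south displacement snapping can introduce: half a cell."""
    return size / 2 * METERS_PER_DEGREE_LAT


print(snap_to_grid(2.3522, 48.8566, 0.01))  # roughly (2.35, 48.86)
print(max_lat_shift_m(0.01))                # about 556.6 meters
```

The same displacement means a point within a few hundred meters of a region boundary can snap across it. Keep that in mind when deciding whether a grid size masks enough, or too much, for a given consumer.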

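The scope reference table itself must be a real, indexed relation, not a CTE rebuilt on every run. Materialize it as a table and declare the spatial index so the bounding-box pre-filter has something to hit. Keep one row per region_code, dissolving multiple source polygons with ST_Union if necessary. Otherwise a point that falls in two polygons of the same region is returned twice by the scope join: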

-- models/intermediate/scope_regions.sql
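-- dbt-postgres rebuilds this index after every table rebuild, with a generated name.
-- A fixed-name "CREATE INDEX IF NOT EXISTS" post-hook can be skipped while the
-- previous build's backup table still owns that name, leaving the new table unindexed.
{{ config(
    materialized='table',
    indexes=[{'columns': ['boundary_geom'], 'type': 'gist'}]
) }}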

SELECT
    region_code,
    ST_MakeValid(ST_Transform(raw_boundary, {{ var('canonical_srid', 4326) }})) AS boundary_geom
FROM {{ source('reference', 'compliance_zones') }}

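BigQuery has no spatial index to declare. Instead, cluster the table on the GEOGRAPHY column (cluster_by in dbt-bigquery). Build boundaries with the GEOGRAPHY constructors (for example ST_GEOGFROMTEXT) rather than ST_Transform, since GEOGRAPHY values are always WGS84. The macro body also needs a BigQuery variant, because BigQuery has no && operator. Dispatching it through adapter.dispatch keeps the macro contract (same arguments, same return shape) identical, which is what makes the pattern portable across the adapters compared in choosing the right spatial adapter.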

Validation & Testing

Security logic that isn’t tested is a hope, not a control. Verify the extension and index prerequisites first, then assert the behavior of the scope predicate with dbt tests.

Confirm the engine and that the scope index is actually used:

-- Engine + extension version
SELECT PostGIS_Version();

-- Prove the GiST index serves the access join (look for "Index Scan" or "Bitmap Index Scan")
EXPLAIN ANALYZE
SELECT s.*
FROM stg_customer_locations s
JOIN scope_regions r
  ON s.geom && r.boundary_geom
 AND ST_Intersects(s.geom, r.boundary_geom)
WHERE r.region_code = 'EU';
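Reading plans by eye does not scale to CI. A small helper like the hypothetical plan_uses_index below can assert that the EXPLAIN output for the access join contains an index-scan node. It is a text heuristic over PostgreSQL's default plan format, not a plan parser. The plan text would come from running the EXPLAIN statement through your database driver. Plans on tiny fixture tables may legitimately choose sequential scans, so run the check against realistically sized data.

```python
import re
from typing import Optional

# Plan node types that indicate an index served the lookup
_INDEX_NODE = re.compile(r"\b(?:Bitmap Index Scan|Index Only Scan|Index Scan)\b")


def plan_uses_index(plan_text: str, index_name: Optional[str] = None) -> bool:
    """True if any EXPLAIN line is an index-scan node, optionally on `index_name`."""
    for line in plan_text.splitlines():
        if _INDEX_NODE.search(line) and (index_name is None or index_name in line):
            return True
    return False
```

Pass an index name only when it is fixed. Generated names, such as those from the indexes config, are better checked with the bare node test.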

Then encode the policy as assertions. A custom singular test catches the worst failure — a row that escaped its boundary — while generic tests guard SRID and null geometries:

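# models/marts/_marts.yml
version: 2

models:
  - name: customer_locations_scoped
    tests:
      # Model-level on purpose: at column level, expression_is_true prefixes the
      # column name to the expression and renders "geom ST_SRID(geom) = 4326"
      - dbt_utils.expression_is_true:
          expression: "ST_SRID(geom) = 4326"
    columns:
      - name: geom
        tests:
          - not_null
      - name: region_code
        tests:
          - accepted_values:
              values: ["EU", "NA", "APAC"]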

-- tests/assert_no_geometry_escapes_scope.sql
-- Returns rows ONLY when a scoped geometry falls outside its declared region.
SELECT m.customer_id
FROM {{ ref('customer_locations_scoped') }} m
JOIN {{ ref('scope_regions') }} r
  ON m.region_code = r.region_code
WHERE NOT ST_Intersects(m.geom, r.boundary_geom)

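Run these in CI on every pull request. A failing assert_no_geometry_escapes_scope should block the merge: it is the difference between catching a leak in review and explaining it to a regulator. Outside the prod target the mart snaps coordinates to a 0.01 grid, so points near a boundary can snap across it and trip this test. Run it in a CI target that keeps full precision. Pair the assertions with row-count expectations per region so a predicate that silently matches everything (the second-worst failure) is also caught.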
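Per-region expectations can be as simple as three checks over row counts collected after the run, for example with a COUNT(*) grouped by region_code. The sketch below is a hypothetical CI helper. The expected ranges are placeholders you would derive from your own history, not recommended values. It flags the two failure modes from Troubleshooting: zero rows after a deploy, and a predicate that matches every source row.

```python
def check_scope_counts(region: str, scoped_rows: int, source_rows: int,
                       expected: tuple[int, int]) -> list[str]:
    """Return failure messages for one scoped run; an empty list means it looks sane."""
    low, high = expected
    failures = []
    if scoped_rows == 0:
        failures.append(f"{region}: zero rows; is the region code missing from scope_regions?")
    if region != "GLOBAL" and source_rows > 0 and scoped_rows == source_rows:
        failures.append(f"{region}: every source row matched; was the predicate applied?")
    if not low <= scoped_rows <= high:
        failures.append(f"{region}: {scoped_rows} rows, expected {low}..{high}")
    return failures


print(check_scope_counts("EU", 1200, 5000, (1000, 1500)))  # []
```

Failing CI on any non-empty result turns both silent failures into loud ones.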

Advanced Patterns

Incremental scope evaluation. For high-volume point feeds, re-scanning the full history every run is wasteful. Configure the scoped model as incremental and only evaluate the predicate on new rows, while keeping the scope reference table small and fully materialized so the join stays index-bound. This is the same scaling discipline detailed in handling large geospatial datasets.

{{ config(materialized='incremental', unique_key='event_id') }}

{{ apply_spatial_scope(
      source_relation=ref('stg_events'),
      geom_col='geom',
      scope_relation=ref('scope_regions'),
      scope_geom_col='boundary_geom'
) }}

{% if is_incremental() %}
  WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
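The high-watermark filter has two edge cases worth modeling before trusting it with a security boundary. First, rows that arrive late, or tie the current maximum, are never selected. Second, if the target table exists but is empty, MAX returns NULL and the comparison admits nothing until a full refresh. This Python sketch reproduces those semantics; it models the SQL and does not run it.

```python
from datetime import datetime
from typing import Optional


def incremental_batch(rows: list[dict], watermark: Optional[datetime]) -> list[dict]:
    """Rows admitted by `event_timestamp > (SELECT MAX(event_timestamp) FROM this)`.

    `watermark` is that MAX; None stands for SQL NULL (the table exists but is
    empty), and a comparison with NULL is never true, so nothing is admitted.
    """
    if watermark is None:
        return []
    return [row for row in rows if row["event_timestamp"] > watermark]


watermark = datetime(2024, 5, 1, 10, 0)
incoming = [
    {"event_id": 1, "event_timestamp": datetime(2024, 5, 1, 10, 5)},  # new
    {"event_id": 2, "event_timestamp": datetime(2024, 5, 1, 9, 59)},  # late arrival
    {"event_id": 3, "event_timestamp": datetime(2024, 5, 1, 10, 0)},  # ties the watermark
]
print([r["event_id"] for r in incremental_batch(incoming, watermark)])  # [1]
print(incremental_batch(incoming, None))                                # []
```

unique_key='event_id' lets reruns update rows the filter does select, but it cannot recover rows the filter skipped. A lookback window or a periodic --full-refresh covers that gap.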

Multi-axis scoping. Real policies rarely reduce to one polygon test. Combine attribute scope (tenant, organizational unit), topological scope (containment), and proximity scope (ST_DWithin buffers around sensitive infrastructure) into a single macro with optional arguments, so callers compose exactly the axes they need without forking the SQL.
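One way to compose the axes is to build the predicate from whichever optional arguments the caller supplies. The Python sketch below does that with three hypothetical axes: a tenant attribute, a containment callback, and a proximity buffer around sensitive sites. Modeling the proximity axis as an exclusion buffer is an assumption; flip the comparison for an inclusion buffer. Distances use the haversine formula in meters. The SQL equivalent needs a geography cast for the same reason: ST_DWithin on EPSG:4326 geometry measures in degrees, while ST_DWithin on geography measures in meters.

```python
import math
from typing import Callable, Optional, Sequence

Point = tuple[float, float]          # (lon, lat) in degrees
Row = dict                           # expects "tenant_id" and "geom" keys
EARTH_MEAN_RADIUS_M = 6_371_008.8


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(h))


def build_scope(tenant: Optional[str] = None,
                contains: Optional[Callable[[Point], bool]] = None,
                sensitive_sites: Sequence[Point] = (),
                buffer_m: float = 0.0) -> Callable[[Row], bool]:
    """Compose only the axes the caller supplies into a single row predicate."""
    checks: list[Callable[[Row], bool]] = []
    if tenant is not None:                       # attribute axis
        checks.append(lambda row: row["tenant_id"] == tenant)
    if contains is not None:                     # topological axis
        checks.append(lambda row: contains(row["geom"]))
    if sensitive_sites:                          # proximity axis (exclusion buffer)
        checks.append(lambda row: all(haversine_m(row["geom"], site) > buffer_m
                                      for site in sensitive_sites))
    return lambda row: all(check(row) for check in checks)


def in_region(p: Point) -> bool:
    # Hypothetical region box in degrees
    return -10.0 <= p[0] <= 30.0 and 35.0 <= p[1] <= 60.0


policy = build_scope(tenant="acme", contains=in_region,
                     sensitive_sites=[(2.2945, 48.8584)], buffer_m=1_000)
print(policy({"tenant_id": "acme", "geom": (13.405, 52.52)}))    # True
print(policy({"tenant_id": "acme", "geom": (2.2950, 48.8580)}))  # False: inside buffer
```

Calling build_scope() with no axes returns a predicate that admits every row, the same shape as the macro's GLOBAL branch. A caller that drops its arguments therefore widens access rather than narrowing it.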

Defense in depth. dbt-time filtering protects everything that flows through the transformation graph, but direct SQL against the warehouse bypasses it entirely. Layer database-native row-level security underneath so even ad-hoc queries respect the boundary — the full pattern lives in implementing row-level security for geospatial data. When the scope reference geometries or column classifications change, treat that as a schema event and record it through versioning spatial schemas in dbt so the policy history stays auditable.

Troubleshooting

Symptom	Root cause	Fix
Points outside the zone pass the scope join	Geometries compared in mismatched SRIDs; datum/projection shift moves the boundary	Normalize CRS at staging with `ST_Transform`/`ST_SetSRID`; assert `ST_SRID(geom) = canonical_srid` in tests
Scoping query degrades to full-table scan	`&&` pre-filter missing, or `ST_Transform`/`ST_SetSRID` wrapped around the geometry column inside the join, defeating the GiST index	Keep the `&&` test on the raw, indexed geometry columns; push projection normalization upstream to staging
Scoped model returns zero rows after deploy	`active_region` var resolved to a code absent from `scope_regions`	Add an `accepted_values` test on `region_code`; default the var and fail loudly on unknown regions
Predicate silently matches every row	Scope join falls back to `GLOBAL` branch, or `region_code` filter dropped	Add per-region row-count expectations; log the resolved `active_region` to the audit trail
`ST_Intersects` errors on some boundaries	Invalid scope polygons (self-intersections, unclosed rings)	Wrap reference geometries in `ST_MakeValid()` before indexing; guard with an `ST_IsValid` sweep test

Capture, for every run, which predicate was applied, the canonical SRID in force at evaluation time, and the resolved scope variables. That metadata is what lets a regulated platform reconstruct why a given consumer saw a given geometry — turning access control from a black box into an auditable, reproducible part of the spatial DAG.
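Here is a minimal sketch of such a record. It assumes you capture the compiled predicate text and the resolved vars yourself, for example from the compiled SQL dbt writes under target/ and from dbt's invocation_id. The field names are hypothetical. Hashing the whitespace-normalized predicate gives a stable fingerprint you can compare across runs.

```python
import hashlib
import json
from datetime import datetime, timezone


def scope_audit_record(invocation_id: str, compiled_predicate: str,
                       canonical_srid: int, resolved_vars: dict) -> dict:
    """Build one audit entry for a run. Whitespace is normalized before hashing
    so reformatting the SQL does not change the fingerprint."""
    normalized = " ".join(compiled_predicate.split())
    return {
        "invocation_id": invocation_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "canonical_srid": canonical_srid,
        "resolved_vars": resolved_vars,
        "predicate_sha256": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
        "predicate_sql": normalized,
    }


record = scope_audit_record(
    invocation_id="example-invocation-id",
    compiled_predicate="r.region_code = 'EU'\n  AND ST_Intersects(s.geom, r.boundary_geom)",
    canonical_srid=4326,
    resolved_vars={"active_region": "EU"},
)
print(json.dumps(record, indent=2))
```

Two runs that share a fingerprint and resolved vars applied the same access decision, which is the property an auditor needs to check.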

Spatial reference system management — enforce the canonical CRS that scoping predicates depend on.
Handling large geospatial datasets — keep the access join index-bound at warehouse scale.
Implementing row-level security for geospatial data — add warehouse-native enforcement beneath the dbt layer.
Versioning spatial schemas in dbt — audit changes to scope geometries and classification tiers.
Building custom spatial macros — generalize the scope macro into reusable, cross-engine UDF patterns.
