Guide · 8 min read

Postgres PII detection (schema-side, no row data)

You don't need to read row data to find PII and PHI exposure in Postgres. Column names, types, role grants, and RLS policies are enough to triage 90%+ of real risk — and that's the evidence shape auditors actually want. Here's the seven-step approach KollGuard's scanner uses.

PostgresSupabasePIIPHIpgAuditRLS

The seven-step schema audit

  1. 1.Use information_schema, never sample rows

    Postgres ships everything you need in information_schema and pg_catalog. Walk every table in public + your app schemas; for each column, capture name, data type, nullability, and grants. Do NOT SELECT row data — that requires permissions you may not have AND violates the policies you're trying to enforce.

  2. 2.Heuristic-tag columns by name + type

    Build a small ruleset over column metadata. Names like email, phone, ssn, dob, mrn, address, first_name, last_name, ip_address are PII shapes. PHI-specific: diagnosis_*, encounter_*, npi, icd_*, cpt_*, lab_*, prescription_*. Types like inet, citext, and text columns named like the above flag PII even without row inspection.

  3. 3.Cross-check anon-role reachability

    On Supabase (and any PostgREST stack), the public schema is exposed to the anon role. A flagged PII column AND no RLS = publicly readable PII. Query information_schema.role_table_grants and pg_policies to detect this combination; it's the highest-severity finding.

  4. 4.Encrypt at-rest selectively

    Don't encrypt every column — find the ones that actually need it (HIPAA §164.312(a)(2)(iv) ePHI, PCI DSS 3.4 PAN). pgcrypto with a KMS-managed key + a documented rotation cadence is the standard answer. Note: encrypting an indexed column kills the index — design for it.

  5. 5.Constrain the role behind your app

    Don't connect as service_role from your app server. Create a dedicated app role with grants only on the tables and columns it strictly needs. Auditors will pull pg_roles + grants and check. KollGuard's scanner flags over-privileged roles automatically.

  6. 6.Log reads with pgAudit

    Enable pgAudit's READ class for ePHI/PII tables, ship logs to a tamper-evident store, retain for the longer of 6 years (HIPAA) or your industry's minimum. Without this, you can't answer the §164.312(b) audit question.

  7. 7.Disposition the exceptions

    Some PII must be reachable — login email, customer profile. Document why ('required for authentication'), the mitigating controls (rate-limited, RLS-restricted to the row owner), and who approved it. Auditors accept dispositions with rationale; they reject silence.

Quick framework mapping

FrameworkCitationWhat it expects
HIPAA Security Rule§164.312(a)(2)(iv)Encryption at rest for ePHI
HIPAA Security Rule§164.312(b)Audit log of every read of ePHI
PCI DSS 43.4Render PAN unreadable when stored
SOC 2CC6.1Logical access controls (RLS + grants)
GDPRArt. 32Pseudonymization & encryption

Frequently asked

What is the difference between PII and PHI?
PII = personally identifiable information (broader). PHI = protected health information under HIPAA = any health information combined with one of 18 identifier types. All PHI is PII; not all PII is PHI. State breach laws use the broader PII definition; HIPAA uses PHI.
Do I need to scan production data or schema only?
Schema-level scanning catches 90%+ of real exposure (column shapes, role grants, RLS gaps) without ever touching customer data. KollGuard's Postgres scanner stays at the schema level on purpose. Row-level scanning is only for data-loss prevention products with separate justification.
Will KollGuard read my row data?
No. We connect with a read-only role, query system catalogs, and never SELECT from your tables. The scan is fully observable in pg_stat_activity if you want to verify.
How does this compare to Strac, Cyera, or BigID?
Those are enterprise data-discovery platforms ($50k+/yr) designed to scan and classify row content across many data stores. KollGuard's PII detection is schema-level only and bundled in a $19.89/mo SOC 2/HIPAA scanner. For a small team with one Postgres database, KollGuard is overkill-free; for a large org with 50+ data stores, you want the enterprise tools.
Is pseudonymization enough for HIPAA Safe Harbor?
No. Safe Harbor is the specific HIPAA de-identification standard at 45 CFR §164.514(b) — requires removal of 18 identifier types AND no actual knowledge that re-identification is possible. Pseudonymization (token replacement) is reversible and therefore typically not Safe Harbor; it's a mitigation but not de-identification.

Run the audit on your DB

First scan is free. Read-only connection. Zero row data leaves your database.