Guide · 6 min read

Scan Postgres for PII and PHI exposure

You don't need to read row data to find PII risk. Column names, types, RLS, and role grants are enough to triage 90% of real exposure in a Postgres or Supabase database — and that's exactly what auditors expect to see in your evidence. Here's the schema-level approach.

Postgres · Supabase · PII · PHI · RLS

The six-step audit

  1. Inventory columns by likely PII / PHI shape

    Walk every public-schema table and flag columns whose name or type matches a PII shape: email, phone, ssn, dob, mrn, address, name, ip. Add name-pattern heuristics for PHI: diagnosis_*, encounter_*, npi, icd_*, cpt_*. Don't sample row data unless you're allowed to; name and type are enough to triage.
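The inventory step can be sketched in a few lines of Python. This is a minimal illustration, not a complete scanner: the regex shapes, the `PII_TYPES` set, and the function names are assumptions made for the example, and the SQL constant is only the catalog query you would hand to whatever Postgres driver you use.

```python
import re

# Catalog query a scanner could run; it reads metadata only, never row data.
COLUMNS_SQL = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
"""

# Name fragments come from the step above; the exact regex shape is an assumption.
PII_NAME = re.compile(r"(^|_)(email|phone|ssn|dob|mrn|address|name|ip)(_|$)", re.I)
PHI_NAME = re.compile(r"(^|_)(diagnosis|encounter|npi|icd|cpt)(_|$)", re.I)
# Types that tend to hold identifiers whatever the column is called (assumption).
PII_TYPES = {"inet", "cidr"}


def classify_column(column_name, data_type):
    """Return 'PHI', 'PII', or None using only the column's name and type."""
    if PHI_NAME.search(column_name):
        return "PHI"
    if PII_NAME.search(column_name) or data_type.lower() in PII_TYPES:
        return "PII"
    return None


def flag_columns(rows):
    """rows: iterable of (table_name, column_name, data_type) tuples."""
    flagged = []
    for table, column, data_type in rows:
        label = classify_column(column, data_type)
        if label:
            flagged.append((table, column, label))
    return flagged
```

Matching on underscore-delimited fragments keeps `description` from tripping the `ip` pattern, but a column like `file_name` will still be flagged. That is acceptable for triage: a false positive costs a minute of review, while a missed column can cost a finding.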

  2. Check which of those columns are reachable by the anon role

    On Supabase, PostgREST exposes the public schema over the API, and requests made with the anon key run as the anon role. A PII column reachable by the anon key is a public PII column. Verify by simulating an anonymous request, or check that the table has RLS enabled and a policy that restricts SELECT.
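One way to run that check from catalog metadata is sketched below. The two SQL constants use standard Postgres catalogs (`pg_class`, `pg_policies`, `has_table_privilege`); the three-way verdict and the `anon_exposure` function are assumptions for illustration. The sketch looks only at table-level SELECT and at policies that name `anon` or `public` directly, so column-level grants and role membership would need a separate pass.

```python
# Table-level facts: is RLS on, and does anon hold SELECT?
# has_table_privilege raises an error if the 'anon' role does not exist.
TABLES_SQL = """
SELECT c.relname AS table_name,
       c.relrowsecurity AS rls_enabled,
       has_table_privilege('anon', c.oid, 'SELECT') AS anon_can_select
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r';
"""

POLICIES_SQL = """
SELECT tablename, policyname, permissive, roles, cmd, qual
FROM pg_policies
WHERE schemaname = 'public';
"""


def applies_to_anon(policy):
    """True if the policy covers SELECT and names anon or PUBLIC directly."""
    covers_select = policy["cmd"] in ("SELECT", "ALL")
    return covers_select and bool({"anon", "public"} & set(policy["roles"]))


def anon_exposure(anon_can_select, rls_enabled, policies):
    """Return 'blocked', 'exposed', or 'review' for one table."""
    if not anon_can_select:
        return "blocked"
    if not rls_enabled:
        return "exposed"  # SELECT granted and nothing filters rows
    relevant = [p for p in policies if applies_to_anon(p)]
    permissive = [p for p in relevant if p.get("permissive", "PERMISSIVE") == "PERMISSIVE"]
    restrictive = [p for p in relevant if p not in permissive]
    if not permissive:
        return "blocked"  # RLS denies by default when no permissive policy applies
    unconditional = any((p.get("qual") or "").strip().lower() == "true" for p in permissive)
    if unconditional and not restrictive:
        return "exposed"  # a USING (true) policy lets anon read every row
    return "review"  # a row filter exists; a human should read it
```

A "review" verdict is deliberate: whether `auth.uid() = user_id` is a sufficient restriction is a judgment call that the catalog alone cannot settle.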

  3. Mask or encrypt at rest where the audit requires it

    HIPAA §164.312(a)(2)(iv) (an addressable specification) and PCI DSS Requirement 3.5.1 (3.4 in v3.2.1) both expect sensitive data to be unreadable at rest. Postgres pgcrypto with a KMS-managed key, or column-level encryption packaged as an extension (for example through pg_tle), can satisfy this. Document the key rotation cadence.
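As a rough sketch of the pgcrypto route, the helper below generates the statements that copy a plaintext column into an encrypted `bytea` column. `pgp_sym_encrypt` is pgcrypto's symmetric encryption function; everything else (the `_enc` suffix, the function names, the `%(key)s` named placeholder in the style of psycopg) is an assumption for illustration. The point is that the key arrives as a bound parameter fetched from your KMS at run time and never appears in the SQL text or the schema.

```python
def quote_ident(name):
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def encrypt_column_sql(table, column):
    """Build statements that copy a column into a pgcrypto-encrypted bytea column.

    The key stays as the placeholder %(key)s so the caller's driver binds it
    at execution time (placeholder style is an assumption about the driver).
    """
    t = quote_ident(table)
    c = quote_ident(column)
    enc = quote_ident(column + "_enc")  # hypothetical naming convention
    return [
        "CREATE EXTENSION IF NOT EXISTS pgcrypto;",
        f"ALTER TABLE public.{t} ADD COLUMN {enc} bytea;",
        f"UPDATE public.{t} SET {enc} = pgp_sym_encrypt({c}::text, %(key)s) "
        f"WHERE {c} IS NOT NULL;",
    ]
```

Dropping the plaintext column is left out on purpose; do that only after verifying that `pgp_sym_decrypt` round-trips the data. Bound parameters can still surface in statement logs, so check your logging settings before running this against real data.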

  4. Restrict the role that backs your app server

    Don't connect to Postgres as service_role from your app; on Supabase that role bypasses RLS. Create a dedicated app role with only the grants it needs, and revoke SELECT on PII columns from any role that doesn't strictly require them. Auditors will pull pg_roles and the grants and check.
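One Postgres detail matters here: a column-level REVOKE has no effect while the role still holds table-level SELECT. The usual pattern is to revoke SELECT on the whole table and grant it back on the non-PII columns only. The helper below sketches that; the function and role names are hypothetical.

```python
def quote_ident(name):
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def restrict_select_sql(role, table, all_columns, pii_columns):
    """Build statements that leave `role` able to read only non-PII columns.

    all_columns keeps its order; pii_columns is any container of names to hide.
    """
    r, t = quote_ident(role), quote_ident(table)
    allowed = [c for c in all_columns if c not in pii_columns]
    statements = [f"REVOKE SELECT ON public.{t} FROM {r};"]
    if allowed:
        column_list = ", ".join(quote_ident(c) for c in allowed)
        statements.append(f"GRANT SELECT ({column_list}) ON public.{t} TO {r};")
    return statements
```

For example, `restrict_select_sql("app_server", "patients", ["id", "email", "ssn"], {"email", "ssn"})` yields a table-level REVOKE followed by `GRANT SELECT ("id")`. After this, `SELECT *` from that role fails, which is usually what you want: the app has to name the columns it reads.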

  5. Audit every read of PII columns

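    pgAudit's READ class logs SELECT and COPY statements that read from a table, and its object-level mode (pgaudit.role) narrows logging to the tables or columns you specify. Enable it for tables holding PII/PHI, ship the logs to a tamper-evident store, and retain them for 6 years for HIPAA (or your industry's minimum).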
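Object-level auditing in pgAudit works through grants: you name an auditor role in `pgaudit.role`, and any statement touching an object that role has privileges on gets logged. A minimal sketch of the setup statements follows, assuming pgAudit is already loaded via `shared_preload_libraries` and that you run them as a superuser; the role, database, and function names are hypothetical.

```python
def quote_ident(name):
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a simple string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def pgaudit_object_sql(auditor_role, database, targets):
    """Build object-level pgAudit setup statements.

    targets maps table name -> list of columns; an empty list audits
    reads of the whole table.
    """
    role = quote_ident(auditor_role)
    statements = [
        f"CREATE ROLE {role} NOLOGIN;",  # never logs in; exists only to hold grants
        f"ALTER DATABASE {quote_ident(database)} "
        f"SET pgaudit.role = {quote_literal(auditor_role)};",
    ]
    for table, columns in targets.items():
        t = quote_ident(table)
        if columns:
            cols = ", ".join(quote_ident(c) for c in columns)
            statements.append(f"GRANT SELECT ({cols}) ON public.{t} TO {role};")
        else:
            statements.append(f"GRANT SELECT ON public.{t} TO {role};")
    return statements
```

Granting on specific columns keeps log volume down: a query is logged only when it references one of the audited columns, so routine reads of non-sensitive columns stay out of the audit trail.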

  6. Disposition exceptions with a justification

    Some PII columns must be readable (e.g. an email login flow). Document why — "required for authentication" — alongside the mitigating control (rate-limited, RLS-restricted to the row owner). Auditors accept dispositions when the rationale is written down; what they reject is silence.
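A disposition register does not need tooling, but keeping it as data makes "silence" easy to detect. The sketch below is one possible shape, with hypothetical field and function names: it lists flagged columns that have no written justification or no mitigating control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disposition:
    table: str
    column: str
    justification: str       # e.g. "required for authentication"
    mitigations: tuple = ()  # e.g. ("rate-limited", "RLS-restricted to the row owner")


def undocumented(flagged, dispositions):
    """Return flagged (table, column) pairs lacking a complete disposition.

    A disposition counts only if it has a non-empty justification and at
    least one mitigating control.
    """
    documented = {
        (d.table, d.column)
        for d in dispositions
        if d.justification.strip() and d.mitigations
    }
    return [pair for pair in flagged if pair not in documented]
```

Running `undocumented` in CI against the output of the inventory step turns a new, unexplained PII column into a failing check rather than an audit surprise.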

Compliance mapping

Framework | Citation | What it expects
HIPAA Security Rule | §164.312(a)(2)(iv) | Encryption at rest for ePHI columns (addressable)
HIPAA Security Rule | §164.312(b) | Audit controls that record and examine access to ePHI
PCI DSS 4 | 3.5.1 (3.4 in v3.2.1) | Render PAN unreadable when stored
SOC 2 | CC6.1 | Logical access controls (RLS + grants)
GDPR | Art. 32 | Pseudonymization & encryption of personal data

Frequently asked

What counts as PII in a Postgres column?
PII in the US sense is information that can identify a person on its own (SSN, full name + DOB, MRN, email tied to a real person) or in combination (ZIP code + date of birth + gender, as Latanya Sweeney's re-identification study showed). HIPAA's PHI is a tighter standard: any health information tied to an identifier. Most state breach notification laws care about the broader PII category.
Should I scan production data or schema only?
Schema-only first: column names and types are almost always enough to find the real exposure. Scanning row contents requires read access that often violates the very policies you're trying to enforce. KollGuard's scanner stays at the schema/grants level on purpose.
Will KollGuard touch my row data?
No. The Postgres scanner connects with a read-only role and queries only system catalogs (information_schema, pg_catalog) and metadata views. Row contents of your tables are never read.
Does pseudonymization satisfy HIPAA Safe Harbor?
Pseudonymization (replacing identifiers with tokens) is a mitigation, not Safe Harbor. HIPAA Safe Harbor is a specific de-identification standard (45 CFR §164.514(b)(2)) requiring removal of 18 identifier types AND no actual knowledge that the remaining information could be used to identify an individual. Pseudonymization is reversible and therefore typically not Safe Harbor.

Run the scan on your DB

KollGuard's PII / PHI scanner reads only system catalogs — never your row data — and produces a report mapped to HIPAA, PCI DSS, SOC 2, and GDPR citations. First scan free.