The seven-step schema audit
1.Use information_schema, never sample rows
Postgres ships everything you need in information_schema and pg_catalog. Walk every table in public + your app schemas; for each column, capture name, data type, nullability, and grants. Do NOT SELECT row data — that requires permissions you may not have AND violates the policies you're trying to enforce.
2.Heuristic-tag columns by name + type
Build a small ruleset over column metadata. Names like email, phone, ssn, dob, mrn, address, first_name, last_name, ip_address are PII shapes. PHI-specific: diagnosis_*, encounter_*, npi, icd_*, cpt_*, lab_*, prescription_*. Types like inet, citext, and text columns named like the above flag PII even without row inspection.
3.Cross-check anon-role reachability
On Supabase (and any PostgREST stack), the public schema is exposed to the anon role. A flagged PII column AND no RLS = publicly readable PII. Query information_schema.role_table_grants and pg_policies to detect this combination; it's the highest-severity finding.
4.Encrypt at-rest selectively
Don't encrypt every column — find the ones that actually need it (HIPAA §164.312(a)(2)(iv) ePHI, PCI DSS 3.4 PAN). pgcrypto with a KMS-managed key + a documented rotation cadence is the standard answer. Note: encrypting an indexed column kills the index — design for it.
5.Constrain the role behind your app
Don't connect as service_role from your app server. Create a dedicated app role with grants only on the tables and columns it strictly needs. Auditors will pull pg_roles + grants and check. KollGuard's scanner flags over-privileged roles automatically.
6.Log reads with pgAudit
Enable pgAudit's READ class for ePHI/PII tables, ship logs to a tamper-evident store, retain for the longer of 6 years (HIPAA) or your industry's minimum. Without this, you can't answer the §164.312(b) audit question.
7.Disposition the exceptions
Some PII must be reachable — login email, customer profile. Document why ('required for authentication'), the mitigating controls (rate-limited, RLS-restricted to the row owner), and who approved it. Auditors accept dispositions with rationale; they reject silence.
Quick framework mapping
| Framework | Citation | What it expects |
|---|---|---|
| HIPAA Security Rule | §164.312(a)(2)(iv) | Encryption at rest for ePHI |
| HIPAA Security Rule | §164.312(b) | Audit log of every read of ePHI |
| PCI DSS 4 | 3.4 | Render PAN unreadable when stored |
| SOC 2 | CC6.1 | Logical access controls (RLS + grants) |
| GDPR | Art. 32 | Pseudonymization & encryption |
Frequently asked
- What is the difference between PII and PHI?
- PII = personally identifiable information (broader). PHI = protected health information under HIPAA = any health information combined with one of 18 identifier types. All PHI is PII; not all PII is PHI. State breach laws use the broader PII definition; HIPAA uses PHI.
- Do I need to scan production data or schema only?
- Schema-level scanning catches 90%+ of real exposure (column shapes, role grants, RLS gaps) without ever touching customer data. KollGuard's Postgres scanner stays at the schema level on purpose. Row-level scanning is only for data-loss prevention products with separate justification.
- Will KollGuard read my row data?
- No. We connect with a read-only role, query system catalogs, and never SELECT from your tables. The scan is fully observable in pg_stat_activity if you want to verify.
- How does this compare to Strac, Cyera, or BigID?
- Those are enterprise data-discovery platforms ($50k+/yr) designed to scan and classify row content across many data stores. KollGuard's PII detection is schema-level only and bundled in a $19.89/mo SOC 2/HIPAA scanner. For a small team with one Postgres database, KollGuard is overkill-free; for a large org with 50+ data stores, you want the enterprise tools.
- Is pseudonymization enough for HIPAA Safe Harbor?
- No. Safe Harbor is the specific HIPAA de-identification standard at 45 CFR §164.514(b) — requires removal of 18 identifier types AND no actual knowledge that re-identification is possible. Pseudonymization (token replacement) is reversible and therefore typically not Safe Harbor; it's a mitigation but not de-identification.
