Data handling

Data classes

Unphish v2 handles five distinct data classes, each with different controls:

| Class | Examples | Storage | Transit | Retention |
| --- | --- | --- | --- | --- |
| Identity | Authentik subjects, app user profiles, sessions | App DB (`users`, `user_identities`, `user_sessions`) | TLS | While account is active |
| Tenancy | Organizations, clients, brands, memberships, policies | App DB | TLS | Customer lifetime |
| Case content | URLs, screenshots, DNS, WHOIS, SSL, redirects, HTML, notes | App DB + S3 | TLS | Per client policy (default: indefinite for closed cases) |
| Audit | Auth events, configuration changes, sensitive actions, impersonation | App DB (`audit_events`, append-only) | TLS | Per highest-tier client contract |
| Secrets | OIDC client secrets, provider API keys, Authentik admin token | Vercel / Render env (NOT app DB) | TLS | Until rotated |

Evidence files

Screenshots, attachments, generated reports, and provider response artifacts are binary objects. They live in S3 (or Vercel Blob), not in the database.

The application stores a relative path for each file.
At read time, the app resolves the path to a presigned URL or a CDN URL.
Files are addressed by content path. Nothing in the URL itself identifies the owning tenant; access goes through the app, which enforces tenant scope.
Migrated v1 files live under a `v1/` prefix or are resolved via `V1_MEDIA_BASE_URL` for an existing CDN.
File MIME types are validated on upload. Executable types are rejected, and image files are sanitized.
Maximum file sizes are enforced per upload type.
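
The upload rules above can be sketched as a single validation step. This is a minimal sketch under stated assumptions, not the actual Unphish implementation: the allow-list, the size limits, and the names `ALLOWED_TYPES`, `MAX_BYTES`, `UploadRejected`, and `validate_upload` are all hypothetical. A real validator would likely also inspect file content rather than trust the declared MIME type.

```python
# Hypothetical allow-list and per-upload-type size limits (assumed values).
ALLOWED_TYPES = {
    "screenshot": {"image/png", "image/jpeg"},
    "attachment": {"image/png", "image/jpeg", "application/pdf"},
}
MAX_BYTES = {
    "screenshot": 10 * 1024 * 1024,
    "attachment": 25 * 1024 * 1024,
}


class UploadRejected(ValueError):
    """Raised when an upload fails validation."""


def validate_upload(upload_type: str, mime_type: str, size_bytes: int) -> None:
    """Reject uploads whose type is not allow-listed or whose size exceeds the limit."""
    allowed = ALLOWED_TYPES.get(upload_type)
    if allowed is None:
        raise UploadRejected(f"unknown upload type: {upload_type}")
    if mime_type not in allowed:
        # Executable types are rejected because they never appear on the allow-list.
        raise UploadRejected(f"{mime_type} is not allowed for {upload_type}")
    if size_bytes > MAX_BYTES[upload_type]:
        raise UploadRejected(f"{size_bytes} bytes exceeds the {upload_type} limit")
```

An allow-list is used here instead of a deny-list so that an unknown type fails closed.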

Direct bucket exposure to customers is not allowed. Every download goes through an authenticated app endpoint that checks scope before issuing the presigned URL.
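
The download rule above (scope check first, presigned URL second) can be sketched as follows. This is an illustration only: `EvidenceFile`, `Forbidden`, `issue_download_url`, the `presign` callable, the placeholder host, and the 300-second lifetime are hypothetical names and values, not the app's real endpoint.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvidenceFile:
    organization_id: str  # tenant that owns the file
    relative_path: str    # relative path stored in the app database


class Forbidden(Exception):
    """The caller is not allowed to access this file."""


def issue_download_url(
    caller_org_ids: set[str],
    file: EvidenceFile,
    presign: Callable[[str, int], str],
    ttl_seconds: int = 300,
) -> str:
    """Check tenant scope first, and only then ask storage for a short-lived URL."""
    if file.organization_id not in caller_org_ids:
        raise Forbidden("file is outside the caller's tenant scope")
    return presign(file.relative_path, ttl_seconds)


def fake_presign(path: str, ttl: int) -> str:
    """Stand-in signer for the example; it does not produce a real signature."""
    return f"https://storage.example.invalid/{path}?expires_in={ttl}"
```

Passing the signer in as a callable keeps the scope check independent of the storage backend. With S3, `presign` could wrap boto3's `generate_presigned_url`.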

Provider secrets

The single most important rule: provider secrets are never stored in the app database.

The Hub's /hub/secrets page displays metadata only:
- Provider name.
- Key name (e.g., URLSCAN_API_KEY).
- Target environment (preview / staging / production).
- Masked fingerprint (e.g., the first/last few characters or a SHA-256 prefix).
- Last-check timestamp and result.
- Audit history of metadata changes.

The actual secret value lives in Vercel environment variables (for app-runtime providers) or Render environment groups (for worker-runtime providers), scoped to the appropriate environment.
The `provider_secret_metadata` table holds rows that record which secret should exist where, but never the secret itself.

Why this matters:

A SQL injection on the app database does not yield secrets.
A logging accident in the Hub does not exfiltrate secrets: the Hub's request lifecycle carries only fingerprints, never secret values.
Rotation is a Vercel/Render operation plus a redeploy. The Hub re-checks the new fingerprint after deployment.
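
The fingerprint and the post-rotation re-check can be sketched as below. This assumes the SHA-256-prefix variant of the masked fingerprint. The function names, the 12-character prefix length, and the three result strings are hypothetical, not the Hub's actual code.

```python
import hashlib
import hmac
import os


def fingerprint(secret: str, length: int = 12) -> str:
    """Return a short SHA-256 prefix that identifies a secret without revealing it."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:length]


def check_secret(key_name: str, expected_fingerprint: str, env=os.environ) -> str:
    """Compare the deployed secret's fingerprint with the one held in metadata."""
    value = env.get(key_name)
    if value is None:
        return "missing"
    actual = fingerprint(value, len(expected_fingerprint))
    # Constant-time comparison; both operands are ASCII hex strings.
    if hmac.compare_digest(actual, expected_fingerprint):
        return "ok"
    return "mismatch"
```

Only the fingerprint crosses into the database or the UI. The secret value is read from the environment, hashed, and discarded inside `check_secret`.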

PII handling

Data with personal-information sensitivity:

User contact details (email, name, image, phone where supplied): stored in `users` for staff, customer, and client users.
Client contact details (rights owner contact, legal contact, postal address, proof-of-authorization documents): stored in `clients` and related tables.
WHOIS / RDAP registrant data: captured into evidence where the applicable jurisdiction permits. Where local law restricts redistribution, only registrar-level metadata is preserved.
Email evidence (headers, attachments): treated as case content, with the same controls as other evidence.

PII is not logged in application logs. Provider responses that may contain PII are stored in object storage, not in log streams. Audit event entries reference subjects by ID, not by content.

Audit events

Every sensitive action writes an entry to the audit log:

Authentication events. Sign-in, sign-out, MFA enrollment, failed sign-in (rate-limited).
Membership changes. Invite, accept, resend, revoke, role edit, member removal.
Provider secret writes. Metadata only; the secret value is never logged.
Impersonation events. Start (with reason and target), stop, action-while-impersonating.
Case lifecycle transitions. Status / activity changes, assignment, closure, reopening.
Enforcement submissions and provider responses. Submission, status update, escalation.
Client approval decisions. Approve, reject, request more information.
Configuration changes. Organization, client, brand, policy, integration mode.
Bulk operations. Whitelist upload, scan config import, watchlist bulk action.

Audit entries are append-only. Corrections are added as new entries; nothing is deleted. Entries are written via an interface that enforces:

A real actor (or the explicit "system" actor for automated jobs).
A target scope (organization, client, optionally specific record).
A timestamp.
IP and user agent where available.
Before/after deltas for configuration changes where practical.
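
One way to picture the write interface described above is a record type that refuses to exist without an actor and a scope, plus a log that only exposes an append operation. The sketch below is an assumption about shape, not the real schema: `AuditEvent`, `AuditLog`, `SYSTEM_ACTOR`, and every field name are hypothetical, and the in-memory list stands in for the `audit_events` table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

SYSTEM_ACTOR = "system"  # explicit actor for automated jobs


@dataclass(frozen=True)
class AuditEvent:
    actor_id: str
    action: str
    organization_id: str
    client_id: Optional[str] = None
    record_id: Optional[str] = None
    ip: Optional[str] = None
    user_agent: Optional[str] = None
    before: Optional[dict] = None  # configuration deltas, where practical
    after: Optional[dict] = None
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if not self.actor_id:
            raise ValueError("audit events need a real actor or SYSTEM_ACTOR")
        if not self.organization_id:
            raise ValueError("audit events need a target scope")


class AuditLog:
    """In-memory stand-in for an append-only table: no update or delete methods."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def append(self, event: AuditEvent) -> None:
        self._events.append(event)

    def entries(self) -> tuple[AuditEvent, ...]:
        return tuple(self._events)
```

A correction would be recorded by appending a second event that references the first, never by mutating it.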

Retention is set per record class to meet the highest-tier client contract. The default is indefinite for sensitive workflows; archival to cold storage may apply after a period.

Imported v1 data

Migrated v1 data is labelled but not hidden:

Imported records carry `source: imported` and have non-null `legacy_v1_*` fields for traceability.
The UI displays the imported label on any surface that mixes live and imported data.
Imported audit history from v1 stays in the v1 archive; v2 audit is forward-looking from cutover.
v1 password hashes are explicitly not migrated. All users re-enroll through Authentik.
v1 TOTP secrets are explicitly not migrated. All users re-enroll MFA.
v1 sessions do not carry over. Cutover signs everyone out.
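
The labelling rule for imported records could be checked with a small invariant like the one below. It is a sketch under one reading of the rule: the record must have at least one `legacy_v1_*` field, and every such field must be non-null. The function name and the `legacy_v1_id` field used in the usage example are hypothetical.

```python
def is_traceable_import(record: dict) -> bool:
    """True if the record is labelled imported and its legacy_v1_* fields are non-null."""
    legacy = {k: v for k, v in record.items() if k.startswith("legacy_v1_")}
    return (
        record.get("source") == "imported"
        and bool(legacy)
        and all(value is not None for value in legacy.values())
    )
```

For example, `is_traceable_import({"source": "imported", "legacy_v1_id": 42})` holds, while the same record with `legacy_v1_id` set to `None` does not.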

Backups

Postgres: Neon-managed backups with point-in-time recovery. Per-environment retention configured to meet the longest contractual obligation. Restore procedure documented in the production runbook.
Object storage: Versioning enabled on the evidence bucket; lifecycle rules archive old versions but never delete by default.
Authentik: Self-managed Postgres backups for the Authentik host. Recovery procedure tested periodically.
Temporal Cloud: Managed; workflow history is durable per Temporal Cloud retention.

Encryption

In transit: TLS 1.2+ for all connections (browser-to-app, app-to-DB, app-to-Authentik, app-to-provider, worker-to-Temporal). HSTS enforced on production.
At rest: Provided by managed services (Neon, S3, Authentik host disk). Per-tenant keys are not used; tenant isolation is enforced at the application layer via organization_id / client_id scoping.
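
Application-layer tenant isolation usually means that no query reaches the database without a tenant filter. The sketch below illustrates the idea with the standard-library `sqlite3` module standing in for Postgres. The `cases` table, its columns, and the `cases_for_org` helper are hypothetical.

```python
import sqlite3


def cases_for_org(conn: sqlite3.Connection, organization_id: str) -> list[tuple]:
    """Read cases through a helper that always binds the tenant filter as a parameter."""
    return conn.execute(
        "SELECT id, url FROM cases WHERE organization_id = ? ORDER BY id",
        (organization_id,),
    ).fetchall()


# Tiny in-memory dataset with two tenants, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, organization_id TEXT NOT NULL, url TEXT)"
)
conn.executemany(
    "INSERT INTO cases (organization_id, url) VALUES (?, ?)",
    [("org_a", "https://a.example/login"), ("org_b", "https://b.example/login")],
)
```

Because there are no per-tenant keys, a query that omits the filter would see every tenant's rows. That is why scoping helpers of this kind matter.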

Session cookies are signed (HMAC) and set with HttpOnly, Secure (in production), and SameSite=Lax.
Cookies set during preview deployments use the preview hostname, not the production hostname.
The cookie name is fixed; its value is opaque (signed session token).
CSRF protection for state-changing requests combines SameSite cookies with origin checks.
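
The cookie properties and the origin check above can be sketched with the Python standard library. This is a minimal illustration, not the app's session code: the cookie name `session`, the token layout (`<id>.<mac>`), the choice of SHA-256 for the HMAC, and all function names are assumptions.

```python
import base64
import hashlib
import hmac
from http.cookies import SimpleCookie
from typing import Optional


def sign(session_id: str, key: bytes) -> str:
    """Append an HMAC-SHA256 tag so the token cannot be forged without the key."""
    mac = hmac.new(key, session_id.encode(), hashlib.sha256).digest()
    tag = base64.urlsafe_b64encode(mac).decode().rstrip("=")
    return f"{session_id}.{tag}"


def verify(token: str, key: bytes) -> Optional[str]:
    """Return the session ID if the tag is valid, otherwise None."""
    session_id, _, _ = token.rpartition(".")
    if not session_id:
        return None
    expected = sign(session_id, key)
    # Compare as bytes in constant time; non-ASCII input simply fails to match.
    if hmac.compare_digest(expected.encode(), token.encode()):
        return session_id
    return None


def session_cookie_header(token: str, production: bool) -> str:
    """Build the Set-Cookie value with HttpOnly, SameSite=Lax, and Secure in production."""
    cookie = SimpleCookie()
    cookie["session"] = token  # fixed cookie name (hypothetical)
    morsel = cookie["session"]
    morsel["httponly"] = True
    morsel["samesite"] = "Lax"
    morsel["path"] = "/"
    if production:
        morsel["secure"] = True
    return morsel.OutputString()


def origin_allowed(method: str, origin: Optional[str], app_origin: str) -> bool:
    """Allow safe methods; require a matching Origin header for state-changing ones."""
    if method.upper() in {"GET", "HEAD", "OPTIONS"}:
        return True
    return origin == app_origin
```

No `Domain` attribute is set, so the cookie is bound to the host that issued it. That is one way to keep preview-deployment cookies on the preview hostname.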

Logging and observability

Application logs are scrubbed of secrets and PII at the source. Log statements use structured fields with subject IDs, not content.
Error tracking captures stack traces and request metadata but never request bodies for sensitive endpoints (sign-in, invite acceptance, secret writes).
Worker logs include workflow IDs and step IDs for traceability; payloads are stored in Postgres / S3 for replay, not duplicated into log streams.
Provider request/response payloads are persisted in storage with reference paths in the database, again not duplicated into log streams.
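
Scrubbing "at the source" can be approximated with a logging filter that rewrites structured fields before any handler sees them. The sketch below uses Python's standard `logging` module. The `fields` attribute, the `SENSITIVE_KEYS` set, and the `[REDACTED]` marker are hypothetical conventions, not the app's actual logging setup.

```python
import logging

# Hypothetical set of field names that must never reach a log stream.
SENSITIVE_KEYS = {"email", "phone", "full_name", "password", "token", "secret"}


def scrub(fields: dict) -> dict:
    """Replace the values of sensitive keys, recursing into nested dicts."""
    clean = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = scrub(value)
        else:
            clean[key] = value
    return clean


class ScrubFilter(logging.Filter):
    """Scrub the structured `fields` attribute on every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            record.fields = scrub(fields)
        return True  # never drop the record, only clean it
```

A logger would attach it with `logger.addFilter(ScrubFilter())` and log with `extra={"fields": {"user_id": "u_42"}}`, so that subject IDs pass through while content does not.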

Data deletion

Soft delete by default. Removing a team member, archiving a case, or deactivating a client preserves the audit trail and the underlying data.
Hard delete is exceptional. Hard deletion of a client or organization (e.g., to comply with a data subject request) requires explicit two-person approval, is documented, and produces an audit entry describing what was removed.
Account removal preserves audit. A user removed from an organization no longer has access, but their historical actions remain in the audit log under their stable user ID.
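
The hard-delete gate can be sketched as a request object that cannot be executed until a second person has approved it. This sketch reads "two-person approval" as the requester plus one distinct approver, which is an assumption. `HardDeleteRequest`, `execute_hard_delete`, and the audit entry's keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HardDeleteRequest:
    target: str        # e.g. "client:123"
    reason: str        # documented justification
    requested_by: str
    approved_by: Optional[str] = None

    def approve(self, approver_id: str) -> None:
        """Record approval from someone other than the requester."""
        if approver_id == self.requested_by:
            raise PermissionError("the requester cannot approve their own request")
        self.approved_by = approver_id


def execute_hard_delete(
    request: HardDeleteRequest,
    delete: Callable[[str], int],
    audit: Callable[[dict], None],
) -> None:
    """Run the deletion only after approval, then describe what was removed."""
    if request.approved_by is None:
        raise PermissionError("hard delete requires two-person approval")
    removed_count = delete(request.target)
    audit({
        "action": "hard_delete",
        "target": request.target,
        "reason": request.reason,
        "requested_by": request.requested_by,
        "approved_by": request.approved_by,
        "removed_count": removed_count,
    })
```

The audit entry records who asked, who approved, and how much was removed, but not the removed content itself.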

What customers can request

A copy of their data (cases, evidence, configuration, audit) in a portable format.
Deletion of specific records subject to legal-hold review.
Confirmation of which providers their data has been submitted to.
A list of current users with access to their data.

These requests are handled by Unphish operations through the admin surfaces, with appropriate authentication and audit.