Unphish v2 Docs

Concepts and the data model

The core domain objects, statuses, and how they relate.

This page is the conceptual map of Unphish v2. Read it once and the rest of the documentation will make sense without lookups.

Tenancy and identity

Unphish is multi-tenant from the ground up. The tenant model is layered and explicit:

  • An Organization is the top-level tenant boundary. Unphish itself is one organization, each partner is one, and some direct customers are one.
  • A Client is a rights owner under an organization. A partner with ten customers has ten clients in their organization. A direct customer organization has exactly one client (itself).
  • A Brand is a protected entity belonging to a client — usually a trademark, product line, or domain identity. One client can have multiple brands.
  • A User is a person, authenticated by Authentik. Users belong to organizations, and optionally to specific clients, through Memberships that each carry a canonical role.
  • SSO Domains map a verified email domain to an organization, so users on that domain can sign in through the organization's SSO.
  • API Keys belong to a client or organization. They are scoped and revocable, and their last use is tracked.

Every tenant-scoped record carries organization_id. Every client-owned record additionally carries client_id. Cross-tenant access is impossible by construction; staff/global access is always explicit and audited.
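The scoping rule above can be illustrated with a small sketch. Everything here is an assumption for illustration: the TenantScope and Record types, the can_access helper, and the choice that a client-scoped membership does not see organization-level records. Explicit staff/global access is deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantScope:
    """Hypothetical view of a membership: an organization, optionally one client."""
    organization_id: str
    client_id: Optional[str] = None  # None means organization-wide membership


@dataclass(frozen=True)
class Record:
    """Hypothetical tenant-scoped record."""
    organization_id: str
    client_id: Optional[str] = None  # set only on client-owned records


def can_access(scope: TenantScope, record: Record) -> bool:
    # The organization boundary is always checked first.
    if scope.organization_id != record.organization_id:
        return False
    # An organization-wide scope sees every record in its organization.
    if scope.client_id is None:
        return True
    # Assumption: a client-scoped membership only sees that client's records.
    return record.client_id == scope.client_id
```

For example, a membership scoped to client-2 in org-a is refused a record owned by client-1 in the same organization, and any scope in org-b is refused every org-a record.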

Cases and the threat lifecycle

A Case is the central record for one threat against one brand. It links the infringing URL/email/social handle/app, the client and brand it targets, the issue type and platform, evidence, and the workflow that drives it.

Cases have two orthogonal status fields:

Case Status — the high-level state:

| Status | Meaning |
|---|---|
| open | Active and being worked. |
| pending | Waiting for an external party (analyst, client, provider). |
| on_hold | Paused for missing data, policy review, or investigation. |
| enforcing | Enforcement is being prepared or has been submitted. |
| verifying | Remediation is being checked. |
| closed | Resolved; no further action required. |
| dismissed | Intentionally excluded as not actionable. |
| reopened | Previously closed, now resurrected or reopened. |

Case Activity — the fine-grained lifecycle stage:

agent_review, client_review, requires_information, classifying, enriching, watchlisted, enforcement_ready, enforcement_submitted, case_successfully_suspended, content_removed, platform_denied_request, platform_unresponsive, insufficient_information, dismissed, other.

Activity changes and status changes are both recorded as audit events. Read together, the two fields define where a case stands.
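As a rough sketch of the two fields and their audit trail, the status and activity values listed above can be modelled as enums. The Case class, its default values, the audit entry shape, and the rule that a no-op change writes no entry are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class CaseStatus(str, Enum):
    OPEN = "open"
    PENDING = "pending"
    ON_HOLD = "on_hold"
    ENFORCING = "enforcing"
    VERIFYING = "verifying"
    CLOSED = "closed"
    DISMISSED = "dismissed"
    REOPENED = "reopened"


class CaseActivity(str, Enum):
    AGENT_REVIEW = "agent_review"
    CLIENT_REVIEW = "client_review"
    REQUIRES_INFORMATION = "requires_information"
    CLASSIFYING = "classifying"
    ENRICHING = "enriching"
    WATCHLISTED = "watchlisted"
    ENFORCEMENT_READY = "enforcement_ready"
    ENFORCEMENT_SUBMITTED = "enforcement_submitted"
    CASE_SUCCESSFULLY_SUSPENDED = "case_successfully_suspended"
    CONTENT_REMOVED = "content_removed"
    PLATFORM_DENIED_REQUEST = "platform_denied_request"
    PLATFORM_UNRESPONSIVE = "platform_unresponsive"
    INSUFFICIENT_INFORMATION = "insufficient_information"
    DISMISSED = "dismissed"
    OTHER = "other"


@dataclass
class Case:
    # Default values are assumptions, not documented behaviour.
    status: CaseStatus = CaseStatus.OPEN
    activity: CaseActivity = CaseActivity.CLASSIFYING
    audit_log: list[dict] = field(default_factory=list)

    def _record(self, field_name: str, old: Enum, new: Enum, actor: str) -> None:
        # Hypothetical audit entry shape.
        self.audit_log.append({
            "field": field_name,
            "old": old.value,
            "new": new.value,
            "actor": actor,
            "at": datetime.now(timezone.utc),
        })

    def set_status(self, new: CaseStatus, actor: str) -> None:
        if new is not self.status:  # assumption: no-op changes are not logged
            self._record("status", self.status, new, actor)
            self.status = new

    def set_activity(self, new: CaseActivity, actor: str) -> None:
        if new is not self.activity:
            self._record("activity", self.activity, new, actor)
            self.activity = new
```

Keeping the two fields as separate enums mirrors the text: status answers "what state is the case in", activity answers "which lifecycle stage is it at", and each change leaves its own audit entry.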

Evidence

Each case accumulates an EvidencePackage — a canonical bundle used for review and enforcement. Components include:

  • Screenshot — desktop and mobile, full-page, with device, browser, viewport, IP geography, and timestamp.
  • DnsRecordSet — A, AAAA, MX, NS, TXT, CNAME records, plus ASN, host, ISP, country.
  • WhoisRecord / RdapRecord — registrar, registrant-derived data where legal, creation/update/expiry dates.
  • SslCertificate — issuer, subject, SANs, validity, CT log references, free-cert indicators.
  • RedirectTrace — ordered HTTP redirect chain with status codes and headers.
  • HtmlAnalysis — title, meta tags, forms, scripts, text, language, translation, and intent summary.
  • EmailEvidence — for email-based threats: headers, MX records, attachments, parsed indicators.
  • Note and Attachment — analyst, client, and system commentary plus user-uploaded files.

Classification

Every case is scored by a ClassificationRun. The scoring is deliberately multi-dimensional:

  • Visual similarity — how closely does this look like the legitimate brand?
  • NLP — what does the page text say? Credential harvest? Crypto scam? Fakeshop?
  • Domain analysis — registrar, age, infrastructure, similarity to legitimate domains.
  • Evidence quality — how complete is the evidence package?

The output is a structured record containing a confidence score, a label, a routing decision, and a human-readable explanation. Below-threshold cases pause for analyst review. Cases requiring client approval pause for client decision. Every override becomes feedback to the model evaluation pipeline.

Supported labels: fakeshop, credential_harvest, crypto_scam, impersonation, malware, phishing, trademark, copyright, harmful_content, fraudulent_mobile_app, fake_payments, financial_services_abuse, fraud_campaign, data_leakage, dark_web_risks, email_breaches, domain_squatting, other.
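The routing rules described above could be sketched as follows. The threshold value, the ClassificationResult fields, the order in which the two pause conditions are checked, and the reuse of case activity names as routing values are all assumptions for illustration; the real scoring combines several dimensions that this sketch does not model.

```python
from dataclasses import dataclass

# Hypothetical threshold; the actual value is not stated in this page.
REVIEW_THRESHOLD = 0.80


@dataclass(frozen=True)
class ClassificationResult:
    confidence: float
    label: str
    routing: str
    explanation: str


def route(confidence: float, label: str, requires_client_approval: bool,
          threshold: float = REVIEW_THRESHOLD) -> ClassificationResult:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < threshold:
        # Below-threshold cases pause for analyst review.
        routing = "agent_review"
        explanation = f"confidence {confidence:.2f} is below threshold {threshold:.2f}"
    elif requires_client_approval:
        # Cases requiring client approval pause for a client decision.
        routing = "client_review"
        explanation = "client approval is required before enforcement"
    else:
        routing = "enforcement_ready"
        explanation = f"confidence {confidence:.2f} meets threshold {threshold:.2f}"
    return ClassificationResult(confidence, label, routing, explanation)
```

With the assumed threshold, route(0.5, "phishing", False) pauses for analyst review, while route(0.95, "phishing", True) pauses for the client.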

Enforcement

An Enforcement is a takedown record linked to one or more cases. It is dispatched on an EnforcementChannel — XARF email, CleanDNS, a specific registrar, a hosting provider, Meta, X, Cloudflare, Google Safe Browsing, Microsoft SmartScreen, or a manual browser-extension flow when no API exists.

Each channel has an EnforcementTemplate that produces the channel-specific payload (form, email body, API call, XARF schema). Enforcements track:

  • The submitted payload and provider reference.
  • Status: draft, form_filled, queued, submitted, received_request, actioned_request, partial_action, rejected_request, escalated, cancelled.
  • Provider responses, which feed back into status, SLA tracking, and verification.
  • Related artifacts — trademarks, copyright URLs, content URLs, attachments — that were copied or linked into the submission.
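A minimal sketch of an enforcement record, using the status values listed above. The Enforcement class, its field names, the channel string, and the record_response helper are hypothetical; only the status names and the "one or more cases" rule come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EnforcementStatus(str, Enum):
    DRAFT = "draft"
    FORM_FILLED = "form_filled"
    QUEUED = "queued"
    SUBMITTED = "submitted"
    RECEIVED_REQUEST = "received_request"
    ACTIONED_REQUEST = "actioned_request"
    PARTIAL_ACTION = "partial_action"
    REJECTED_REQUEST = "rejected_request"
    ESCALATED = "escalated"
    CANCELLED = "cancelled"


@dataclass
class Enforcement:
    channel: str                      # hypothetical channel identifier
    case_ids: list[str]               # an enforcement links to one or more cases
    status: EnforcementStatus = EnforcementStatus.DRAFT
    provider_reference: Optional[str] = None
    provider_responses: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.case_ids:
            raise ValueError("an enforcement must link to at least one case")

    def record_response(self, message: str, new_status: EnforcementStatus) -> None:
        # Provider responses feed back into the enforcement status.
        self.provider_responses.append(message)
        self.status = new_status
```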

Verification and resurrection

After submission, VerificationCheck records run DNS, HTTP, visual, provider, and blocklist checks. The default cadence is every four hours while a case is actively being verified. Outcomes are active, checking, down, partially_down, resurrected, inconclusive, or failed.

When a case is closed, a 30-day ResurrectionMonitor window starts by default. If the threat reappears, the case reopens or spawns a linked follow-up depending on policy.
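The default timings above reduce to simple date arithmetic. The sketch below assumes the four-hour cadence is measured from the previous check and that the 30-day window excludes its end instant; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

VERIFICATION_INTERVAL = timedelta(hours=4)   # default cadence while verifying
RESURRECTION_WINDOW = timedelta(days=30)     # default monitoring window after close


def next_verification_at(last_check: datetime) -> datetime:
    """Time of the next verification check, counted from the previous one."""
    return last_check + VERIFICATION_INTERVAL


def resurrection_window_ends(closed_at: datetime) -> datetime:
    """End of the resurrection monitoring window for a closed case."""
    return closed_at + RESURRECTION_WINDOW


def in_resurrection_window(closed_at: datetime, now: datetime) -> bool:
    # Assumption: the window includes the close time and excludes the end time.
    return closed_at <= now < resurrection_window_ends(closed_at)
```

A case closed at 2025-01-01 12:00 UTC would, under these defaults, be monitored until 2025-01-31 12:00 UTC.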

Watchlist and whitelist

Not every suspicious domain becomes a case. Dormant or no-content domains can be tracked as a WatchlistItem for continuous monitoring instead of being enforced immediately. Watchlists track DNS, subdomains, HTTP status codes, WHOIS, availability, and screenshots. A WatchlistUpdate is created when something changes, and WatchlistSubscriber users get alerts.

Conversely, WhitelistItem records suppress known-safe domains, URLs, email addresses, or partner-owned assets so they never become cases.

Detection configuration

Threats arrive from several sources, all expressed as a DetectionSource — for example URLScan, WhoisXML, NothingPhishy, gse.live, WhoisFreaks, a client API, or manual entry. Configurable scanning is built around:

  • Scan — an execution run.
  • ScanQuery — the parameters of one scan (module, country, date range, keywords, sources).
  • QueryFolder — grouping for scan queries.
  • QueryKeyword — included or excluded keywords.
  • QuerySiteSearch / QuerySiteExclusion — included or excluded sites.

Watchlist updates and detection feeds together produce ThreatSubmission records. Intake validates each submission and then creates a case, dismisses it as a duplicate, routes it to the watchlist, or queues it for review.
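The four intake outcomes can be sketched as a small decision function. The submission fields (has_content, is_valid), the exact-URL duplicate check, and the order of the checks are assumptions for illustration; the page does not specify how intake decides between the outcomes.

```python
from dataclasses import dataclass
from enum import Enum


class IntakeOutcome(str, Enum):
    CREATE_CASE = "create_case"
    DISMISS_DUPLICATE = "dismiss_duplicate"
    ROUTE_TO_WATCHLIST = "route_to_watchlist"
    QUEUE_FOR_REVIEW = "queue_for_review"


@dataclass(frozen=True)
class ThreatSubmission:
    url: str
    source: str               # name of the DetectionSource, e.g. "manual"
    has_content: bool = True  # hypothetical: False for dormant or no-content domains
    is_valid: bool = True     # hypothetical: result of intake validation


def triage(submission: ThreatSubmission, known_urls: set[str]) -> IntakeOutcome:
    # Assumption: duplicates are detected by exact URL match against known cases.
    if submission.url in known_urls:
        return IntakeOutcome.DISMISS_DUPLICATE
    # Assumption: submissions that fail validation go to a human.
    if not submission.is_valid:
        return IntakeOutcome.QUEUE_FOR_REVIEW
    # Dormant or no-content domains are monitored rather than enforced.
    if not submission.has_content:
        return IntakeOutcome.ROUTE_TO_WATCHLIST
    return IntakeOutcome.CREATE_CASE
```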

Reporting and intelligence

ReportSchedule and ReportRun records drive the weekly, monthly, or custom-cadence reports clients receive. Reports cover case volume, issue mix, enforcement outcomes, SLA, provider responsiveness, watchlist changes, and intelligence clusters.

For pattern detection, cases and evidence are grouped into ThreatCluster records based on shared infrastructure (IP, ASN, registrar, host, SSL issuer, CT patterns, redirect chains, templates). ThreatActor profiles aggregate clusters under known adversary groups. Indicator records (IOCs) are exportable as STIX 2.1, JSON, CSV, and PDF.

Workflow durability

Each case can drive a WorkflowRun — a durable orchestration instance executed by Temporal. WorkflowStep records track deterministic steps with payloads, outputs, durations, retries, errors, and logs. Workflows pause for missing information, analyst decision, client approval, provider waits, or scheduled verification, and resume by signal from UI, API, provider event, scheduled job, or manual override.
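The pause and resume behaviour can be pictured with a plain-Python state sketch. This is not Temporal SDK code and does not show how durability is achieved; the class, enum, and method names are hypothetical, and only the pause reasons and signal sources are taken from the paragraph above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class PauseReason(str, Enum):
    MISSING_INFORMATION = "missing_information"
    ANALYST_DECISION = "analyst_decision"
    CLIENT_APPROVAL = "client_approval"
    PROVIDER_WAIT = "provider_wait"
    SCHEDULED_VERIFICATION = "scheduled_verification"


class SignalSource(str, Enum):
    UI = "ui"
    API = "api"
    PROVIDER_EVENT = "provider_event"
    SCHEDULED_JOB = "scheduled_job"
    MANUAL_OVERRIDE = "manual_override"


@dataclass
class WorkflowRun:
    case_id: str
    paused_for: Optional[PauseReason] = None
    history: list[str] = field(default_factory=list)  # simplified step log

    def pause(self, reason: PauseReason) -> None:
        if self.paused_for is not None:
            raise RuntimeError("workflow is already paused")
        self.paused_for = reason
        self.history.append(f"paused:{reason.value}")

    def resume(self, source: SignalSource) -> None:
        if self.paused_for is None:
            raise RuntimeError("workflow is not paused")
        self.history.append(f"resumed:{source.value}")
        self.paused_for = None
```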

Workbench workflow traces use the same step schema as production, so every replay is a faithful reproduction.

Data source labelling

Every production-facing API that may show non-production data exposes source metadata:

  • live — current production database / provider result.
  • imported — migrated legacy or imported production data.
  • demo — sample data for demonstrations.
  • fixture — workbench / test fixture data.
  • unavailable — no data available or provider unreachable.

Production UI must never silently fall back to demo data. If live data cannot load, the UI shows an explicit unavailable or configuration state. A page that says "AI Engine ONLINE" when running on fixtures is a bug.
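One way to picture the labelling rule is a response envelope plus a guard on the production side. The envelope shape and both function names are assumptions for illustration; only the five label values come from the list above.

```python
from enum import Enum
from typing import Any


class DataSource(str, Enum):
    LIVE = "live"
    IMPORTED = "imported"
    DEMO = "demo"
    FIXTURE = "fixture"
    UNAVAILABLE = "unavailable"


# Labels that must never be presented as real data in production.
NON_PRODUCTION = frozenset({DataSource.DEMO, DataSource.FIXTURE})


def envelope(data: Any, source: DataSource) -> dict:
    """Hypothetical API response shape carrying the source label."""
    return {"data": data, "source": source.value}


def for_production_ui(response: dict) -> dict:
    """Turn demo or fixture responses into an explicit unavailable state."""
    source = DataSource(response["source"])
    if source in NON_PRODUCTION or source is DataSource.UNAVAILABLE:
        return envelope(None, DataSource.UNAVAILABLE)
    return response
```

Under this sketch, a production page that receives fixture data renders an unavailable state instead of reporting that everything is online.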
