KYC & Document-Verification Framework

A transparent, standards-aligned pipeline that distinguishes real identity and company documents from forgeries. No black boxes: every signal is auditable.

Standards we align to

  • ICAO Doc 9303 — Machine-Readable Travel Documents (passports, IDs). We parse the MRZ and verify all four ICAO 7-3-1 check digits plus the composite checksum.
  • ISO/IEC 7501 — the ISO/IEC endorsement of ICAO Doc 9303 for machine-readable travel documents.
  • eIDAS Regulation (EU) No 910/2014 — electronic identification and trust services, including eIDs notified by member states.
  • FATF Recommendation 10 — Customer Due Diligence (CDD) requirements.
  • UK FCA SYSC 6.3 + JMLSG — UK anti-money-laundering and counter-terrorist-financing rules.
  • US FinCEN CIP — Customer Identification Program (31 CFR 1020.220).
  • EU PRADO — Public Register of Authentic identity and travel Documents Online.
  • ISO/IEC 18013-5 — mobile Driving Licence (mDL).
  • PSD2 SCA + UK Open Banking 3.1 — for bank-account verification via OAuth 2.0 + FAPI.
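
The MRZ check mentioned under ICAO Doc 9303 can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names `check_digit` and `verify_td3_line2` are hypothetical, and only line 2 of a TD3 (passport-size) MRZ is handled. The 7-3-1 weighting and character values follow Doc 9303.

```python
import string

# Character values per ICAO Doc 9303: digits keep their value,
# A-Z map to 10-35, and the filler '<' counts as 0.
_VALUES = {c: i for i, c in enumerate(string.digits + string.ascii_uppercase)}
_VALUES["<"] = 0
_WEIGHTS = (7, 3, 1)


def check_digit(field: str) -> int:
    """Return the 7-3-1 weighted check digit for an MRZ field."""
    return sum(_VALUES[c] * _WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def _matches(field: str, found: str, allow_filler: bool = False) -> bool:
    # Doc 9303 lets '<' stand in for the check digit of an empty optional field.
    if allow_filler and found == "<" and set(field) == {"<"}:
        return True
    return found in string.digits and check_digit(field) == int(found)


def verify_td3_line2(line: str) -> dict:
    """Check the four field check digits and the composite on TD3 line 2."""
    if len(line) != 44 or any(c not in _VALUES for c in line):
        raise ValueError("TD3 MRZ line 2 must be 44 characters of A-Z, 0-9 or '<'")
    return {
        "document_number": _matches(line[0:9], line[9]),
        "date_of_birth": _matches(line[13:19], line[19]),
        "date_of_expiry": _matches(line[21:27], line[27]),
        "personal_number": _matches(line[28:42], line[42], allow_filler=True),
        # The composite covers the fields above together with their check digits.
        "composite": _matches(line[0:10] + line[13:20] + line[21:43], line[43]),
    }


# Line 2 of the specimen passport printed in ICAO Doc 9303.
SPECIMEN = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(verify_td3_line2(SPECIMEN))
```

Changing any single digit of the document number in the specimen makes both `document_number` and `composite` fail, which is the behaviour the mrz_checksum signal relies on.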

Signals & weights

Each uploaded document is scored against seven independent signals. The weighted sum yields a 0-100 reality_score. Gemini can orchestrate and explain the review, but deterministic checks produce the auditable decision:

Signal | Weight | What it catches
mrz_checksum | 0.25 | Invented passport / ID numbers. A made-up number fails any single ICAO 9303 check digit about 9 times in 10.
ocr_template_match | 0.15 | Wrong layout, missing mandatory fields, wrong language for jurisdiction.
exif_metadata | 0.10 | Image touched by Photoshop / GIMP / Lightroom (non-camera editor traces).
ela_proxy | 0.10 | Error Level Analysis: copy-move forgery, splicing, double JPEG compression.
forgery_blacklist | 0.10 | SHA-256 match against published / known-forged document hashes.
gov_registry_link | 0.15 | Cross-check against UK Companies House, SEC EDGAR, EU PRADO, etc.
document_intelligence | 0.15 | Specimen/template text, impossible MRZ dates, weak official identifiers, and homemade fake-document patterns.

Decision: score ≥ 80 → approve; 60 ≤ score < 80 → review (human reviewer); score < 60 → reject. Any blacklist hit is an automatic reject.
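
The weighted sum and decision rule can be sketched as below. The weights and thresholds are the ones stated here; everything else is an assumption made for illustration: each signal is taken to be normalised to the range 0 to 1 (1 meaning "looks genuine"), and the names `reality_score` and `decide` are hypothetical.

```python
# Weights as listed under "Signals & weights"; they sum to 1.0.
WEIGHTS = {
    "mrz_checksum": 0.25,
    "ocr_template_match": 0.15,
    "exif_metadata": 0.10,
    "ela_proxy": 0.10,
    "forgery_blacklist": 0.10,
    "gov_registry_link": 0.15,
    "document_intelligence": 0.15,
}


def reality_score(signals: dict) -> float:
    """Weighted sum of signals (each assumed to lie in [0, 1]), scaled to 0-100."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Round to absorb floating-point noise before thresholds are applied.
    return round(100 * sum(WEIGHTS[n] * signals[n] for n in WEIGHTS), 2)


def decide(signals: dict, blacklist_hit: bool = False) -> str:
    """Apply the approve / review / reject thresholds."""
    if blacklist_hit:  # a blacklist hit overrides the score
        return "reject"
    score = reality_score(signals)
    if score >= 80:
        return "approve"
    if score >= 60:
        return "review"
    return "reject"


clean = {name: 1.0 for name in WEIGHTS}
print(reality_score(clean), decide(clean))
```

With every other signal at 1.0, a failed mrz_checksum alone brings the score down to 75, which lands in the review band rather than approve.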

Lobster Trap upload boundary

Before a document reaches verification, Lobster Trap DPI checks the file name and OCR text for malicious prompt injection, exfiltration instructions, and unsafe payload intent. Files are also size-limited, signature-checked, MIME-checked, optionally scanned by ClamAV, and stored by SHA-256.
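
The size, signature, and SHA-256 steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Lobster Trap implementation: the 10 MiB limit, the accepted MIME types, and the function name `admit_upload` are all hypothetical, and the prompt-injection inspection and ClamAV scan are left out.

```python
import hashlib

MAX_BYTES = 10 * 1024 * 1024  # assumed limit; the real value is not stated

# Leading "magic" bytes for the formats this sketch accepts.
MAGIC = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "application/pdf": (b"%PDF-",),
}


def admit_upload(data: bytes, declared_mime: str) -> str:
    """Return the SHA-256 hex digest used as the storage key, or raise ValueError."""
    if not data or len(data) > MAX_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    prefixes = MAGIC.get(declared_mime)
    if prefixes is None:
        raise ValueError(f"unsupported MIME type: {declared_mime}")
    # The declared MIME type must agree with the file's actual signature.
    if not data.startswith(prefixes):
        raise ValueError("file signature does not match declared MIME type")
    return hashlib.sha256(data).hexdigest()


print(admit_upload(b"%PDF-1.7\n...", "application/pdf"))
```

Keying storage by digest also means the same hash can be compared directly against the forgery blacklist.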

Result states are explicit: approve, review, or reject. A specimen or fake-template document is rejected even if it carries a valid-looking company number, and the document_intelligence signal records the forensic reason.

Didit Free KYC confirmation provider

The production identity flow now supports Didit as an external confirmation provider orchestrated by Gemini. The platform creates a hosted Didit session, redirects the user to Didit for camera/ID capture, then reads the final decision through the Sessions API and webhooks.

ID Verification
Government ID OCR and template checks across 220+ countries and 14,000+ document types.
Passive Liveness
Selfie/video presence checks to reduce replay, mask, spoof, and deepfake attempts.
Face Match 1:1
Biometric comparison between the live capture and the ID portrait.
IP Analysis
Geolocation, VPN/proxy/Tor, device and duplicate-risk signals.

Gemini does not invent KYC outcomes and does not replace the provider decision. It starts the session, explains consent, checks status, interprets Didit evidence, and combines it with local Lobster Trap/document signals where relevant.

Setup links: Didit Console · Create Session API · Webhooks · Workflows

How commercial KYC providers do it

The Architect's pipeline mirrors the public technical disclosures of:

  • Jumio Netverify — OCR + MRZ + face-match + 3D liveness. Backed by 1B+ document templates.
  • Onfido Real Identity Platform — "Atlas" ML model trained on 100M+ documents; uses ELA, JPEG ghost detection, font-kerning analysis.
  • Veriff — passive liveness (texture, micro-movements), MRZ check, NFC chip read on eMRTD passports.
  • Sumsub — face-match + ID OCR + AML watch-list cross-check (PEP, sanctions).
  • Persona — modular signals: device fingerprint, behavioural biometrics, doc OCR, selfie liveness.
  • Trulioo GlobalGateway — wraps 400+ government data sources for KYC/KYB worldwide.
  • Shufti Pro — supports 3,000+ document types in 230+ countries.

Open-source building blocks we use (or can plug in)

  • Tesseract 5 — OCR (26 language packs installed on prod).
  • PassportEye — Python MRZ extractor with check-digit verification.
  • pypassport / pyMRTD — eMRTD NFC chip read for passive authentication.
  • FaceONNX / InsightFace — face-recognition for selfie / ID-photo match.
  • OpenCV — image operations used to implement Error Level Analysis (ELA) for copy-move and splicing detection.
  • BRISQUE / NIQE — blind (no-reference) image-quality metrics, useful as inputs to liveness scoring.
  • ExifTool — metadata extraction (camera make/model, editing software, GPS, timestamps).
  • libheif / sharp / ImageMagick — format normalization, re-compression for ELA, EXIF strip.
  • pdfminer.six / pdftotext — PDF text extraction for incorporation certificates.
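
As an illustration of how the ExifTool output could feed the exif_metadata signal, the sketch below looks for editor names in metadata that has already been extracted to a dictionary (for example one object from `exiftool -json`). The function name `editor_traces`, the choice of the `Software` and `CreatorTool` tags, and the marker list are assumptions for this example; a real check would likely inspect more tags.

```python
# Editors named in the exif_metadata signal description.
EDITOR_MARKERS = ("photoshop", "gimp", "lightroom")

# Tags assumed to carry the name of the software that last wrote the file.
SOFTWARE_TAGS = ("Software", "CreatorTool")


def editor_traces(tags: dict) -> list:
    """Return the editor markers found in the software-related tags."""
    text = " ".join(str(tags.get(tag, "")) for tag in SOFTWARE_TAGS).lower()
    return [marker for marker in EDITOR_MARKERS if marker in text]


edited = {"Make": "Canon", "Software": "Adobe Photoshop 25.0 (Windows)"}
print(editor_traces(edited))
```

An empty result does not prove the image is untouched, since metadata is easy to strip; it only means this particular trace is absent.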

Government registry cross-checks

Doc type | Jurisdiction | Authoritative source
Incorporation | UK | Companies House: find-and-update.company-information.service.gov.uk
Incorporation | US | SEC EDGAR · State Secretary of State business search
Incorporation | EU | BRIS (Business Registers Interconnection System)
Passport | any | EU PRADO (specimen reference) · ICAO PKD (Public Key Directory for eMRTD)
Driving licence | UK | DVLA: driver-vehicle-licensing share-code service
National ID | EU | eIDAS notified schemes per member state
Tax ID | EU | VIES: VAT number validation
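
One way to picture the routing implied by this table is a lookup keyed on document type and jurisdiction. The sketch below only selects which sources apply; it does not call any registry. The key spellings (`"driving_licence"`, `"tax_id"`, and so on) and the function name `sources_for` are hypothetical.

```python
# (doc type, jurisdiction) -> authoritative sources, copied from the table.
REGISTRY_SOURCES = {
    ("incorporation", "UK"): ["Companies House"],
    ("incorporation", "US"): ["SEC EDGAR", "State Secretary of State business search"],
    ("incorporation", "EU"): ["BRIS"],
    ("passport", "ANY"): ["EU PRADO", "ICAO PKD"],
    ("driving_licence", "UK"): ["DVLA share-code service"],
    ("national_id", "EU"): ["eIDAS notified schemes"],
    ("tax_id", "EU"): ["VIES"],
}


def sources_for(doc_type: str, jurisdiction: str) -> list:
    """Return the sources for a document, falling back to the 'any' row."""
    kind = doc_type.lower()
    exact = REGISTRY_SOURCES.get((kind, jurisdiction.upper()))
    if exact is not None:
        return list(exact)
    # No jurisdiction-specific row: use the jurisdiction-independent one, if any.
    return list(REGISTRY_SOURCES.get((kind, "ANY"), []))


print(sources_for("passport", "FR"))
print(sources_for("incorporation", "uk"))
```

An empty list signals that no registry cross-check is available for that combination, which a caller would presumably treat as a missing gov_registry_link signal rather than a pass.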

Local liveness & face-match roadmap

  • Passive liveness — single-frame texture analysis (BRISQUE), screen-replay detection.
  • Active liveness — challenge/response: blink, smile, head-turn.
  • 3D liveness — depth selfie via TrueDepth / structured-light, defeats printed-mask spoofs.
  • Face-match — Didit handles hosted production verification today; local ArcFace/InsightFace remains a future optional second opinion.
  • NIST FRVT benchmark — accuracy targets we align to: false non-match rate (FNMR) < 0.5% at a false match rate (FMR) of 1e-6.
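
For reference, the two error rates named in the last bullet can be computed from labelled comparison scores as sketched below. The function name `error_rates` and the convention that a score at or above the threshold counts as a match are assumptions for this example.

```python
def error_rates(genuine: list, impostor: list, threshold: float) -> tuple:
    """Return (FMR, FNMR) at a threshold; score >= threshold counts as a match.

    genuine  -- similarity scores for same-person pairs
    impostor -- similarity scores for different-person pairs
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    # FMR: share of different-person pairs wrongly accepted.
    fmr = sum(score >= threshold for score in impostor) / len(impostor)
    # FNMR: share of same-person pairs wrongly rejected.
    fnmr = sum(score < threshold for score in genuine) / len(genuine)
    return fmr, fnmr


fmr, fnmr = error_rates(
    genuine=[0.90, 0.80, 0.40, 0.95],
    impostor=[0.10, 0.20, 0.70, 0.30],
    threshold=0.5,
)
print(fmr, fnmr)
```

Measuring an FMR as low as 1e-6 with any confidence needs millions of impostor comparisons, so a toy set like this one only shows the arithmetic.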

What we publish vs. what we keep private

When an enterprise is approved and published, the public snapshot strips sha256, storage_uri, account_iban, redaction_uri, api_key_hash, and the full ocr_text. Only aggregated KYC outcomes (decision, score, framework) are visible. Raw documents and bank tokens are encrypted at rest and never leave the verification boundary.
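
The stripping step can be sketched as a recursive filter over the stored record. The field names are the ones listed above; the record shape, the sample values, and the function name `public_snapshot` are hypothetical.

```python
# Fields removed from every published snapshot.
PRIVATE_FIELDS = frozenset(
    {"sha256", "storage_uri", "account_iban", "redaction_uri", "api_key_hash", "ocr_text"}
)


def public_snapshot(record):
    """Return a copy of the record with private fields removed at every level."""
    if isinstance(record, dict):
        return {
            key: public_snapshot(value)
            for key, value in record.items()
            if key not in PRIVATE_FIELDS
        }
    if isinstance(record, list):
        return [public_snapshot(item) for item in record]
    return record  # scalars are copied through unchanged


internal = {
    "decision": "approve",
    "score": 91.5,
    "framework": "FATF R.10",
    "account_iban": "GB00EXAMPLE",
    "documents": [{"type": "passport", "sha256": "ab12", "ocr_text": "P<UTO..."}],
}
print(public_snapshot(internal))
```

A stricter variant would publish only an allowlist of known-safe fields, so that a newly added private field cannot leak by default.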