Phishing: Advanced Threat Analysis

⚔️

MODULE 01

AiTM PROXY ATTACK ARCHITECTURE

// Real-time credential interception bypassing MFA — Evilginx2 / Modlishka architecture

// AiTM (Adversary-in-the-Middle) phishing flow:
victim_browser ──→ phishing_proxy ──→ legit_site
                        ↑                  ↓
                   intercepts:        forwards:
                   • credentials      • all requests
                   • session_token    • all responses
                   • OTP codes        • page content

// What the proxy captures:
cookie = "SessionToken=eyJhbGciOiJSUzI1NiJ9..."
// Attacker replays this cookie → authenticated session
// TOTP/SMS MFA = completely bypassed

😈

Evilginx config: phishlet targets Microsoft 365. Captures session cookie post-MFA. Victim sees legitimate Microsoft login — no indication of compromise.

🛡️

FIDO2/Passkeys bind the credential cryptographically to the origin. Proxy gets an unusable assertion — token bound to phishing domain, rejected by real site.

// FIDO2 origin binding: the only reliable defence against AiTM
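
The origin check behind that claim can be sketched in a few lines. This is a minimal illustration, assuming the relying party receives the browser's clientDataJSON as base64url text; the helper names `check_client_data` and `make_client_data` and the example hostnames are hypothetical. A real WebAuthn verifier must also check the signature and the RP ID hash, which are omitted here.

```python
import base64
import json


def b64url_decode(data: str) -> bytes:
    """Decode base64url text that may be missing its padding."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def check_client_data(client_data_b64: str, expected_origin: str,
                      expected_challenge: str) -> bool:
    """Return True only if the assertion was produced for our own origin."""
    client_data = json.loads(b64url_decode(client_data_b64))
    return (
        client_data.get("type") == "webauthn.get"
        and client_data.get("origin") == expected_origin
        and client_data.get("challenge") == expected_challenge
    )


def make_client_data(origin: str, challenge: str) -> str:
    """Build sample clientDataJSON (hypothetical helper for the demo)."""
    raw = json.dumps({"type": "webauthn.get", "origin": origin,
                      "challenge": challenge}).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# The browser, not the page, fills in the origin, so a proxy on another
# domain cannot make the assertion claim the real site's origin.
real = make_client_data("https://login.example.com", "abc123")
proxied = make_client_data("https://login.example-phish.test", "abc123")
print(check_client_data(real, "https://login.example.com", "abc123"))
print(check_client_data(proxied, "https://login.example.com", "abc123"))
```

Because the origin is part of the signed data, the proxy cannot simply rewrite it in transit without invalidating the signature.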

MODULE 01

AiTM Architecture

Evilginx2: Open-source AiTM framework using phishlets (reverse proxy configs per target). Captures session cookies post-authentication, defeating all OTP-based MFA.

Detection signals: Unusual session token geography, concurrent sessions, token replay from different IP/UA, impossible travel in AAD/Okta sign-in logs.

Conditional Access: Implement Continuous Access Evaluation (CAE) — revokes tokens in near-real-time based on risk signals even after authentication.

FIDO2 deployment: Mandatory phishing-resistant MFA (FIDO2/passkeys) for all privileged roles. Origin binding means an assertion produced on a phishing domain is rejected by the real site, so the proxy never obtains an authenticated session to steal.
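
The impossible-travel signal listed under detection signals can be approximated as follows. This is a rough sketch, assuming sign-in events have already been geolocated; the `SignIn` record, the 900 km/h speed ceiling, and the 50 km simultaneous-login distance are illustrative assumptions, not values from any vendor's product.

```python
import math
from dataclasses import dataclass


@dataclass
class SignIn:
    user: str
    timestamp: float  # seconds since epoch
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_impossible_travel(prev: SignIn, curr: SignIn,
                         max_kmh: float = 900.0,
                         same_time_km: float = 50.0) -> bool:
    """Flag two sign-ins whose implied speed exceeds max_kmh."""
    hours = abs(curr.timestamp - prev.timestamp) / 3600
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if hours == 0:
        # Simultaneous sign-ins: flag only if the locations are far apart.
        return distance > same_time_km
    return distance / hours > max_kmh


lagos = SignIn("c.okonkwo", 0, 6.5244, 3.3792)
london = SignIn("c.okonkwo", 3600, 51.5074, -0.1278)
print(is_impossible_travel(lagos, london))
```

In practice, VPN egress points and mobile carrier routing produce false positives, so this signal is usually combined with user-agent and token-replay checks rather than used alone.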

🧬

MODULE 02

ADVERSARIAL ML — EVADING DNS CLASSIFIERS

// Black-box adversarial attack on production DNS classifier

// Phase 1: Model probing via DNS resolver API
for domain in candidate_domains:
    result = resolver.query(domain)  // blocked or allowed?
    scores[domain] = infer_risk_score(result)

// Phase 2: Gradient estimation (finite differences)
def estimate_gradient(domain, position=0, eps=1):
    perturbed = mutate(domain, position=position)
    delta_score = query(perturbed) - query(domain)
    return delta_score / eps

// Phase 3: Iterative evasion
while risk_score > THRESHOLD:
    grad = estimate_gradient(current_domain)
    current_domain = apply_perturbation(grad)
    // Converges in ~200 queries to evading domain

// Evasion techniques against lexical classifiers:

word insertion · brand prefix/suffix · entropy reduction · vowel normalisation · TLD substitution · aged domain reuse

🤖

Adversarial training, ensemble diversity, and rate limiting on classification APIs together reduce evasion success from ~70% to <15%.

MODULE 02

Adversarial Evasion

Black-box probing: Attackers query DNS filtering APIs with hundreds of domain variants, using the block/allow signal to estimate classifier gradients without model access.

Adversarial training: Include adversarially crafted domains in training data. Models trained against known attack strategies are significantly more robust.

Ensemble methods: Stack lexical, graph-relational, temporal, and infrastructure models. Adversarial examples that fool one model rarely fool all simultaneously.

API rate limiting: Limit classification API queries per source IP/ASN. Probing attacks require hundreds of queries — rate limiting raises attacker cost significantly.
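
The rate-limiting control described above could look something like this. It is a minimal sliding-window sketch, assuming the classification API can key requests by source IP or ASN; the class name and the limits used are illustrative.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_queries per source within window_seconds."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # source -> recent query times

    def allow(self, source: str, now: float) -> bool:
        window = self._history[source]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_queries:
            return False
        window.append(now)
        return True


limiter = SlidingWindowLimiter(max_queries=3, window_seconds=60)
for t in range(5):
    print(t, limiter.allow("198.51.100.7", now=float(t)))
```

Keying on ASN as well as IP makes it harder for a prober to spread its hundreds of queries across many addresses in the same network.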

🕸️

MODULE 03

PHISHING INFRASTRUCTURE GRAPH ANALYSIS

// Graph-based detection: shared infrastructure reveals campaigns

// Building the phishing infrastructure graph:
import networkx as nx
G = nx.DiGraph()

// Nodes: domains, IPs, ASNs, certificates, registrars
G.add_node("phish-bank.ng",    type="domain",  risk=0.94)
G.add_node("185.220.101.47",  type="ip",       rep="malicious")
G.add_node("AS12345",          type="asn",      label="bulletproof")
G.add_node("cert_AA:BB:CC",    type="cert",     age_h=2)

// Edges: resolves_to, hosted_on, shares_ip, issued_by, registered_by
G.add_edge("phish-bank.ng", "185.220.101.47", rel="resolves_to")
G.add_edge("185.220.101.47", "AS12345",       rel="hosted_on")

// GNN propagates risk scores through graph:
// guilt-by-association → new domains scored by neighbours

📊

Graph analysis exposes entire phishing campaigns from a single known-bad domain. Attackers share IPs, ASNs, registrars — expanding one IOC reveals dozens.

// Tools: Maltego, SpiderFoot, DomainTools IRIS, Recorded Future

MODULE 03

Graph-Based Detection

Graph Neural Networks (GNNs): Model domain-IP-ASN-registrar relationships as a graph. Risk propagates from known-bad nodes to connected unknown nodes.

Campaign attribution: Phishing operators reuse infrastructure — same ASN, registrar, certificate authority, hosting provider. One IOC expands to full campaign mapping.

Passive DNS: Historical DNS resolution data (pDNS) reveals IP sharing patterns over time, exposing infrastructure reuse even after domain rotation.

Implementation: Integrate pDNS feeds (Farsight DNSDB, PassiveTotal) with graph database (Neo4j) for real-time infrastructure correlation and campaign tracking.
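
The guilt-by-association idea can be shown without a GNN. The sketch below is a much simpler stand-in, plain iterative score propagation over an undirected graph, and is not the trained model the module describes. The damping factor, round count, and node names are assumptions chosen for illustration.

```python
from collections import defaultdict


def propagate_risk(edges, seeds, damping=0.5, rounds=3):
    """Spread risk from known-bad seed nodes to their neighbours.

    edges: iterable of (node_a, node_b) pairs
    seeds: dict of node -> fixed risk score for known-bad nodes
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    scores = {node: seeds.get(node, 0.0) for node in neighbours}
    for _ in range(rounds):
        updated = {}
        for node, adjacent in neighbours.items():
            if node in seeds:
                updated[node] = seeds[node]  # known-bad scores stay fixed
                continue
            mean_adjacent = sum(scores[n] for n in adjacent) / len(adjacent)
            updated[node] = damping * mean_adjacent
        scores = updated
    return scores


edges = [
    ("phish-bank.ng", "203.0.113.47"),
    ("203.0.113.47", "AS12345"),
    ("new-domain.example", "203.0.113.47"),
    ("clean.example", "198.51.100.10"),
]
scores = propagate_risk(edges, seeds={"phish-bank.ng": 0.94})
for node, score in sorted(scores.items()):
    print(f"{node:22s} {score:.3f}")
```

A domain that shares an IP with the known-bad seed picks up a non-zero score, while the unrelated domain stays at zero.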

🧠

MODULE 04

LLM-POWERED SPEAR PHISHING GENERATION

// Automated OSINT-to-phish pipeline using LLMs

// Step 1: Target profile from OSINT
profile = {
  name: "Chidi Okonkwo",
  role: "CFO, Lagos Fintech Ltd",
  recent_post: "Excited about our Series B close!",
  connections: ["John Smith (KPMG)", "Ada Obi (CBN)"],
  email_pattern: "c.okonkwo@lagosfintech.ng"
}

// Step 2: LLM generates personalised phish
prompt = f"Write a convincing email to {profile.name}
           from {profile.connections[0]} about
           post-Series-B compliance requirements..."

// Output: Perfect grammar, correct names, real context
// No spelling errors, appropriate tone, plausible ask
// Detection rate by humans: ~8%  (Stanford 2024)

🛡️

Counter: LLM-generated email detection models (GPTZero-style) and strict out-of-band verification protocols for financial requests. FIDO2 prevents credential theft from completing even if the victim clicks.

MODULE 04

LLM Phishing Generation

Attack pipeline: OSINT scraping → profile enrichment → LLM prompt engineering → personalised email generation → automated domain + site deployment. Full cycle: <30 minutes.

Detection challenges: LLM-generated phishing defeats grammar/spelling heuristics entirely. Focus shifts to sender authentication (DMARC), URL analysis, and behavioural signals.

LLM detection: Perplexity-based classifiers (LLM-generated text has lower perplexity against language models) can identify AI-authored emails with ~78% accuracy.

Process control: The most robust defence is process — mandatory voice/video verification for wire transfers, regardless of email authenticity signals.
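
To make the perplexity signal concrete: perplexity is the exponential of the negative mean log-probability a language model assigns to each token. The sketch below assumes per-token natural-log probabilities have already been obtained from some scoring model; the threshold of 20 is an arbitrary placeholder that would need calibrating on real mail.

```python
import math


def perplexity(token_logprobs):
    """exp(-mean(log p)) over the per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def looks_machine_written(token_logprobs, threshold=20.0):
    """Flag text whose perplexity falls below a calibrated threshold."""
    return perplexity(token_logprobs) < threshold


predictable = [math.log(0.5)] * 8    # every token assigned probability 0.5
surprising = [math.log(0.01)] * 8    # every token assigned probability 0.01
print(perplexity(predictable), looks_machine_written(predictable))
print(perplexity(surprising), looks_machine_written(surprising))
```

A single threshold is easy to evade by paraphrasing, which is one reason the process control above does not depend on it.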

🔬

MODULE 05

DNSSEC & RPKI — INFRASTRUCTURE HARDENING

// DNSSEC chain-of-trust verification against DNS hijacking in phishing campaigns

// DNSSEC validation flow:
$ delv +rtrace +vtrace bank.com A

// Resolver validates signature chain:
. (root) KSK → root ZSK → DS hash → com. KSK → com. ZSK → DS hash →
bank.com KSK → bank.com ZSK → RRSIG validates A record

// If attacker hijacks registrar and changes NS (parent DS record left in place):
bank.com NS → attacker_ns.evil.com
// Result: DNSSEC validation FAILS (attacker's keys do not match the DS)
// Validating resolvers return SERVFAIL → attack exposed

// RPKI prevents BGP hijacking of DNS traffic:
ROA: 203.0.113.0/24 AS12345 maxLength=24
// Invalid BGP announcement → RPKI-invalid → dropped

// DNSSEC deployment check:

$ dig bank.com DNSKEY +dnssec | grep "256\|257"
$ dig bank.com DS @8.8.8.8
$ delv bank.com A | grep "fully validated"

MODULE 05

DNSSEC + RPKI Hardening

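DNSSEC against phishing: With a signed zone and DS records in the parent, an NS record hijack (registrar compromise) causes DNSSEC validation to fail, so the attack is cryptographically detectable. This holds only while the attacker cannot also replace or remove the DS record, which is what Registry Lock protects.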

Registry Lock: Prevents NS record changes without out-of-band authorisation from domain owner. Critical for high-value domains targeted by phishing campaigns.

RPKI (Resource PKI): Cryptographically validates BGP route origins. Prevents BGP hijacking used to redirect DNS resolver traffic to attacker-controlled infrastructure.

Monitoring: Set up RIPE Stat / BGPmon alerts for your prefix. Add certificate transparency monitoring (crt.sh API) to detect lookalike domains soon after certificate issuance.
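
The RPKI check described above follows a simple decision procedure: a route is valid if some covering ROA matches its origin AS and prefix length, invalid if covering ROAs exist but none match, and not-found otherwise. The sketch below illustrates that logic with the standard `ipaddress` module; the `Roa` record and `validate_origin` function are illustrative names, and a production validator would take its ROAs from an RPKI relying-party cache.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    prefix: str
    asn: int
    max_length: int


def validate_origin(announced_prefix: str, origin_asn: int, roas) -> str:
    """Classify a BGP announcement as valid, invalid, or not-found."""
    announced = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if announced.version != roa_net.version:
            continue
        if not announced.subnet_of(roa_net):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [Roa("203.0.113.0/24", 12345, 24)]
print(validate_origin("203.0.113.0/24", 12345, roas))
print(validate_origin("203.0.113.0/24", 64500, roas))
print(validate_origin("203.0.113.0/25", 12345, roas))
print(validate_origin("198.51.100.0/24", 12345, roas))
```

Setting maxLength equal to the prefix length, as in the ROA above, means a more-specific /25 hijack is classified invalid even when it claims the correct origin AS.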

📡

MODULE 06

DETECTION ENGINEERING — SIGMA RULES

// Sigma detection rules for phishing-related DNS activity

title: DNS Query to Newly Registered Phishing Domain
status: stable
description: Detects DNS queries to domains registered
             within 24h matching brand similarity patterns
logsource:
  category: dns
detection:
  selection:
    dns.query.name|re: '(bank|paypal|microsoft|gtbank).{0,20}\.(com|ng|net)'
    dns.registration_age_hours|lt: 24
  filter_legit_apex:
    dns.query.name:
      - 'gtbank.com'
      - 'microsoft.com'
  filter_legit_subdomain:
    dns.query.name|endswith:
      - '.gtbank.com'
      - '.microsoft.com'
  condition: selection and not 1 of filter_legit_*
falsepositives:
  - New legitimate subdomains (verify with CTI)
level: high
tags:
  - attack.initial_access
  - attack.t1566.002  # Spearphishing Link

// Additional high-value phishing detection rules:

NXDomain storm (DGA) · DNS tunnelling entropy · Fast-flux TTL anomaly · BEC impossible travel · AiTM session replay · Credential access post-click

MODULE 06

Detection Engineering

Sigma framework: Vendor-neutral detection rule format. Write once, convert to Splunk SPL, Elastic ES|QL, Microsoft KQL, or any SIEM. Essential for portability.

ATT&CK mapping: Tag rules with MITRE ATT&CK technique IDs (T1566.001 Spearphishing Attachment, T1566.002 Spearphishing Link) for coverage gap analysis.

Domain age enrichment: Enrich DNS logs with WHOIS registration date in real-time. Queries to <24h old domains matching brand patterns are high-confidence phishing signals.

False positive management: Maintain allowlists of known legitimate new domains. Certificate transparency monitoring feeds provide advance notice of legitimate new subdomains.
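
The same detection logic can be prototyped in Python before it is committed to a SIEM. The sketch below assumes the DNS log has already been enriched with a registration age in hours, as described under domain age enrichment; the function names are illustrative. The allowlist matches the exact domain or a dot-prefixed suffix, because a bare suffix test would treat `fake-gtbank.com` as legitimate.

```python
import re

BRAND_PATTERN = re.compile(r"(bank|paypal|microsoft|gtbank).{0,20}\.(com|ng|net)")
LEGIT_DOMAINS = ("gtbank.com", "microsoft.com")


def is_legit(query_name: str) -> bool:
    """True for an allowlisted domain or any of its subdomains."""
    name = query_name.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in LEGIT_DOMAINS)


def matches_rule(query_name: str, registration_age_hours: float) -> bool:
    """Brand-like name, registered under 24h ago, not allowlisted."""
    name = query_name.lower().rstrip(".")
    return (
        BRAND_PATTERN.search(name) is not None
        and registration_age_hours < 24
        and not is_legit(name)
    )


for name, age in [("secure-gtbank-login.com", 3),
                  ("fake-gtbank.com", 2),
                  ("login.microsoft.com", 3),
                  ("secure-gtbank-login.com", 500)]:
    print(name, age, matches_rule(name, age))
```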

🏗️

MODULE 07

ZERO TRUST EMAIL ARCHITECTURE

// Complete email security stack — defence in depth

LAYER	CONTROL	BYPASSED BY	HARDENS WITH
DNS	DMARC p=reject	Display name spoof	BIMI + VMC
Gateway	AI URL rewrite	AiTM proxies	Time-of-click scan
Sandbox	Detonation	Delayed payload	48h re-scan
Identity	FIDO2 MFA	SMS/TOTP fallback	Origin binding
Session	CAE + Entra ID	Static tokens	Continuous eval
Endpoint	EDR + isolation	LOTL attacks	Behavioural AI

🏛️

Zero Trust principle applied to email: treat every inbound message as hostile until proven otherwise. Verify sender, verify link, verify identity — independently at each layer.

MODULE 07

Zero Trust Email

Time-of-click URL scanning: Links are scanned again when clicked, not only at delivery. This catches URLs that were clean at delivery but weaponised later.

CAE (Continuous Access Evaluation): Azure AD/Entra ID feature that revokes access tokens in near-real-time based on risk signals — blocks session cookie replay from phishing.

BIMI (Brand Indicators for Message Identification): Displays organisation's verified logo in email clients. Requires DMARC enforcement + Verified Mark Certificate (VMC).

Delayed payload defence: Some phishing links are benign at delivery, weaponised post-scan. Implement 48-hour re-scanning of all URLs delivered to mailboxes.
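
The DNS-layer control in the table, and the BIMI prerequisite, both depend on the published DMARC policy. The sketch below parses a DMARC TXT record that has already been fetched and reports whether it is at enforcement. It treats enforcement as `p=quarantine` or `p=reject` applied to all mail (`pct=100`), which is an assumption about what BIMI eligibility requires; the function names are illustrative.

```python
def parse_dmarc(txt_record: str) -> dict:
    """Split a DMARC TXT record into a tag -> value dictionary."""
    tags = {}
    for part in txt_record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def is_enforcing(txt_record: str) -> bool:
    """True if the record quarantines or rejects 100% of failing mail."""
    tags = parse_dmarc(txt_record)
    if tags.get("v") != "DMARC1":
        return False
    policy = tags.get("p", "none").lower()
    pct = int(tags.get("pct", "100"))  # pct defaults to 100 when absent
    return policy in ("quarantine", "reject") and pct == 100


print(is_enforcing("v=DMARC1; p=reject; rua=mailto:dmarc@example.com"))
print(is_enforcing("v=DMARC1; p=none"))
print(is_enforcing("v=DMARC1; p=quarantine; pct=50"))
```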

🔴

MODULE 08

RED TEAM: PHISHING SIMULATION METHODOLOGY

// Authorised phishing simulation — GoPhish / custom framework

// Campaign metrics to track:
metrics = {
  "delivery_rate":     delivered / sent,     // email gateway effectiveness
  "open_rate":         opened / delivered,   // curiosity / urgency response
  "click_rate":        clicked / opened,     // URL inspection habits
  "cred_submit_rate":  submitted / clicked,  // critical failure metric
  "report_rate":       reported / delivered, // security culture metric
  "time_to_click":     mean(click_times),    // decision velocity
  "time_to_report":    mean(report_times)    // response speed
}

// Target: cred_submit < 2%, report_rate > 30%
// Segment by department, role, prior training

⚠️

Ethical bounds: never simulate credential submission to third-party phishing pages. Record only that a click or submission occurred, never the credentials themselves. Immediate education for clickers, no blame culture.

MODULE 08

Phishing Simulation

GoPhish + custom landing pages: Open-source platform for authorised phishing simulations. Supports tracking pixels, credential capture pages, and detailed campaign analytics.

Segmentation: Analyse results by department, seniority, prior training exposure. Finance and HR are consistently highest-risk — prioritise advanced training for these groups.

Just-in-time training: Immediately redirect clickers to a short training module explaining what they missed. Training at point-of-failure is 6x more effective than annual CBT.

Reporting culture: The report rate is more important than the click rate. An organisation where people report suspicious emails quickly is more resilient than one where nobody clicks but nobody reports either.
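
The campaign metrics can be computed from raw counts along these lines. This is a small sketch, assuming the counts are exported from the simulation platform; delivery rate is taken here as delivered divided by sent, and the targets are the module's stated goals of credential submission below 2% and reporting above 30%.

```python
def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 instead of dividing by zero."""
    return numerator / denominator if denominator else 0.0


def campaign_metrics(sent, delivered, opened, clicked, submitted, reported):
    return {
        "delivery_rate": rate(delivered, sent),
        "open_rate": rate(opened, delivered),
        "click_rate": rate(clicked, opened),
        "cred_submit_rate": rate(submitted, clicked),
        "report_rate": rate(reported, delivered),
    }


def meets_targets(metrics, max_submit=0.02, min_report=0.30):
    """Check the two headline targets for a campaign."""
    return (metrics["cred_submit_rate"] < max_submit
            and metrics["report_rate"] > min_report)


m = campaign_metrics(sent=1000, delivered=950, opened=400,
                     clicked=80, submitted=1, reported=300)
for name, value in m.items():
    print(f"{name:18s} {value:.1%}")
print("meets targets:", meets_targets(m))
```

Running the same function per department or role gives the segmentation view described above without changing the definitions between groups.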

⚡

MODULE 09

SOAR AUTOMATION — PHISHING RESPONSE

// Automated phishing response playbook — Microsoft Sentinel / Splunk SOAR

// Trigger: User reports phishing email
on_event("PhishingReported"):

  T+00s: email.extract_iocs(reported_email)
         // → [domain, url, sender_ip, attachment_hash]

  T+02s: parallel([
    vt.scan(iocs),           // VirusTotal enrichment
    urlscan.submit(url),     // live screenshot + analysis
    whois.query(domain),     // registration age check
    dns.sinkhole(domain),    // block immediately
  ])

  T+05s: email.quarantine(campaign_matches)
         // find + pull all similar emails org-wide

  T+10s: if vt.malicious or age_hours < 48:
           identity.revokeSession(affected_users)
           endpoint.isolate(if_clicked)

  T+30s: ticket.create(severity, enriched_iocs)
         threat_intel.share(iocs)  // MISP / STIX/TAXII

MODULE 09

SOAR Playbook

Mean Time to Respond (MTTR): Manual phishing response averages 4+ hours. SOAR automation reduces MTTR to <60 seconds for containment actions.

STIX/TAXII sharing: Standardised threat intelligence sharing format. Automatically push phishing IOCs to sector ISACs and AfricaCERT for coordinated defence.

Org-wide quarantine: When one user reports a phishing email, SOAR searches all mailboxes for matching sender/subject/URL patterns and quarantines automatically.

Feedback loop: SOAR enrichment data feeds back into ML models — confirmed phishing IOCs become training samples, continuously improving classifier accuracy.
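
The first playbook step, IOC extraction, might be sketched as below. This is a deliberately simple regex-based version operating on the raw text of a reported email; the function name and patterns are illustrative, and a production playbook would also parse MIME parts, defanged indicators, and attachments.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def extract_iocs(email_text: str) -> dict:
    """Pull URLs, domains, IPv4 addresses, and SHA-256 hashes from text."""
    urls = sorted(set(URL_RE.findall(email_text)))
    hosts = {urlsplit(url).hostname for url in urls}
    domains = sorted(h for h in hosts if h and not _is_ip(h))
    ips = sorted({m for m in IPV4_RE.findall(email_text) if _is_ip(m)})
    hashes = sorted({m.lower() for m in SHA256_RE.findall(email_text)})
    return {"urls": urls, "domains": domains, "ips": ips, "sha256": hashes}


sample = (
    "Received: from mail.example.net (203.0.113.9)\n"
    "Please verify at https://secure-login.example.com/verify?id=1 now.\n"
    "Attachment sha256: " + "a" * 64
)
print(extract_iocs(sample))
```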

🎯

MODULE 10

ADVANCED THREAT SUMMARY & RESEARCH FRONTIERS

// Current research frontiers in anti-phishing

FRONTIER	CURRENT STATE	OPEN PROBLEMS
LLM detection	~78% accuracy	Watermarking, perplexity evasion
AI-DGA detection	>95% known families	Zero-day AI-DGA evasion
AiTM detection	Heuristic only	Reliable real-time token binding
Post-quantum auth	NIST standards ready	DNS perf with larger DNSSEC keys
Federated CTI	Manual STIX/TAXII	Privacy-preserving ML sharing

🔭

The phishing arms race is an AI vs AI competition. Investment in detection robustness, phishing-resistant authentication, and coordinated threat intelligence sharing defines the next frontier.

END OF ADVANCED MODULE

Phishing: Advanced Threat Analysis · By Gbemisola Esho · 2026

MODULE 10

Research Frontiers

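Post-quantum DNSSEC: NIST-selected signature algorithms (CRYSTALS-Dilithium, standardised as ML-DSA in FIPS 204, and FALCON) produce larger signatures than current DNSSEC algorithms. Larger responses strain DNS performance, which may require new caching strategies, including AI-optimised ones.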

Federated learning for CTI: Train shared phishing detection models across organisations without sharing raw DNS query data — privacy-preserving collective defence.

Token binding revival: RFC 8471 Token Binding cryptographically binds session tokens to a client key proven over the TLS connection, so a cookie replayed from the attacker's own connection is rejected. Awaiting browser re-adoption.

Continuous research: Follow Google Project Zero, Cloudflare Research, Cisco Talos, and academic venues (IEEE S&P, USENIX Security, ACM CCS) for emerging attack/defence developments.
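
For the federated learning item above, the core aggregation step can be illustrated with federated averaging: each organisation trains locally and shares only model weights, which a coordinator combines weighted by sample count. This is a toy sketch using plain lists as weight vectors; the function name is illustrative, and it omits secure aggregation and differential privacy, which a real deployment would likely need.

```python
def federated_average(client_updates):
    """Combine client weight vectors, weighted by local sample counts.

    client_updates: list of (weights, sample_count) pairs
    """
    total = sum(count for _, count in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    size = len(client_updates[0][0])
    averaged = [0.0] * size
    for weights, count in client_updates:
        if len(weights) != size:
            raise ValueError("all clients must share one model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * count / total
    return averaged


# Two organisations; raw DNS query data never leaves either of them.
updates = [([1.0, 0.0], 100), ([0.0, 1.0], 300)]
print(federated_average(updates))
```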
