How to Run a Deepfake Phishing Simulation: People, Process & Technology | Breacher.ai

Categories: Deepfake | Published On: June 26th, 2026

Methodology · People, Process & Technology

How to Run a Deepfake Phishing Simulation
That Tests People, Process & Technology.

Most phishing simulations answer one question: did the person click. That is the least useful question you can ask, because human detection is the weakest control you have. The simulation worth running tests all three layers of the triad in a single engagement — whether the person resists, whether your process intervenes when they don’t, and whether your technology detects, blocks, and correlates the attack. This is the playbook for running it, and what to measure at each layer.

Breacher.ai | Methodology | 11 min read | 2026

Why “Did They Click” Is the Wrong Test

The default phishing simulation measures one thing: the percentage of employees who fell for a lure. It is easy to run, easy to chart, and almost useless as a measure of how an organization actually withstands an attack. The reason is uncomfortable but well established. The largest controlled studies of phishing training — randomized, with control groups, across tens of thousands of employees — find that detection-focused training barely changes click rates, and that people stay susceptible no matter how much training they receive.

If human detection is the weakest layer, then a simulation that measures only detection is measuring your weakest control and ignoring the two layers that actually contain an attack. A mature organization is not one where nobody is ever fooled. That organization does not exist. A mature organization is one where a fooled employee still cannot complete the damaging action — because a verification procedure intervenes, or a technical control blocks and alerts, or the SOC correlates the activity and responds before the objective is met.

Stop asking whether your people spotted the fake. Start asking whether the system held when one of them didn’t. The first question has a depressing, well-documented answer. The second is the one that determines whether you get breached.

That reframing is the entire point of a people-process-technology simulation. You are not grading employees. You are stress-testing the full defensive system against a realistic, orchestrated attack and finding the layer that fails, because in a real intrusion the attacker has to get past every layer and you only need one layer to hold.

92% of organizations vulnerable to at least one deepfake social engineering vector

78% highly vulnerable when pressure is applied across vectors, not in isolation

63% of users cannot distinguish AI-generated voice from a real person

The Three Layers, and What Each One Actually Tests

People, process, and technology are not three separate exercises. In a well-designed engagement they are three things measured simultaneously as a single attack moves through them. Here is what each layer means in a deepfake social engineering context, what failure looks like, and what you are actually measuring.

People — The Decision Under Pressure

Whether the human resists, escalates, or complies when a believable, corroborated persona asks them to act. Not whether they can pass a quiz on phishing red flags — whether, in the moment, under manufactured urgency and cross-channel pressure, they do the risky thing. The realistic version of this test is not a single email. It is a coherent situation that builds.

► Failure looks like: the employee acts on the request. Success is not “never fooled.” Success is “escalated or reported instead of acting” — a behavior, not a detection skill.

Process — The Control That Should Catch What People Miss

Whether your procedures engage and hold when a person is deceived. Out-of-band verification on payment-detail changes. Dual authorization on wire transfers. Help-desk identity proofing before an MFA or password reset. Callback verification on urgent executive requests. This is the layer that turns a fooled human into a contained incident instead of a breach, and it is the layer most simulations never test because they stop at the click.

► Failure looks like: the action completes because no procedure required a second, independent check. The procedure existing on paper is not the test. Whether it engages under pressure is.

Technology — Detection, Prevention, and Correlation

Whether your controls saw the attack and acted. Did the email gateway deliver or filter the lure? Did collaboration-platform impersonation protection flag a spoofed Teams or Zoom identity? Did the identity provider alert on an anomalous MFA reset? Did endpoint controls block remote-access tooling? And the question only an orchestrated chain can ask: did the SOC correlate the multi-stage activity into one incident, or see disconnected events and miss the campaign?

► Failure looks like: the chain generated signals that were never stitched together. Correlation is testable only when the attack is genuinely correlated — which parallel, disconnected lures are not.

How One Orchestrated Campaign Exercises All Three

The reason you can test all three layers in a single engagement is that an orchestrated, contextual kill chain reaches the real decision point — the moment where a person is asked to act, a procedure is supposed to intervene, and a control is supposed to detect. Disconnected lures never reach that point, which is why they can only ever test the person. Walk a single chain and watch all three layers come under test in sequence.

Stage 1 · Inbound Pretext

An email lands from a recognizable persona, anchored to real OSINT, establishing a believable situation — an urgent vendor banking change, an executive request before travel. Technology under test: did the gateway deliver it? People under test: did the recipient treat it as routine or escalate?

Technology: email filtering | People: first response

Stage 2 · Cross-Channel Corroboration

A Teams message from the same persona references the email; a live AI voice call escalates the same situation with insider detail. The consistency manufactures legitimacy. Technology under test: did collaboration impersonation protection flag the external identity? People under test: does corroboration across channels lower their guard?

Technology: impersonation detection | People: pressure response

The Pivot

Stage 3 · The Action Request — Where Process Decides Everything

The persona asks for the damaging action: change the payment destination, reset the MFA, approve the transfer, grant the access. This is the layer no detection-only test reaches. Even if the person is fully convinced, the question becomes procedural: does an out-of-band verification fire? Is a second authorizer required? Does the help desk demand identity proofing before resetting? Process under test: does the procedure engage and stop the action independent of the person’s belief? This single stage is where a fooled employee becomes either a contained incident or a breach.

Process: verification controls | Process: dual authorization | People: adherence under pressure

Stage 4 · Execution & Correlation

If the action proceeds, the chain reaches its objective — the simulated reset, the staged remote-access attempt. Technology under test: did the identity provider alert on the anomalous reset? Did endpoint controls block the tooling? And did the SOC correlate Stages 1 through 4 into one incident, or log them as unrelated noise? Correlation is the difference between detecting an attack and detecting four things.

Technology: IdP & endpoint alerts | Technology: SOC correlation

A parallel multi-channel test stops at Stage 1, four times. Only an orchestrated chain reaches Stage 3, where process lives, and Stage 4, where correlation is provable. That is why orchestration is what makes a people-process-technology test possible at all.
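To make "correlated into one incident" concrete, here is a minimal sketch of the check a defender-side observer might run after the chain completes. The `Signal` schema, the source names, and the four-hour grouping window are assumptions made for illustration; they are not taken from any particular SIEM or from the OSES methodology.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    """One detection emitted by a control (hypothetical schema)."""
    source: str        # e.g. "email_gateway", "idp", "endpoint"
    stage: int         # kill-chain stage that produced it (1 to 4)
    target_user: str   # employee the activity was aimed at
    seen_at: datetime


def correlate(signals, window=timedelta(hours=4)):
    """Group signals into incidents: same target, each within `window` of the previous one."""
    incidents = []
    open_incident = {}  # target_user -> incident currently collecting that user's signals
    for sig in sorted(signals, key=lambda s: s.seen_at):
        current = open_incident.get(sig.target_user)
        if current and sig.seen_at - current[-1].seen_at <= window:
            current.append(sig)
        else:
            current = [sig]
            open_incident[sig.target_user] = current
            incidents.append(current)
    return incidents


def chain_correlated(incidents, stages_fired):
    """True if a single incident covers every stage that generated a signal."""
    return any({s.stage for s in inc} >= set(stages_fired) for inc in incidents)


t0 = datetime(2026, 1, 15, 9, 0)
signals = [
    Signal("email_gateway", 1, "ap.clerk", t0),
    Signal("collab_impersonation", 2, "ap.clerk", t0 + timedelta(minutes=40)),
    Signal("idp", 4, "ap.clerk", t0 + timedelta(hours=2)),
    Signal("email_gateway", 1, "other.user", t0 + timedelta(minutes=5)),
]
incidents = correlate(signals)
print(len(incidents))                          # 2
print(chain_correlated(incidents, {1, 2, 4}))  # True
```

Grouping by target and time is only one plausible correlation key. A real detection program may key on sender infrastructure, persona, or session instead, and the pass condition should be written against whatever the SOC actually uses.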

The Playbook: How to Actually Run It

A people-process-technology simulation succeeds or fails in the preparation. The single most common mistake is launching the attack without first defining, in writing, which procedural and technical controls you are testing and what a pass looks like. If you do not instrument the defender side before you launch, you measure the person and lose the other two layers.

Scope & Rules of Engagement

Define the target population, the objectives in scope (payment redirection, credential reset, access grant), and the hard limits. Establish a trusted control contact, legal sign-off, and an abort path. Deepfake voice and video of named individuals require explicit consent and a documented authorization chain — settle this first.

Map the Controls Under Test — Before You Build the Attack

For each objective, write down the process control that should stop it (e.g. “banking changes require callback to a number on file”) and the technology control that should detect it (e.g. “IdP alerts on MFA reset from new device”). This list is your scorecard. An attack designed without it can only grade people.
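One way to keep that scorecard honest is to hold it as data and check it for holes before the attack is built. The sketch below is illustrative only: the `ControlUnderTest` fields, the objective names, and the pass conditions are assumptions, and the control descriptions simply reuse the examples given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlUnderTest:
    objective: str       # attacker objective in scope
    layer: str           # "process" or "technology"
    control: str         # the control expected to engage
    pass_condition: str  # what a pass looks like, written before launch


# Hypothetical entries; pass conditions are placeholders to be agreed with the client.
CONTROL_MAP = [
    ControlUnderTest("payment_redirection", "process",
                     "Banking changes require callback to a number on file",
                     "Change held until callback completes"),
    ControlUnderTest("credential_reset", "process",
                     "Help-desk identity proofing before MFA reset",
                     "Reset refused without proofing"),
    ControlUnderTest("credential_reset", "technology",
                     "IdP alerts on MFA reset from new device",
                     "Alert raised and routed to the SOC"),
]


def unmapped(objectives, control_map):
    """Return (objective, layer) pairs that have no control written down yet."""
    covered = {(c.objective, c.layer) for c in control_map}
    return [(o, layer) for o in objectives
            for layer in ("process", "technology")
            if (o, layer) not in covered]


gaps = unmapped(["payment_redirection", "credential_reset", "access_grant"], CONTROL_MAP)
print(gaps)
# [('payment_redirection', 'technology'), ('access_grant', 'process'), ('access_grant', 'technology')]
```

Any pair returned by `unmapped` is an objective the engagement could only grade at the people layer, which is a useful thing to know before launch.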

Build the Contextual Layer

Develop one shared pretext grounded in real OSINT — accurate org chart, live projects, genuine vendor relationships — and a persona held consistent across email, chat, and voice. The believability of Stage 3 depends entirely on the coherence built in Stages 1 and 2.

Instrument the Defender Side

Coordinate so you can observe what the technology saw: gateway verdicts, impersonation flags, IdP and endpoint alerts, and SOC ticketing. Decide in advance whether the SOC is blind to the test (to measure real detection and correlation) or informed (to measure response). Both are valid; choosing accidentally is not.

Execute the Orchestrated Chain

Run the campaign as a conditional sequence that adapts to target behavior, applying cumulative pressure and reaching the action request. Capture timing, branch paths, and the exact moment any layer engages — a report, an escalation, a verification callback, a control alert.
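As a rough illustration of "conditional sequence", the sketch below walks the four stages in order and stops at the first stage where any layer engages. The stage names, the `Response` values, and the `observe` callback are hypothetical. A real campaign would also branch on target behavior and record timestamps, both of which are left out here to keep the example short.

```python
from enum import Enum


class Response(Enum):
    ENGAGED = "engaged"      # target went along with the request
    ESCALATED = "escalated"  # target escalated to a manager or to security
    REPORTED = "reported"    # target reported the contact
    VERIFIED = "verified"    # a process control (e.g. a callback) fired
    BLOCKED = "blocked"      # a technical control blocked the step


STAGES = ["inbound_pretext", "cross_channel", "action_request", "execution"]
STOPPING = {Response.ESCALATED, Response.REPORTED, Response.VERIFIED, Response.BLOCKED}


def run_chain(observe):
    """Walk the stages in order and stop at the first stage where any layer engages.

    `observe(stage)` is a hypothetical callback returning the Response seen at that stage.
    Returns (log, outcome), where log is a list of (stage, response) tuples.
    """
    log = []
    for stage in STAGES:
        response = observe(stage)
        log.append((stage, response))
        if response in STOPPING:
            return log, f"stopped_at_{stage}"
    return log, "objective_reached"


# Scripted example: the person is convinced, but a callback fires at the action request.
scripted = {
    "inbound_pretext": Response.ENGAGED,
    "cross_channel": Response.ENGAGED,
    "action_request": Response.VERIFIED,
}
log, outcome = run_chain(scripted.__getitem__)
print(outcome)  # stopped_at_action_request
```

The log records which layer engaged and at which stage, which is the raw material for the per-layer debrief.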

Debrief Across All Three Layers

Show the full chain back to the organization. For people, same-day micro-training mapped to the decision point. For process, a finding on each control that failed to engage. For technology, the detections that fired, those that didn’t, and whether the SOC correlated the chain. The deliverable is a system finding, not a leaderboard of who clicked.

What to Measure at Each Layer

Three layers, three sets of signals, one headline number. The metrics below are the ones that tell you where the system actually breaks — and the resilience metric at the bottom is the one your board should see.

Layer	What You’re Testing	Key Metrics	A Pass Looks Like
People	Decision under cumulative pressure	Engagement rate, escalation rate, report rate, time-to-report	Escalated or reported instead of acting
Process	Whether procedures engage and hold	Control-engagement rate, action-stopped rate, adherence under pushback	Action stopped even when the person was fooled
Technology	Detect, prevent, correlate	Delivery vs filtering, detections per stage, correlation rate, MTTD	Chain correlated into one incident and alerted
System (headline)	End-to-end resilience	Runs where the attack was stopped despite a deceived person	The objective was never reached

Note what the headline metric is not: it is not click rate. It is the share of runs where the system stopped the attack despite a person being fooled. That single number reframes the entire conversation with leadership — from “how gullible are our employees” to “how many independent layers does an attacker have to defeat,” which is the only question that maps to real-world risk.
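The headline number can be sketched as a simple ratio. The version below assumes the denominator is the set of runs in which a person was deceived; the article does not pin that down, so treat it as one reasonable reading. The `Run` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    person_deceived: bool    # the target went along with the request
    objective_reached: bool  # the simulated damaging action completed


def resilience_rate(runs):
    """Share of deceived-person runs in which the system still stopped the attack."""
    deceived = [r for r in runs if r.person_deceived]
    if not deceived:
        return None  # undefined: nobody was deceived in this sample
    held = sum(1 for r in deceived if not r.objective_reached)
    return held / len(deceived)


runs = [
    Run(person_deceived=True, objective_reached=False),   # callback stopped it
    Run(person_deceived=True, objective_reached=False),   # IdP alert, SOC responded
    Run(person_deceived=True, objective_reached=True),    # no layer held
    Run(person_deceived=True, objective_reached=False),   # dual authorization held
    Run(person_deceived=False, objective_reached=False),  # target reported at stage 1
]
print(resilience_rate(runs))  # 0.75
```

Excluding runs where nobody was deceived keeps the number from being inflated by targets who never engaged, so it reflects only how the process and technology layers performed when they were actually needed.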

Two Ways to Run This — One Tells You the Truth

The same deepfake attack can be delivered two ways. One produces a number for a slide. The other produces a map of where your defenses actually fail.

Detection-Only · Tests People Alone

Fire a lure, count clicks, assign training. The process layer is never reached because the test stops before any action is requested. The technology layer is never validated because there is no multi-stage activity to correlate. You learn how many people clicked and nothing about whether your organization would have survived — and because detection training barely moves that number, you will be running the same test, with the same result, next quarter.

Orchestrated · Tests the Whole System

Run the chain to the action request and beyond. When a person is fooled, you find out whether the verification procedure fired, whether dual authorization held, whether the IdP alerted, and whether the SOC connected the dots. You leave with a ranked list of which layer failed and where to invest — process gap, control gap, or correlation gap — which is the actual output of a security test.

The difference is not rigor of delivery. It is whether the attack reaches the layers where resilience is built. Detection-only testing optimizes the one control the research says you cannot meaningfully improve. Full-system testing finds the controls you can.

What a Complete Engagement Delivers

A simulation that tests all three layers should leave you with a clear, prioritized picture of where an attacker would actually get through — and what to fix first.

The Output Bar

A documented control map written before launch, so every result ties to a defined pass condition
People metrics framed as behavior — escalation and reporting — not as a click-rate scoreboard
A finding on each process control: did it engage, did it stop the action, did staff adhere under pressure
Technology validation per stage, including whether the SOC correlated the chain into one incident
The headline resilience number: how often the system held despite a person being deceived
A ranked remediation list separating process gaps, control gaps, and correlation gaps
Same-day debriefs and micro-training mapped to the exact decision point that failed
A retest plan, because a control fix you never re-test is a control fix you cannot claim

Anything short of that is a phishing click-rate report wearing a deepfake costume. Anything at or above it is an operational assessment of how your people, process, and technology behave when a single coherent attack is moving through all three at once.

The Standard Worth Holding

The deepfake era does not change the fundamentals of security testing. It raises the stakes on getting them right. A convincing synthetic voice or video makes the person more likely to be fooled, which means the layers behind the person — the process and the technology — matter more than they ever have. Testing only the layer that is getting weaker, and ignoring the layers that have to get stronger, is exactly backwards.

Run the attack that reaches the decision point. Watch whether the person resists, whether the process intervenes, and whether the technology detects. The breach happens when all three fail. Your job is to find out, in a simulation, which one fails first.

That is the whole discipline: one orchestrated, contextual attack, three layers instrumented, one resilience number that tells leadership the truth. Not whether your people are gullible — they are, everyone’s are — but whether your organization is built to survive them being gullible at the worst possible moment.

Frequently Asked Questions

Direct answers to the questions security leaders ask when designing a deepfake simulation that tests people, process, and technology together.

How do you run a deepfake phishing simulation that tests people, process, and technology?

Run a single orchestrated kill chain rather than isolated lures, and instrument all three layers before launch. People: measure who engages, escalates, and reports under cumulative cross-channel pressure. Process: define the procedural controls under test — out-of-band verification, dual authorization, help-desk identity proofing — and observe whether they engage and stop the action even when a person is fooled. Technology: confirm whether the email gateway, impersonation protection, identity provider, and SOC generated and correlated detections across the chain. Because an orchestrated campaign reaches the real decision point, one engagement exercises all three.

Why is testing only whether people spot a deepfake a weak design?

The largest controlled studies of phishing training find detection-focused training barely changes click rates, and that users stay susceptible no matter how much training they get. If human detection is the weakest layer, a test that measures only detection measures your weakest control and ignores the two layers that contain an attack. A resilient organization is one where a fooled employee still cannot complete the damaging action because a procedure intervenes or a control blocks and alerts. That is what a people-process-technology simulation measures.

What process controls should a deepfake simulation test?

The procedures an attacker must defeat to succeed: out-of-band verification for banking or payment-detail changes, dual authorization on wire transfers, help-desk identity proofing before an MFA or password reset, callback verification on urgent executive requests, and the escalation path that routes a suspicious request to security. The test is not whether these exist on paper — it is whether they engage under pressure, whether staff follow them when a believable persona pushes back, and whether they stop the action independent of whether the person was deceived.

What technology controls does a deepfake phishing simulation validate?

Whether the lure was delivered or filtered by the email gateway, whether collaboration-platform impersonation protection flagged a spoofed Teams or Zoom identity, whether the identity provider alerted on an anomalous MFA reset, whether endpoint controls blocked remote-access tooling, and critically whether the SOC correlated the multi-stage activity into one incident rather than disconnected events. The orchestrated chain is what makes correlation testable, because a real adversary generates related signals across systems that a mature program should stitch together.

What should you measure in a people-process-technology simulation?

For people: engagement rate under cumulative pressure, escalation rate, report rate, time-to-report. For process: whether each control engaged, whether it stopped the action, and adherence under pressure. For technology: delivery versus filtering, detections per stage, correlation rate, and mean time to detect. The headline metric is resilience — the share of runs where the system stopped the attack despite a person being deceived. That number, not click rate, is what you report to the board.

What is OSES (Orchestrated Social Engineering Simulations)?

OSES™ is a trademarked methodology developed by Breacher.ai for running conditional, multi-stage adversary emulation campaigns built on a persistent contextual layer. Because OSES™ campaigns reproduce the real decision point — where an employee is asked to act, a procedure is asked to intervene, and the technology is asked to detect — a single engagement tests people, process, and technology together rather than testing user detection in isolation.

Efficacy findings referenced in this article summarize large-scale, randomized, controlled studies of phishing and security-awareness training published through 2025. Control examples are illustrative of common enterprise procedures and are not exhaustive. Methodology framing reflects Breacher.ai’s OSES™ approach. Questions are welcome at support@breacher.ai.

Author

Jason Thatcher

Founder & CEO, Breacher.ai

Jason Thatcher is the Founder and CEO of Breacher.ai and creator of OSES™ (Orchestrated Social Engineering Simulations™). He has 15+ years in cybersecurity spanning security operations, threat intelligence, and executive leadership, with prior roles at ZeroFox, Deepwatch, and GuidePoint Security. He built Breacher.ai on a simple practitioner conviction: the defense that matters is not whether users spot fakes, but whether process and technology hold when they don’t.

About the Author: Jason Thatcher

Jason Thatcher is the Founder of Breacher.ai and has had a long career in the cybersecurity industry. His past accomplishments include winning Splunk Solution of the Year in 2022 for Security Operations.