How to Run a Deepfake Phishing Simulation: People, Process & Technology | Breacher.ai
How to Run a Deepfake Phishing Simulation That Tests People, Process & Technology
Most phishing simulations answer one question: did the person click? That is the least useful question you can ask, because human detection is the weakest control you have. The simulation worth running tests all three layers of the triad in a single engagement: whether the person resists, whether your process intervenes when they don’t, and whether your technology detects, blocks, and correlates the attack. This is the playbook for running it, and what to measure at each layer.
Why “Did They Click” Is the Wrong Test
The default phishing simulation measures one thing: the percentage of employees who fell for a lure. It is easy to run, easy to chart, and almost useless as a measure of how an organization actually withstands an attack. The reason is uncomfortable but well established. The largest controlled studies of phishing training — randomized, with control groups, across tens of thousands of employees — find that detection-focused training barely changes click rates, and that people stay susceptible no matter how much training they receive.
If human detection is the weakest layer, then a simulation that measures only detection is measuring your weakest control and ignoring the two layers that actually contain an attack. A mature organization is not one where nobody is ever fooled. That organization does not exist. A mature organization is one where a fooled employee still cannot complete the damaging action — because a verification procedure intervenes, or a technical control blocks and alerts, or the SOC correlates the activity and responds before the objective is met.
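The arithmetic behind that claim is simple enough to sketch. The Python below is a minimal illustration, not a risk model: it assumes each layer fails independently at some rate, which real layers rarely do (a convincing deepfake that fools the person can also pressure staff into skipping a callback), and the rates in the usage lines are hypothetical placeholders, not measured data.

```python
from math import prod

def breach_probability(layer_failure_rates: dict[str, float]) -> float:
    """Chance that an attack gets through every layer.

    Assumes layers fail independently, which is an idealization:
    in practice failures are often correlated.
    """
    for name, p in layer_failure_rates.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: failure rate must be in [0, 1], got {p}")
    # The attacker succeeds only if every layer fails, so the rates multiply.
    return prod(layer_failure_rates.values())

# Hypothetical rates for illustration only.
people_only = breach_probability({"people": 0.30})
all_layers = breach_probability({"people": 0.30, "process": 0.20, "technology": 0.50})
print(f"people only: {people_only:.2f}, all three layers: {all_layers:.2f}")
```

With these placeholder rates, putting two imperfect layers behind the person cuts the chance of a completed attack from 0.30 to 0.03. Correlated failures make the real gap smaller, which is one more reason to measure the layers together instead of assuming.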
Stop asking whether your people spotted the fake. Start asking whether the system held when one of them didn’t. The first question has a depressing, well-documented answer. The second is the one that determines whether you get breached.
That reframing is the entire point of a people-process-technology simulation. You are not grading employees. You are stress-testing the full defensive system against a realistic, orchestrated attack and finding the layer that fails, because in a real intrusion the attacker has to get past every layer, while you only need one of them to hold.
The Three Layers, and What Each One Actually Tests
People, process, and technology are not three separate exercises. In a well-designed engagement they are three things measured simultaneously as a single attack moves through them. Here is what each layer means in a deepfake social engineering context, what failure looks like, and what you are actually measuring.
People: whether the human resists, escalates, or complies when a believable, corroborated persona asks them to act. Not whether they can pass a quiz on phishing red flags, but whether, in the moment, under manufactured urgency and cross-channel pressure, they do the risky thing. The realistic version of this test is not a single email. It is a coherent situation that builds.
Process: whether your procedures engage and hold when a person is deceived. Out-of-band verification on payment-detail changes. Dual authorization on wire transfers. Help-desk identity proofing before an MFA or password reset. Callback verification on urgent executive requests. This is the layer that turns a fooled human into a contained incident instead of a breach, and it is the layer most simulations never test because they stop at the click.
Technology: whether your controls saw the attack and acted. Did the email gateway deliver or filter the lure? Did collaboration-platform impersonation protection flag a spoofed Teams or Zoom identity? Did the identity provider alert on an anomalous MFA reset? Did endpoint controls block remote-access tooling? And the question only an orchestrated chain can ask: did the SOC correlate the multi-stage activity into one incident, or see disconnected events and miss the campaign?
How One Orchestrated Campaign Exercises All Three
The reason you can test all three layers in a single engagement is that an orchestrated, contextual kill chain reaches the real decision point — the moment where a person is asked to act, a procedure is supposed to intervene, and a control is supposed to detect. Disconnected lures never reach that point, which is why they can only ever test the person. Walk a single chain and watch all three layers come under test in sequence.
Stage 1: Pretext. An email lands from a recognizable persona, anchored to real OSINT, establishing a believable situation such as an urgent vendor banking change or an executive request before travel. Technology under test: did the gateway deliver it? People under test: did the recipient treat it as routine or escalate?
Stage 2: Corroboration. A Teams message from the same persona references the email; a live AI voice call escalates the same situation with insider detail. The consistency manufactures legitimacy. Technology under test: did collaboration impersonation protection flag the external identity? People under test: does corroboration across channels lower their guard?
Stage 3: Action request. The persona asks for the damaging action: change the payment destination, reset the MFA, approve the transfer, grant the access. This is the stage no detection-only test reaches. Even if the person is fully convinced, the question becomes procedural: does an out-of-band verification fire? Is a second authorizer required? Does the help desk demand identity proofing before resetting? Process under test: does the procedure engage and stop the action independent of the person’s belief? This single stage is where a fooled employee becomes either a contained incident or a breach.
Stage 4: Objective. If the action proceeds, the chain reaches its objective: the simulated reset or the staged remote-access attempt. Technology under test: did the identity provider alert on the anomalous reset? Did endpoint controls block the tooling? And did the SOC correlate Stages 1 through 4 into one incident, or log them as unrelated noise? Correlation is the difference between detecting an attack and detecting four things.
A parallel multi-channel test, which sends an independent lure on each channel, stops at Stage 1 four times over. Only an orchestrated chain reaches Stage 3, where process lives, and Stage 4, where correlation is provable. That is why orchestration is what makes a people-process-technology test possible at all.
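Correlation, the Stage 4 question, can be sketched as a grouping problem. The Python below is a hypothetical, minimal example and not how any particular SIEM or SOAR product works: it assumes each alert already carries the targeted employee's identity and a timestamp, and it folds alerts on the same target into one incident as long as consecutive alerts are no more than `window` apart. The `Alert` type and the source labels are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Alert:
    source: str   # hypothetical source label, e.g. "email_gateway", "idp"
    stage: int    # kill-chain stage the alert belongs to (1-4)
    target: str   # identity of the employee the activity touched
    at: datetime

def correlate(alerts: list[Alert], window: timedelta) -> list[list[Alert]]:
    """Group alerts on the same target into incidents, splitting on gaps longer than `window`."""
    incidents: list[list[Alert]] = []
    open_incident: dict[str, list[Alert]] = {}  # most recent incident per target
    for alert in sorted(alerts, key=lambda a: a.at):
        current = open_incident.get(alert.target)
        if current is not None and alert.at - current[-1].at <= window:
            current.append(alert)  # close enough in time: treat as the same campaign
        else:
            current = [alert]      # first alert for this target, or the gap was too long
            open_incident[alert.target] = current
            incidents.append(current)
    return incidents

t0 = datetime(2025, 1, 6, 9, 0)
chain = [
    Alert("email_gateway", 1, "ap.clerk", t0),
    Alert("collab_impersonation", 2, "ap.clerk", t0 + timedelta(minutes=40)),
    Alert("idp", 4, "ap.clerk", t0 + timedelta(minutes=95)),
    Alert("endpoint", 4, "ap.clerk", t0 + timedelta(minutes=110)),
]
print(len(correlate(chain, timedelta(hours=1))))     # 1: one incident
print(len(correlate(chain, timedelta(minutes=30))))  # 3: the 40- and 55-minute gaps split it
```

Real correlation is harder mainly because sources rarely agree on identity: the gateway sees an email address, the IdP a user account, the endpoint a hostname. Choosing the join key is the real work; the window only decides how patient the grouping is.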
The Playbook: How to Actually Run It
A people-process-technology simulation succeeds or fails in the preparation. The single most common mistake is launching the attack without first defining, in writing, which procedural and technical controls you are testing and what a pass looks like. If you do not instrument the defender side before you launch, you measure the person and lose the other two layers.
Step 1: Scope and authorize. Define the target population, the objectives in scope (payment redirection, credential reset, access grant), and the hard limits. Establish a trusted control contact, legal sign-off, and an abort path. Deepfake voice and video of named individuals require explicit consent and a documented authorization chain, so settle this first.
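The scoping step works best as a hard gate, not a checklist someone remembers to read. A minimal Python sketch, assuming a hypothetical `Engagement` record filled in during scoping: launch is blocked until every impersonated person has documented consent on file and the legal sign-off, control contact, and abort path all exist.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Engagement:
    objectives: list[str]                  # e.g. "payment_redirection", "mfa_reset"
    impersonated_people: list[str]         # named individuals whose voice or video will be synthesized
    consent_on_file: set[str] = field(default_factory=set)
    legal_signoff: bool = False
    control_contact: Optional[str] = None  # trusted insider who can deconflict mid-test
    abort_path: Optional[str] = None       # documented way to halt the campaign

def launch_blockers(e: Engagement) -> list[str]:
    """Every reason the engagement is not yet cleared to launch; an empty list means go."""
    blockers = []
    if not e.objectives:
        blockers.append("no objectives in scope")
    blockers += [f"no documented consent from {p}" for p in e.impersonated_people
                 if p not in e.consent_on_file]
    if not e.legal_signoff:
        blockers.append("legal sign-off missing")
    if not e.control_contact:
        blockers.append("no trusted control contact")
    if not e.abort_path:
        blockers.append("no abort path")
    return blockers

draft = Engagement(objectives=["payment_redirection"], impersonated_people=["CFO"])
print(launch_blockers(draft))
```

Returning every blocker at once, instead of failing on the first, lets the scoping meeting close all of them in one pass.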
Step 2: Map the controls under test. For each objective, write down the process control that should stop it (e.g. “banking changes require callback to a number on file”) and the technology control that should detect it (e.g. “IdP alerts on MFA reset from new device”). This list is your scorecard. An attack designed without it can only grade people.
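The control map can be kept as plain data so that every later result ties back to a written pass condition. The sketch below is hypothetical: the objective keys, the `ControlUnderTest` type, and the pass-condition wording are illustrative, with the callback and IdP entries taken from the examples above. `missing_layers` flags any objective with no process or no technology control defined, which is exactly the case where the attack could only grade people.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlUnderTest:
    layer: str      # "process" or "technology"
    control: str    # the control as written in policy or configuration
    passes_if: str  # observable pass condition, agreed before launch

# Hypothetical control map; wording is illustrative.
CONTROL_MAP: dict[str, list[ControlUnderTest]] = {
    "payment_redirection": [
        ControlUnderTest("process", "Banking changes require callback to a number on file",
                         "Callback placed to the on-file number before the change is applied"),
        ControlUnderTest("technology", "Email gateway inspects inbound vendor mail",
                         "Pretext email quarantined or flagged before delivery"),
    ],
    "mfa_reset": [
        ControlUnderTest("process", "Help desk performs identity proofing before any MFA reset",
                         "Reset refused until the caller's identity is proven"),
        ControlUnderTest("technology", "IdP alerts on MFA reset from new device",
                         "Alert raised and ticketed by the SOC"),
    ],
}

REQUIRED_LAYERS = frozenset({"process", "technology"})

def missing_layers(control_map: dict[str, list[ControlUnderTest]]) -> dict[str, set[str]]:
    """Objectives lacking a process or technology control, with the missing layers."""
    gaps = {}
    for objective, controls in control_map.items():
        missing = set(REQUIRED_LAYERS - {c.layer for c in controls})
        if missing:
            gaps[objective] = missing
    return gaps

print(missing_layers(CONTROL_MAP))  # {}: every objective has both layers mapped
draft_map = dict(CONTROL_MAP, access_grant=[
    ControlUnderTest("process", "Access requests need manager approval",
                     "Approval recorded before access is granted"),
])
print(missing_layers(draft_map))    # {'access_grant': {'technology'}}
```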
Step 3: Build the pretext and persona. Develop one shared pretext grounded in real OSINT (accurate org chart, live projects, genuine vendor relationships) and a persona held consistent across email, chat, and voice. The believability of Stage 3 depends entirely on the coherence built in Stages 1 and 2.
Step 4: Instrument the defender side. Coordinate so you can observe what the technology saw: gateway verdicts, impersonation flags, IdP and endpoint alerts, and SOC ticketing. Decide in advance whether the SOC is blind to the test (to measure real detection and correlation) or informed (to measure response). Both are valid; choosing accidentally is not.
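Both decisions in this step can be written down before launch. The Python sketch below is an illustration under stated assumptions: the `SocMode` values mirror the blind and informed options above, the signal names are hypothetical labels for whatever your gateway, collaboration platform, IdP, endpoint tooling, and ticketing system actually emit, and the plan type deliberately gives `soc_mode` no default so the choice cannot be made by accident.

```python
from dataclasses import dataclass
from enum import Enum

class SocMode(Enum):
    BLIND = "blind"        # SOC not told: measures real detection and correlation
    INFORMED = "informed"  # SOC told in advance: measures response

@dataclass(frozen=True)
class InstrumentationPlan:
    soc_mode: SocMode  # no default on purpose: the choice must be explicit
    expected_signals: dict[int, frozenset[str]]  # stage number -> signals you will look for

    def coverage(self, observed: dict[int, set[str]]) -> dict[int, tuple[int, int]]:
        """Per stage: (expected signals actually observed, expected signals in total)."""
        return {
            stage: (len(expected & observed.get(stage, set())), len(expected))
            for stage, expected in self.expected_signals.items()
        }

plan = InstrumentationPlan(
    soc_mode=SocMode.BLIND,
    expected_signals={
        1: frozenset({"gateway_flag"}),
        2: frozenset({"collab_impersonation_flag"}),
        # Stage 3 is judged on process evidence (callbacks, approvals), so no telemetry entry.
        4: frozenset({"idp_alert", "endpoint_block", "soc_ticket"}),
    },
)
print(plan.coverage({4: {"idp_alert"}}))  # {1: (0, 1), 2: (0, 1), 4: (1, 3)}
```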
Step 5: Execute the chain. Run the campaign as a conditional sequence that adapts to target behavior, applying cumulative pressure until it reaches the action request. Capture timing, branch paths, and the exact moment any layer engages: a report, an escalation, a verification callback, a control alert.
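A conditional sequence is essentially a small state machine. The sketch below is hypothetical: the stage names, channel order, and response labels are assumptions for illustration, and `respond` stands in for however you observe the target (in a real engagement that is an operator or tooling, not a function call). It advances when the target engages, tries the next channel when the target stays silent, stops when the target reports or escalates, and timestamps every touch.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical plan: for each stage, channels to try in order while the target stays silent.
STAGES: list[tuple[str, list[str]]] = [
    ("pretext", ["email"]),
    ("corroborate", ["teams", "voice_call"]),
    ("action_request", ["voice_call", "teams"]),
]

@dataclass
class RunLog:
    events: list[tuple[str, str, str, float]] = field(default_factory=list)  # stage, channel, response, time
    outcome: str = "incomplete"

def run_chain(respond: Callable[[str, str], str],
              clock: Callable[[], float] = time.monotonic) -> RunLog:
    """`respond(stage, channel)` returns "engaged", "ignored", "reported", or "escalated"."""
    log = RunLog()
    for stage, channels in STAGES:
        for channel in channels:
            response = respond(stage, channel)
            log.events.append((stage, channel, response, clock()))
            if response in ("reported", "escalated"):
                log.outcome = "people_held"  # the person stopped the chain
                return log
            if response == "engaged":
                break  # advance to the next stage
        else:
            log.outcome = "stalled"  # every channel for this stage was ignored
            return log
    # The target engaged with the action request: process and technology now decide the outcome.
    log.outcome = "complied"
    return log

script = {("corroborate", "teams"): "ignored"}
log = run_chain(lambda stage, channel: script.get((stage, channel), "engaged"))
print(log.outcome)                            # complied
print([(s, c) for s, c, _, _ in log.events])
```

Returning at the first report or escalation reflects the people layer holding; a `complied` outcome is where Stage 3 process evidence and Stage 4 telemetry take over.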
Step 6: Debrief and report. Show the full chain back to the organization. For people, same-day micro-training mapped to the decision point. For process, a finding on each control that failed to engage. For technology, the detections that fired, those that didn’t, and whether the SOC correlated the chain. The deliverable is a system finding, not a leaderboard of who clicked.
What to Measure at Each Layer
Three layers, three sets of signals, one headline number. The metrics below are the ones that tell you where the system actually breaks — and the resilience metric at the bottom is the one your board should see.
| Layer | What You’re Testing | Key Metrics | A Pass Looks Like |
|---|---|---|---|
| People | Decision under cumulative pressure | Engagement rate, escalation rate, report rate, time-to-report | Escalated or reported instead of acting |
| Process | Whether procedures engage and hold | Control-engagement rate, action-stopped rate, adherence under pushback | Action stopped even when the person was fooled |
| Technology | Detect, prevent, correlate | Delivery vs filtering, detections per stage, correlation rate, MTTD | Chain correlated into one incident and alerted |
| System (headline) | End-to-end resilience | Share of runs where the attack was stopped despite a deceived person | The objective was never reached |
Note what the headline metric is not: it is not click rate. It is the share of runs where the system stopped the attack despite a person being fooled. That single number reframes the entire conversation with leadership — from “how gullible are our employees” to “how many independent layers does an attacker have to defeat,” which is the only question that maps to real-world risk.
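The headline number and the per-layer rates fall out of a simple record per run. The Python below is a sketch under assumptions: the `Run` fields and the sample runs are hypothetical, and it reads "despite a person being fooled" as using the runs in which the person was deceived as the denominator for resilience, action-stopped rate, and correlation rate. If you report against all runs instead, say so, because the two numbers answer different questions.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass(frozen=True)
class Run:
    reported: bool                 # target reported or escalated
    deceived: bool                 # target acted on, or approved, the requested action
    process_stopped: bool          # a procedure (callback, second approver, ID proofing) halted it
    correlated: bool               # SOC tied the stages into one incident
    objective_reached: bool
    minutes_to_report: Optional[float] = None

def rate(numerator: int, denominator: int) -> Optional[float]:
    return numerator / denominator if denominator else None

def scorecard(runs: list[Run]) -> dict[str, Optional[float]]:
    deceived = [r for r in runs if r.deceived]
    report_times = [r.minutes_to_report for r in runs if r.minutes_to_report is not None]
    return {
        "report_rate": rate(sum(r.reported for r in runs), len(runs)),
        "median_minutes_to_report": median(report_times) if report_times else None,
        "action_stopped_rate": rate(sum(r.process_stopped for r in deceived), len(deceived)),
        "correlation_rate": rate(sum(r.correlated for r in deceived), len(deceived)),
        # Headline: of the runs where a person was fooled, how often the objective was still denied.
        "resilience": rate(sum(not r.objective_reached for r in deceived), len(deceived)),
    }

runs = [
    Run(reported=True, deceived=False, process_stopped=False, correlated=False,
        objective_reached=False, minutes_to_report=12.0),
    Run(reported=False, deceived=True, process_stopped=True, correlated=True, objective_reached=False),
    Run(reported=False, deceived=True, process_stopped=False, correlated=True, objective_reached=False),
    Run(reported=False, deceived=True, process_stopped=False, correlated=False, objective_reached=True),
]
print(scorecard(runs))
```

In this made-up sample, one person in four reported, yet two of the three deceived runs still ended without the objective being reached, so resilience is about 0.67.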
Two Ways to Run This — One Tells You the Truth
The same deepfake attack can be delivered two ways. One produces a number for a slide. The other produces a map of where your defenses actually fail.
Detection-only: fire a lure, count clicks, assign training. The process layer is never reached because the test stops before any action is requested. The technology layer is never validated because there is no multi-stage activity to correlate. You learn how many people clicked and nothing about whether your organization would have survived, and because detection training barely moves that number, you will be running the same test, with the same result, next quarter.
Full-system: run the chain to the action request and beyond. When a person is fooled, you find out whether the verification procedure fired, whether dual authorization held, whether the IdP alerted, and whether the SOC connected the dots. You leave with a ranked list of which layer failed and where to invest (process gap, control gap, or correlation gap), which is the actual output of a security test.
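Turning runs into that ranked list can be as simple as classifying each run's failures and counting them. A minimal Python sketch, assuming hypothetical per-run fields (`deceived`, `process_stopped`, `detections`, `correlated`): a run only exposes process and technology gaps if the person was fooled, a run with zero detections is a control gap, and a run whose detections were never tied together is a correlation gap.

```python
from collections import Counter

def classify_gaps(run: dict) -> list[str]:
    """Which downstream layers failed in one run (hypothetical field names)."""
    if not run["deceived"]:
        return []  # the person held, so this run never exercised process or technology
    gaps = []
    if not run["process_stopped"]:
        gaps.append("process gap")
    if run["detections"] == 0:
        gaps.append("control gap")
    elif not run["correlated"]:
        gaps.append("correlation gap")  # signals fired but were never tied together
    return gaps

def ranked_remediation(runs: list[dict]) -> list[tuple[str, int]]:
    """Gap types ordered by how many runs exposed them, most frequent first."""
    return Counter(g for run in runs for g in classify_gaps(run)).most_common()

runs = [
    {"deceived": False, "process_stopped": False, "detections": 0, "correlated": False},
    {"deceived": True, "process_stopped": False, "detections": 2, "correlated": False},
    {"deceived": True, "process_stopped": False, "detections": 0, "correlated": False},
    {"deceived": True, "process_stopped": True, "detections": 1, "correlated": False},
]
print(ranked_remediation(runs))
```

Counting runs, not raw alerts, keeps one noisy run from dominating the ranking; `Counter.most_common` breaks ties in first-seen order.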
The difference is not how rigorously the attack is delivered. It is whether the attack reaches the layers where resilience is built. Detection-only testing optimizes the one control the research says you cannot meaningfully improve. Full-system testing finds the controls you can.
What a Complete Engagement Delivers
A simulation that tests all three layers should leave you with a clear, prioritized picture of where an attacker would actually get through — and what to fix first.
The Output Bar
- A documented control map written before launch, so every result ties to a defined pass condition
- People metrics framed as behavior — escalation and reporting — not as a click-rate scoreboard
- A finding on each process control: did it engage, did it stop the action, did staff adhere under pressure
- Technology validation per stage, including whether the SOC correlated the chain into one incident
- The headline resilience number: how often the system held despite a person being deceived
- A ranked remediation list separating process gaps, control gaps, and correlation gaps
- Same-day debriefs and micro-training mapped to the exact decision point that failed
- A retest plan, because a control fix you never re-test is a control fix you cannot claim
Anything short of that is a phishing click-rate report wearing a deepfake costume. Anything at or above it is an operational assessment of how your people, process, and technology behave when a single coherent attack is moving through all three at once.
The Standard Worth Holding
The deepfake era does not change the fundamentals of security testing. It raises the stakes on getting them right. A convincing synthetic voice or video makes the person more likely to be fooled, which means the layers behind the person — the process and the technology — matter more than they ever have. Testing only the layer that is getting weaker, and ignoring the layers that have to get stronger, is exactly backwards.
Run the attack that reaches the decision point. Watch whether the person resists, whether the process intervenes, and whether the technology detects. The breach happens when all three fail. Your job is to find out, in a simulation, which one fails first.
That is the whole discipline: one orchestrated, contextual attack, three layers instrumented, one resilience number that tells leadership the truth. Not whether your people are gullible — they are, everyone’s are — but whether your organization is built to survive them being gullible at the worst possible moment.
Frequently Asked Questions
Direct answers to the questions security leaders ask when designing a deepfake simulation that tests people, process, and technology together.
How do you run a deepfake simulation that tests people, process, and technology together?
Run a single orchestrated kill chain rather than isolated lures, and instrument all three layers before launch. People: measure who engages, escalates, and reports under cumulative cross-channel pressure. Process: define the procedural controls under test (out-of-band verification, dual authorization, help-desk identity proofing) and observe whether they engage and stop the action even when a person is fooled. Technology: confirm whether the email gateway, impersonation protection, identity provider, and SOC generated and correlated detections across the chain. Because an orchestrated campaign reaches the real decision point, one engagement exercises all three.
Why isn’t click rate enough?
The largest controlled studies of phishing training find detection-focused training barely changes click rates, and that users stay susceptible no matter how much training they get. If human detection is the weakest layer, a test that measures only detection measures your weakest control and ignores the two layers that contain an attack. A resilient organization is one where a fooled employee still cannot complete the damaging action because a procedure intervenes or a control blocks and alerts. That is what a people-process-technology simulation measures.
Which process controls should a deepfake simulation test?
The procedures an attacker must defeat to succeed: out-of-band verification for banking or payment-detail changes, dual authorization on wire transfers, help-desk identity proofing before an MFA or password reset, callback verification on urgent executive requests, and the escalation path that routes a suspicious request to security. The test is not whether these exist on paper. It is whether they engage under pressure, whether staff follow them when a believable persona pushes back, and whether they stop the action independent of whether the person was deceived.
What should the simulation validate at the technology layer?
Whether the lure was delivered or filtered by the email gateway, whether collaboration-platform impersonation protection flagged a spoofed Teams or Zoom identity, whether the identity provider alerted on an anomalous MFA reset, whether endpoint controls blocked remote-access tooling, and critically whether the SOC correlated the multi-stage activity into one incident rather than disconnected events. The orchestrated chain is what makes correlation testable, because a real adversary generates related signals across systems that a mature program should stitch together.
What metrics should you report?
For people: engagement rate under cumulative pressure, escalation rate, report rate, time-to-report. For process: whether each control engaged, whether it stopped the action, and adherence under pressure. For technology: delivery versus filtering, detections per stage, correlation rate, and mean time to detect. The headline metric is resilience: the share of runs where the system stopped the attack despite a person being deceived. That number, not click rate, is what you report to the board.
What is OSES™?
OSES™ is a trademarked methodology developed by Breacher.ai for running conditional, multi-stage adversary emulation campaigns built on a persistent contextual layer. Because OSES™ campaigns reproduce the real decision point, where an employee is asked to act, a procedure is asked to intervene, and the technology is asked to detect, a single engagement tests people, process, and technology together rather than testing user detection in isolation.
Efficacy findings referenced in this article summarize large-scale, randomized, controlled studies of phishing and security-awareness training published through 2025. Control examples are illustrative of common enterprise procedures and are not exhaustive. Methodology framing reflects Breacher.ai’s OSES™ approach. Questions are welcome at support@breacher.ai.
Test All Three Layers in One Engagement
Book a 30-minute walkthrough. We’ll show you an orchestrated chain that reaches the action request, where your process is supposed to intervene and your technology is supposed to detect — and how we measure whether the system held. No marketing slides.

