The AI Workflow Designer.

Exploring why most enterprise AI failures of 2024–2025 were architecture failures rather than model failures.

Jun 16, 2026

In early 2026, MIT’s NANDA Initiative published the most-quoted line of the year in enterprise AI. Across more than 1,800 generative AI deployments they had benchmarked in 2024 and 2025, 95% had delivered zero measurable P&L impact. Not “small but real”. Zero. The number ran through every CFO Slack channel by the end of the week. S&P Global ran its own audit shortly after and found that 42% of companies had abandoned most of their AI initiatives in 2025, up from 17% the year before. IBM’s Institute for Business Value put the share of enterprise AI initiatives delivering expected ROI at 25%. Gartner — based on a poll of 3,400+ organisations actively investing in agentic AI — now forecasts that 40%+ of agentic AI projects will be cancelled or fail to reach production by the end of 2027.

These are not the numbers of a market failing to spend. The same year produced a record $675 billion in hyperscaler AI infrastructure spend, with cumulative investment headed toward $3–4 trillion by the end of the decade. The capex is overwhelmingly there. The pilots are overwhelmingly there. The P&L impact is overwhelmingly not.

McKinsey, looking at the same gap, found something cleaner. The roughly 6% of organisations they call “AI high performers” — those attributing more than 5% of EBIT to AI — capture about three times the value of everyone else. They are not better at picking models. They are not better at writing prompts. They are not better at vendor selection. The single biggest factor separating them from the rest is that they redesigned their workflows end-to-end. Only 21% of all companies have. The other 79% are running new tools through old plumbing — and producing the headline numbers above.

Three weeks ago, in the first piece of this series, I introduced the five AI engines — generative, predictive, perceptive, agentic, optimisation — that will run the 2030 enterprise. Two weeks ago, the AI Operator: the new orchestration role that supervises the stack and splits into four archetypes (Conductor, Translator, Mechanic, Surgeon). Last week, the AI Verifier: the role that checks the work, and splits into four more archetypes (Domain Expert, Critic, Auditor, Red Team).

Each of those pieces answered the question “who does the work in the new stack?” This week’s question comes before all of them, and is the question that the McKinsey 21% number quietly puts on the table: who designs the work itself?

The answer is a structurally new role I’ll call the AI Workflow Designer. Like the Operator and the Verifier, it does not arrive as a single job. It splits into four named archetypes: Mapper, Boundary Setter, Recovery Designer, Composer. Each one handles a different part of the architecture. A serious 2027 workflow function has at least two of these archetypes. The high performers — the McKinsey 6% — have all four. Most organisations today have none of them, by name, on the org chart.

“Most companies are applying AI to individual tasks rather than redesigning entire workflows, but the real productivity unlock comes from reimagining workflows so people, agents, and robots each do what they do best.” — McKinsey Global Institute, 2026

Why the workflow became the bottleneck

Most of the AI failures of 2024 and 2025 were not model failures. The model was capable. The vendor demo worked. The proof of concept was credible. The board deck looked respectable. The deployment then collapsed somewhere between the pilot and the production load — and the post-mortem, almost without exception, did not say “the model was wrong”. It said some combination of “the data wasn’t where we thought it was”, “the escalation path didn’t exist”, “the handoff broke”, “the recovery wasn’t designed”, “the boundary between agent and human action wasn’t drawn”, “we hadn’t mapped the actual workflow before we automated it”. Architecture failures. Not model failures.

Microsoft’s AI Red Team published a taxonomy of agent failure modes in 2025 and updated it in early 2026. Almost every production agent failure they have catalogued, across their internal estate and the customers they advise, traces back to one of five repeating patterns. Four of the five are architecture issues — bad handoff design, missing fallback, unclear authority boundary, brittle composition between engines. Only one is squarely a model defect. The model is not the bottleneck. The workflow is.

Look at customer support, the largest single AI-touched workflow in the enterprise. 2026 CX research consistently finds that only 15% of consumers experience a seamless AI-to-human handoff. The other 85% report disjointed transitions where they have to repeat their issue, where the human agent lacks context, where the transfer takes minutes, where the chatbot has just tried to argue with them after they explicitly asked for a human. One in three human agents reports lacking the customer context they needed to resolve the issue after the AI handoff. None of these is a model accuracy failure. All of them are workflow design failures.

Look at software engineering. Recent industry analyses of agentic coding workflows in 2026 found that teams without structured delegation primitives — defined boundaries between what the agent decides and what the human decides — saw a 23% increase in bug density and a 12% increase in time spent on code review. The agent was capable. The workflow around it was not.

Look at credit decisions, claims triage, hiring, content moderation. The pattern repeats. The model performs at or above human baseline on the discrete task. The system around it produces outcomes that range from mildly disappointing to publicly catastrophic. The story is almost never “the model was wrong” alone. It is “the workflow let the wrong output ship”.

This is why the Workflow Designer is the highest-leverage IC role of the next five years. The model is the engine. The Operator drives it. The Verifier checks it. The Workflow Designer is the person who decides where the engine goes, where the driver sits, where the brake is, where the recovery lane is, and where the boundary between machine and human is drawn. Without that role, the rest of the stack runs on hope.

What the AI Workflow Designer actually does

Three things, none of which fit cleanly into the existing org chart.

Architectural mapping. The AI Workflow Designer maps the real workflow before any model is added to it — the documented steps, the undocumented steps, the data that quietly passes between people on Slack, the implicit escalations, the unstated authority boundaries, the failure modes the team already knows about but has never written down. The first deliverable is never a tool decision. It is a picture of the workflow as it actually runs.
Authority specification. The AI Workflow Designer draws, for each step in that workflow, the line between what the agent is permitted to decide, what the human seat is required to decide, what must be escalated, what must be flagged for the Verifier, what must be logged for the auditor, and what must never be done at all. This is the part the EU AI Act, most of whose obligations apply from 2 August 2026, has made legally load-bearing. A workflow without explicit authority boundaries is now a workflow with explicit legal exposure.
Resilience design. The AI Workflow Designer specifies what happens when the workflow breaks. Not whether it breaks — when. The timeout. The fallback. The rollback. The escalation packet. The customer-facing communication. The internal incident path. The audit-trail capture. The thresholds that automatically trigger human review even when nothing has visibly failed. Most production AI failures of 2024–2025 had no resilience design at all. The first time the workflow broke, it broke loudly, publicly, expensively, and unrecoverably.

The four Workflow Designer archetypes

The role splits cleanly into four. They do not form a hierarchy. They are flavours, each indispensable to a different stage of the design.

The Mapper. The systems thinker who sits with a domain expert for two hours and walks out with the actual workflow on paper — including the parts that aren’t in any process document. Their value is realism. They notice that the documented loan-approval flow has eleven steps and that the real one has nineteen, and that step fourteen is “Marie phones Pierre on Tuesday to clarify the income field”. They notice that the marketing approval workflow lists three reviewers and that two of them actually rubber-stamp without reading, and that the third is the one whose judgement everyone implicitly trusts. They notice that the order-fulfilment workflow officially has no manual exceptions, and that in practice three percent of orders are handled out of band on email because the system can’t represent them.

Mappers come from business analysis, operations, service design, lean manufacturing, internal consulting. Their skill is observation rather than imagination — they draw what is, not what should be. Best fit: any workflow about to receive AI for the first time, where the gap between the documented process and the real one is the precise gap in which the model will silently break. A Mapper who fails to surface the undocumented steps is the reason a pilot looks fine in demo and shatters in production. The good Mapper produces a workflow map that the domain experts read and quietly nod at: “yes, that is actually how it works”.

The Mapper’s risk is producing a map of what is, and stopping there. The mature Mapper is paired with at least one of the other three archetypes — usually the Boundary Setter — to turn the map into a design.

The Boundary Setter. The decision architect who specifies, for each step in the mapped workflow, where AI is permitted to act and where the human seat is required. Their value is rigour. They write the policy that says: the agent may approve loans up to €25,000 with predicted-default-rate under X; between €25,000 and €100,000 the agent may recommend but the human must sign; above €100,000 the workflow exits the agentic system entirely. They write the policy that says: the content-moderation agent may delete spam and remove obvious hate speech; borderline political content escalates to a human reviewer within fifteen minutes; content involving named public figures is not actioned by the agent at all.
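To make the loan example concrete, here is a minimal sketch of that authority policy written as code rather than prose. The names (Authority, loan_authority, max_default_rate) are hypothetical, and the risk threshold "X" is deliberately left as a required parameter because the policy above does not fix it. The sketch also assumes that a small loan with a predicted default rate at or above the threshold goes to a human for sign-off, which the policy text does not actually specify.

```python
from enum import Enum


class Authority(Enum):
    AGENT_APPROVES = "agent_approves"
    AGENT_RECOMMENDS_HUMAN_SIGNS = "agent_recommends_human_signs"
    HUMAN_ONLY = "human_only"  # workflow exits the agentic system


def loan_authority(
    amount_eur: float,
    predicted_default_rate: float,
    max_default_rate: float,  # the "X" in the policy; caller must supply it
) -> Authority:
    """Return who is allowed to decide a loan of this size and risk."""
    if amount_eur > 100_000:
        # Above €100,000 the agent is not involved at all.
        return Authority.HUMAN_ONLY
    if amount_eur > 25_000:
        # Between €25,000 and €100,000 the agent recommends, a human signs.
        return Authority.AGENT_RECOMMENDS_HUMAN_SIGNS
    if predicted_default_rate < max_default_rate:
        # Up to €25,000 and under the risk threshold: agent may approve.
        return Authority.AGENT_APPROVES
    # Assumption, not in the source policy: small but risky loans
    # are routed to a human rather than auto-declined.
    return Authority.AGENT_RECOMMENDS_HUMAN_SIGNS
```

Writing the boundary as a single function with no model call inside it is a design choice: the policy can be reviewed, versioned, and audited on its own, separately from whatever engine produces the default-rate prediction.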

Boundary Setters come from product management, policy, risk, ethics, regulated-industry compliance, and senior platform engineering. Their habit is to think in terms of permissions, thresholds, and exceptions rather than features. Best fit: any workflow with consequence — financial decisions, hiring, healthcare, content moderation, customer-facing communication, any workflow inside the EU AI Act Annex III categories. The Boundary Setter’s output is now legally load-bearing under the EU AI Act, the UK AI policy framework, the emerging US state-level AI rules, and the major insurers’ policy renewals.

The Boundary Setter’s risk is over-specification — a policy so dense and conservative that the workflow falls back to humans for everything and the AI investment never lands. The mature Boundary Setter ships a policy that is permissive enough to capture the value and conservative enough to survive the worst week of the year.

The Recovery Designer. The failure-mode specialist who designs what happens when the workflow breaks. Their value is graceful degradation. They specify the timeout — the agent must reach a verdict in fewer than seven seconds; otherwise the workflow falls back to a defined human queue. They specify the rollback — if any step in the chain produces an error, the system reverses the last three steps and notifies the operator. They specify the human-handover packet — what the human receives when an escalation arrives, in what format, with what context, what evidence, what suggested action. They specify the apology — what gets said to the customer, by whom, with what authority. They specify the audit trail — what is logged, where it is stored, who can access it, how long it is retained.
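The timeout and the handover packet are the two pieces of that list that translate most directly into code. What follows is a rough sketch under stated assumptions, not a production pattern: HandoverPacket, run_with_fallback, and the plain list standing in for the human queue are all hypothetical names, and rollback, customer communication, and audit logging are left out for brevity.

```python
import concurrent.futures
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class HandoverPacket:
    """What the human receives when the agent cannot finish the case."""
    case_id: str
    reason: str
    context: dict = field(default_factory=dict)


def run_with_fallback(
    agent: Callable[[dict], Any],
    case_id: str,
    context: dict,
    human_queue: List[HandoverPacket],
    timeout_s: float = 7.0,  # the seven-second budget from the example above
) -> Optional[Any]:
    """Run the agent; on timeout or error, hand the case to the human queue."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(agent, context)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        human_queue.append(HandoverPacket(case_id, "timeout", context))
        return None
    except Exception as exc:
        human_queue.append(HandoverPacket(case_id, f"agent error: {exc}", context))
        return None
    finally:
        # Do not block the caller on a still-running agent thread.
        executor.shutdown(wait=False)
```

A return value of None means the case now belongs to a human, and the packet in the queue carries the context they need so the customer does not have to repeat the issue. Note that the slow agent thread is abandoned rather than killed; a real system would also need a way to cancel the underlying call.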

Recovery Designers come from site reliability engineering, incident response, customer experience leadership, safety engineering in regulated industries, and military operations planning. Their habit is to assume the workflow will fail and to design for that failure to be small, recoverable, and well-communicated. Best fit: every production agentic workflow — because by 2027, the question is not whether your workflow will fail in production but how visibly, how recoverably, and how cheaply it will fail.

The Recovery Designer’s risk is paranoia — a design so defensive that the agent cannot act without three layers of fallback, latency rises, and the workflow stops feeling like AI at all. The mature Recovery Designer designs for the failure that actually happens, not every failure that could be imagined.

The Composer. The architect who takes the Mapper’s workflow, the Boundary Setter’s authority policy, the Recovery Designer’s resilience plan, the available AI engines, the available Operator and Verifier archetypes, and assembles them into a coherent end-to-end workflow that actually ships value. Their value is integration. They decide where the generative engine ends and the predictive engine begins. They decide which Operator archetype owns which segment of the workflow. They decide which Verifier gate sits at which threshold. They decide which engines never touch each other.

Composers come from senior product leadership, distinguished engineering, chief-of-staff backgrounds, technical-strategy consulting, and increasingly from a new wave of explicitly AI-architecture programmes. Their habit is to hold the whole flow in their head at once. Best fit: every workflow that touches more than one AI engine — which by 2027 will be the majority of production AI workflows. The Composer is the role most likely to grow into the Chief AI Architect title that does not yet exist in stable form on most org charts.

The Composer’s risk is elegance over operability — a beautifully integrated architecture that the Operators cannot actually run, the Verifiers cannot actually verify, and the team cannot actually maintain. The mature Composer designs for the team that exists, not the team they wish they had.

None of these four outranks the others.

The high-performer pattern is to have all four, with the Mapper and Boundary Setter working in tight pair on the front end of every new workflow, the Recovery Designer engaged from day one rather than after the first incident, and the Composer holding the integrated picture and signing off the architecture before the engines arrive.

The mentoring problem this surfaces

Here is the second-order failure mode that ties the Workflow Designer back to The Apprenticeship Implosion, The Originality Tax, and The AI Verifier: we are not training AI Workflow Designers either. The role lives in the seam between product, operations and ethics, and no single university programme, bootcamp, MBA, or corporate L&D track currently produces people for that seam.

Product schools train feature design. Operations training trains process improvement. Ethics training, where it exists, trains review. Software engineering programmes train shipping. None of them trains the integrated muscle the Workflow Designer needs: the ability to sit with a domain expert and reverse-engineer their real workflow, then to set decision boundaries that survive the worst Tuesday of the year, then to design the recovery the system needs when (not if) it breaks, then to compose multiple AI engines into one shipping flow. That is product + ops + ethics + integration architecture in one head — and the job description does not yet exist on the major boards in stable form.

What this means in practice is that for the next two to three years, the Workflow Designer is overwhelmingly a promotion candidate, not a hire. The strongest candidates are the senior product leads who have already shipped complex multi-team flows; the senior operations managers who have already mapped end-to-end processes for transformation programmes; the chief-of-staff types who have already composed across silos; the SRE leads who already think about failure modes professionally; and the regulated-industry compliance leads who already think about authority boundaries with legal precision. The market signal of the next twelve months will be the salary band these promotions land at, not the title.

What this means

If you are early in your career: stop chasing the “AI engineer” title that increasingly means “good with prompts”. Build Workflow Designer evidence. Pick a workflow inside your organisation — a small one is fine — and map it end-to-end yourself, including the unwritten steps. Write the authority boundaries you would propose, with thresholds, escalation paths, and worst-case constraints. Design the recovery: what happens when the workflow breaks, who is told what, how the customer learns. Compose two AI engines into a single shipping flow, even if it is small. Publish what you find. In eighteen months, that portfolio will be worth more than any frontier-model fluency on its own. The market signal in 2027 will not be “I can ship with AI”. It will be “I designed the workflow that captured the value”.
If you are hiring: add at least one AI Workflow Designer seat to every team running multiple AI engines. The McKinsey 21% data is unambiguous — the companies that capture the AI productivity gain are the ones that have done end-to-end workflow redesign, and they have done it because someone, by name, owns that work. If you cannot find the candidate on the market — and you mostly cannot — promote from inside. Your best senior product leads, principal operators, chiefs of staff, SREs, and compliance leads are your strongest candidates, and they already know your domain. Hire for the archetype, not the title. The market for the title will catch up in eighteen months.
If you are leading: the AI Operator and the AI Verifier without the AI Workflow Designer are tactics without an architecture. The McKinsey 21% number is the number that will define who captures the AI productivity gain by 2028 and who is still running pilots. Three things to do this quarter. First, name the Workflow Designer seat explicitly on every team running more than one AI engine — not “the product manager handles it” but “Sofia is the Composer on the underwriting workflow; Idris is the Boundary Setter”. Second, fund the role at parity with senior product and principal engineering. The Workflow Designer is a senior IC role, not a junior coordinator. Third, mandate the architecture deliverable: no AI workflow ships to production without a workflow map, an authority policy, a recovery plan, and a composition diagram signed by the Workflow Designer of record. No exceptions. That signature is the audit trail when something goes wrong, and it is the asset that compounds into capability over time.
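The third mandate, no production release without the four signed artifacts, is simple enough to enforce mechanically. One way it might look as a release gate is sketched below. Everything here is illustrative: ArchitectureDeliverable, release_blockers, may_ship, and the artifact names are hypothetical, and a real gate would check the documents themselves rather than a set of labels.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four artifacts named in the mandate above.
REQUIRED_ARTIFACTS = (
    "workflow_map",
    "authority_policy",
    "recovery_plan",
    "composition_diagram",
)


@dataclass(frozen=True)
class ArchitectureDeliverable:
    workflow: str
    artifacts: frozenset             # names of the artifacts actually delivered
    signed_by: Optional[str] = None  # the Workflow Designer of record


def release_blockers(d: ArchitectureDeliverable) -> List[str]:
    """List everything that still blocks this workflow from production."""
    blockers = [
        f"missing {name}" for name in REQUIRED_ARTIFACTS if name not in d.artifacts
    ]
    if not d.signed_by:
        blockers.append("missing Workflow Designer signature")
    return blockers


def may_ship(d: ArchitectureDeliverable) -> bool:
    """No exceptions: ship only when there are no blockers at all."""
    return not release_blockers(d)
```

Returning the list of blockers, rather than a bare yes or no, keeps the gate useful as an audit record: it states exactly which artifact or signature was missing at the moment a release was refused.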

The uncomfortable truth

Most organisations are buying engines, hiring drivers, installing gates — and skipping the road.

The AI Workflow Designer is the role that builds the road. It sits in the seam between product, operations and ethics, three functions that historically have not talked to each other in any sustained way. It does not look like a growth story. It does not have a clean parent function. It is not what venture markets fund and it is not what bootcamps ship. It is the role that the McKinsey 21% have, by name, on their org chart, and that the other 79% have not yet realised they need.

The next eighteen months will rebalance this. A few visible enterprise AI failures will be reframed by their post-mortems as “we never designed the workflow”; a wave of EU AI Act enforcement actions will turn the Boundary Setter output from a nice-to-have into a regulatory line item; a handful of high-performer case studies — McKinsey’s preferred genre — will explicitly name the AI Workflow Designer function in the org chart that captured the EBIT. The title will then stabilise. The salary band will then climb. The market will then catch up.

The companies that hire ahead of that adjustment will quietly accumulate a two- to three-year advantage that, when the market catches up, will look like luck and will in fact be preparation.

Next week, the closing piece of this series: the New Org Chart. What the company that has staffed the Operators, the Verifiers, the Workflow Designers, and integrated the five engines actually looks like on a single sheet of paper. The shape of the 2030 enterprise.

Shaping Minds
