Why 95% of AI POCs fail and how the 5% cross the agentic divide with EA 4.0

From assistants to operators via Level 2.5 hardening

Most organizations sit on flashy GenAI pilots that never touch the P&L because they stop at assistance and never harden for operations. The real divide is the move from Level 2 assistants to Level 3 operators, with Level 2.5 hardening as the non-negotiable gate. Leaders who design for memory, learning, and bounded autonomy, often in back-office workflows, are claiming the ROI the rest keep pitching.

Key points

The MIT State of AI in Business 2025 report finds 95% of organizations get zero return from GenAI, while about 5% realize meaningful value—an emerging “GenAI Divide.”
Jesper Lowgren, Enterprise Architecture Lead at DXC Technology, ties the divide to advancing from Agentic Maturity Level 2 (Assistant), through Level 2.5 (Hardening), to Level 3 (Operator).
Level 2 tools sit beside workflows; Level 2.5 adds guardrails (role boundaries, semantic alignment, policy attachment, evidence capture); Level 3 runs bounded processes with persistent memory, event triggers, and outcome-based metrics.
Adoption is high but shallow: over 80% of organizations have explored or piloted general-purpose tools (e.g., ChatGPT, Copilot) and nearly 40% report deployment, yet the impact remains marginal and limited to individual productivity.
For custom or enterprise systems, roughly 60% of organizations evaluated them, 20% piloted, and only about 5% reached production, a drop-off typical of Level 2.5 failures at the “production gate.”
The core blocker is the learning gap: systems lack memory and contextual adaptation; Level 3 demands memory, iterative learning, and safe autonomy by design.
Budgets over-index on sales and marketing (~50%), while faster payback often appears in finance, procurement, and operations via reduced BPO and agency spend (double-digit cuts, with ~30% cited).
External partnerships reach deployment about two-thirds of the time versus about one-third for internal builds, roughly doubling the success rate.
Shadow AI is widespread: only ~40% of companies purchased official LLM subscriptions, yet employees in over 90% of companies use personal AI tools; humans remain preferred for high-stakes tasks.
The 5% winners pick narrow, high-signal workflows (AP/AR, contract classification, call summarization, code generation), engineer memory and feedback loops, integrate deeply to remove external spend, and decentralize adoption with executive accountability.
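
To make the funnel figures above concrete, here is a minimal sketch of the arithmetic behind them. It uses only the percentages quoted in the key points; the function and variable names are illustrative assumptions, not anything defined in the report.

```python
# Funnel figures as quoted in the key points (share of organizations).
funnel = [("evaluated", 0.60), ("piloted", 0.20), ("production", 0.05)]


def stage_conversion(stages):
    """Return the conversion rate between each pair of consecutive stages."""
    return {
        f"{name_a}->{name_b}": round(rate_b / rate_a, 3)
        for (name_a, rate_a), (name_b, rate_b) in zip(stages, stages[1:])
    }


conversions = stage_conversion(funnel)

# Deployment rates quoted for external partnerships vs. internal builds.
partner_rate, internal_rate = 2 / 3, 1 / 3
partnership_lift = partner_rate / internal_rate

print(conversions)
print(f"partnership lift: {partnership_lift:.1f}x")
```

Read this way, about a third of evaluated systems reach a pilot and about a quarter of pilots reach production, which is where the "production gate" bites.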

Key takeaways

Start where the math loves you: a back‑office workflow with clean KPIs, not another sizzle demo for the all‑hands. Define what your operator can decide, when it must escalate, and who owns the kill switch—yes, a kill switch beats a thrill ride. Build the binding layer (semantics, policies, evidence), bake in memory and feedback from day one, and if hardening sounds hard, partner with someone who’s already crossed the bridge. Prove value on a narrow slice, bank the savings, then widen the lane—because “assistant theater” won’t move your P&L, no matter how many prompts you feed it.
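
As a rough illustration of "what your operator can decide, when it must escalate, and who owns the kill switch," the sketch below models a bounded operator for an accounts-payable slice. Everything in it is a hypothetical assumption chosen for illustration: the class name, the approval limit, the "known vendor" rule, and the evidence record layout. None of it comes from the report or from any specific product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"
    HALTED = "halted"


@dataclass
class BoundedOperator:
    approval_limit: float      # hypothetical: largest amount the operator may approve alone
    kill_switch_owner: str     # hypothetical: the one person allowed to halt the operator
    halted: bool = False
    evidence: list = field(default_factory=list)  # audit trail of every decision

    def kill(self, requested_by: str) -> bool:
        """Halt the operator, but only if the named owner asks. Returns the halted state."""
        if requested_by == self.kill_switch_owner:
            self.halted = True
        return self.halted

    def decide(self, invoice_id: str, amount: float, vendor_known: bool) -> Decision:
        """Approve inside the boundary, escalate outside it, and do nothing once halted."""
        if self.halted:
            decision = Decision.HALTED
        elif vendor_known and amount <= self.approval_limit:
            decision = Decision.APPROVE
        else:
            decision = Decision.ESCALATE
        # Evidence capture: record inputs and outcome for every call.
        self.evidence.append(
            {"invoice": invoice_id, "amount": amount,
             "vendor_known": vendor_known, "decision": decision.value}
        )
        return decision


operator = BoundedOperator(approval_limit=5000.0, kill_switch_owner="finance-controller")
print(operator.decide("INV-1", 1200.0, vendor_known=True))
print(operator.decide("INV-2", 9000.0, vendor_known=True))
operator.kill("finance-controller")
print(operator.decide("INV-3", 100.0, vendor_known=True))
```

The design choice worth noting is that the boundary, the escalation rule, and the kill switch owner are explicit fields rather than prompt text, so they can be reviewed and audited like any other policy.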

Sources