Agentic AI Readiness / 2026 / A Global AI Forum Report

The demo is not the deployment. The system of record is.

An agent can be made to do almost anything in a sandbox in an afternoon. In production it can do almost nothing of value with systems it cannot reach, cannot trust, and is not trusted to touch.

5 min
To a working agent in a demo · workflow and guardrails, the easy part

against

40%
Of agentic AI projects canceled by end of 2027 · Gartner, June 2025
Five buyer roles · Six readiness dimensions · Sixteen figures · Lens: CORE & BEACON

00 · Executive Summary

The platform is not the project. Readiness is.

An agentic platform installs in a week. The institution it has to operate inside took thirty years to build, and that is the part nobody priced. This report gives the CEO the shape of the gap, and gives the rest of the committee their part of closing it.

Every board in 2026 has seen the same demonstration. A vendor opens a chat window, types a sentence in plain English, and an agent plans a multi-step task, calls a few tools, and returns a finished piece of work in under a minute. It is genuinely impressive, and it is genuinely misleading, because the demonstration runs in an environment built to make the agent succeed. The data is clean. The tools are pre-wired. Nothing the agent touches is load-bearing. The leap from that room to a live insurance, banking, or manufacturing operation is not a small one. It is the whole problem, and the market has consistently mistaken the easy half for the hard half.

Gartner has put a number on the consequence. It predicts that over 40 percent of agentic AI projects will be canceled by the end of 2027, blaming escalating costs, unclear business value, and inadequate risk controls. In the same note it observes, almost in passing, the sentence this entire report is built around: integrating agents into legacy systems is technically complex, often disrupting workflows and requiring costly modifications. That is not a footnote. That is the project.

The pattern underneath the cancellations is consistent across sectors. Roughly 17 percent of organisations have actually deployed AI agents, while more than 60 percent say they intend to within two years. That intent-to-deployment gap is where the write-offs will happen, and it is widest precisely where the core systems are oldest and the regulation is heaviest, which is to say banking, insurance, and industrial manufacturing. Gartner also warns of agent washing, the rebranding of chatbots and robotic process automation as agents, and estimates that of the thousands of vendors claiming agentic capability, only around 130 are genuinely agentic.

None of this is an argument against agents. Agentic AI is a real step beyond scripted automation, and Gartner itself expects at least 15 percent of day-to-day work decisions to be made autonomously by agents by 2028. It is an argument against buying the destination and skipping the readiness. The institutions that capture value will be the ones that treated the platform as the last 20 percent of the work, not the first.

An agent is only as capable as the system it is allowed to reach, the data it is allowed to trust, and the authority it is allowed to hold.

Figure 01

The intent, the deployment, and the cancellation

Most enterprises intend to deploy agents. Few have. Of those that proceed without readiness, Gartner expects four in ten to be written off by the end of 2027.

Source: Gartner, Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 (June 2025); Gartner Hype Cycle for Agentic AI 2026 (17% deployed, 60%+ intent). Cancellation share is Gartner's stated forecast.

Where the work actually is

An agentic deployment has five parts, and they are not equally hard. Defining the workflow is straightforward. Setting guardrails is straightforward. Choosing a foundation model is, in 2026, almost a commodity decision, because the strong models are close enough that the choice rarely decides the outcome. The two parts that decide the outcome are the two parts the demo hides: connecting the agent to the system of record so it can not only read but write, and supplying it with data and context it can actually trust. Those two consume the budget, the timeline, and the risk, and they are the reason careful institutions are slower and successful institutions are slower still.

Figure 02

The easy 20 percent and the hard 80 percent

A rough allocation of effort, cost, and risk across an agentic deployment. The visible parts in the demo are the cheap parts. Reach, data, and trust are where the project lives or dies.

Source: Global AI Forum analysis of enterprise agentic deployments, 2026. Allocation is directional and illustrative of the effort profile, not a single engagement.

The mandate gap

There is a second failure that has nothing to do with technology and everything to do with translation. The CEO sets the mandate in the language of the business: grow the top line, protect the bottom line, do more with the headcount we have. That mandate is handed down a chain, and somewhere on the way it is received by the people who must actually make an agent reach Duck Creek or post to the core banking ledger or release an order in SAP. They feel a different mandate entirely, one about brittle APIs, schema drift, approval chains, and audit. The two mandates are never reconciled in a single document, the gap between them is never measured, and the project drifts into the 40 percent because no one owned the distance between what was promised and what was buildable. There is no method to the madness, and that absence is itself the diagnosis.

The diagnosis in one line

The enterprises that will be in the 40 percent did not buy the wrong agent. They bought a destination on CORE without measuring whether the institution could reach it on BEACON. The platform was ready. The institution was not.

Actionables

Score reach before you buy autonomy. Before approving an agent, ask one question: can it read and write to the system of record without a human re-keying the result? If not, you are buying a demo.
Separate the easy 20 from the hard 80 in the plan. Make the integration, data, and context work a named workstream with its own owner, budget, and timeline. It is the project, not a dependency.
Write the mandate down twice. Once in the language of the business, once in the language of the build, and reconcile them on one page before a line of code is written.
Refuse to be agent-washed. If the product cannot plan steps, act under control, hold state, and return an auditable result, it is a prompt interface with a price tag.

Our reading

Agentic AI is the most over-demonstrated and under-deployed technology of the decade. The demonstration is honest about the model and silent about the institution, and the institution is the entire variable. The gap between the two has a name and a shape, and it can be measured before a rupee or a dollar is committed.

Read every agent pitch through one filter: what does it have to reach to be worth anything, and is your institution ready to let it reach there safely? If that question has no answer, the project does not have a foundation. It has a demo.

Gartner · June 2025

40%

of agentic AI projects will be canceled by the end of 2027. Not because the agents fail. Because the institutions were not ready to let them reach anything that mattered.

I · The Anatomy of an Agent

Five layers, two of them easy, two of them hard, one of them quietly commoditised

To see why agentic projects fail, take the agent apart. An agentic platform is not one thing. It is a stack of five layers, and the market has spent its attention on the layers that no longer decide anything.

Strip the marketing away and an enterprise agent is a small, legible machine. At the bottom sits a foundation model, the reasoning engine that plans and decides. Above it sits an orchestration layer that turns a goal into a sequence of steps. Above that sit the tools, the connections through which the agent reads data and takes actions in real systems. Alongside runs memory, the context the agent carries across steps and sessions. Wrapping all of it are the guardrails, the rules that constrain what the agent may do. The platforms that an insurance company or a bank evaluates differ mostly in how they package these five layers. The layers themselves are universal, and so is the mistake: enterprises shop on the layers that are easy or commoditised, and discover the hard layers only after the cheque has cleared.

A Model · O Orchestration · E Tools / reach · N Memory / context · C Guardrails

A · Model

The foundation model

The reasoning engine. Almost every enterprise agent in 2026 is built on a frontier model it does not own, from Anthropic, OpenAI, Google, or an open-weight family such as the latest from DeepSeek or Mistral. The platform rents intelligence; it does not make it.

Commoditised

O · Orchestration

The workflow

Turns a goal into ordered steps with branches and retries. Real engineering, but well understood and increasingly templated. This is what the demo shows, and it is the part a competent team stands up in days.

Easy

C · Guardrails

The constraints

Input filters, output checks, permission scopes, escalation rules. Necessary and visible, and therefore the part vendors demonstrate proudly. Configuring them is a setup task, not a research project.

Easy

E · Tools / reach

The reach into systems of record

Where the agent actually reads enterprise truth and, more dangerously, writes it back. Binding a policy in a core insurance platform, posting to a core banking ledger, releasing an order in an ERP. This is the layer the demo pre-wires and the institution must build. It is most of the cost and most of the risk.

Hard

N · Memory / context

The data and the context layer

An agent reasons over what it is given. If the data schema is not agent-ready, and if the context of how decisions were made was never captured, the agent learns the gaps. This layer cannot be bought; it had to be built before anyone started, or it has to be built now.

Hard

Diagram 1

The agentic reference architecture

Every enterprise agent is the same six layers crossed by the same readiness boundary. The demo lives in the three layers below the line. The project lives in the three above it, and in the three concerns that run up the right edge, scalability, security, and support, which touch every layer at once.

Source: Global AI Forum, agentic reference architecture, 2026. Layer naming follows the BEACON instrument. System-of-record examples are representative of insurance, banking, and manufacturing cores.

The single most consequential point on this page is that the model layer, the one that gets the headlines, is the one that no longer decides the outcome. The strong models are close enough in capability that swapping one for another rarely turns a failing deployment into a working one. As one widely read 2026 analysis put it, the bottleneck is less about the capabilities of the models themselves and more about the challenge of getting these models to communicate with the rest of the business. The intelligence arrived. The institution to put it to work did not.

Did you know

Of the thousands of vendors marketing agentic AI, Gartner estimates only about 130 are real. The rest are agent washing: chatbots, assistants, and RPA rebranded with an agentic price tag but no genuine ability to plan, act under control, hold state, and return an auditable result.

Figure 03

Agent washing, drawn to scale

The gap between the number of vendors claiming to sell agents and the number Gartner judges to be genuinely agentic. The first job of a buyer committee is to survive this filter.

Source: Gartner, June 2025, as reported via Gartner newsroom and BigDATAwire. The thousands figure is Gartner's characterisation; 130 is its stated estimate of genuinely agentic vendors.

Why the model layer stopped mattering

For two years the enterprise question was which model. That question is now close to settled, not because one model won, but because several are good enough that the difference is no longer the binding constraint. A frontier model and a strong open-weight model will both plan a claims triage or a reconciliation competently. What separates a working agent from a failing one is not the cleverness of the reasoning. It is whether the reasoning is connected to the institution's systems, grounded in the institution's data, and bounded by the institution's controls. That is why this report spends almost no time on model selection and almost all of it on the three layers that the model cannot compensate for: reach, data, and trust.

▲ For the CTO

Resist the urge to re-run the model bake-off. Your differentiated engineering effort belongs in the tool and memory layers, the parts no vendor can deliver for you because they are specific to your stack. A model is a dependency you can swap in an afternoon. A clean, governed write path into your core system is a quarter of work that decides whether any of this is real.

II · CORE & BEACON for Agents

Two lenses: where the agent acts, and whether it can reach

CORE is the destination. It names the four places an agent creates enterprise value. BEACON is the readiness. It scores the six dimensions that decide whether the institution can actually let an agent operate there. The CEO points with CORE. The committee is graded on BEACON.

A great deal of confusion in enterprise AI comes from collapsing two different questions into one. The first question is where do we want the agent to act, which is a business question the CEO owns. The second is can we actually let it act there safely, which is a readiness question the whole committee owns. CORE answers the first. BEACON answers the second. Keeping them apart is the difference between a strategy and a wish.

CORE: the four places agents create value

Every credible agentic use case lands in one of four quadrants. Together they spell CORE, and they are the map a CEO uses to decide where autonomy is worth pursuing at all.

C · Command

Decisions and oversight

Agents that gather, reconcile, and surface what leaders need to decide: risk positions, exceptions, anomalies. The agent does not act; it sharpens the human who acts.

O · Operations

Run the process

Agents that execute the work inside core processes: triage a claim, reconcile a ledger, expedite an order. This is where reach into the system of record matters most, and where most value and most risk live.

R · Revenue

Grow the top line

Agents that find, qualify, price, and serve: advisor copilots, underwriting assistants, next best action. The CEO's favourite quadrant, and the one most often demonstrated and least often integrated.

E · Experience

Serve the customer

Agents at the front line: resolution, onboarding, service. High visibility, high brand risk. Gartner expects a third of firms to harm customer experience in 2026 by deploying here prematurely.

CORE is deliberately a destination model, not a capability model. It says nothing about whether you can get there. That is the job of the second lens.

BEACON: the six dimensions of readiness

BEACON scores readiness across six dimensions. Each carries a single signature metric, the one number that tells you whether that dimension will hold weight when an agent goes live. For agentic AI specifically, two of the six dominate the others, and the report is organised around them: Engineering, which for agents means Core Reachability, and Numbers, which for agents means whether the data and context are sufficient.

B · Business value · Strategic Half-Life
E · Engineering · Core Reachability
A · AI capability · Escape Velocity
C · Compliance · Time-to-Trust
O · Operating model · Augmentation Quotient
N · Numbers / data · Data Sufficiency

Figure 04

A BEACON readiness profile for an agentic deployment

Two institutions buying the same agent. One has done the readiness work, one has not. The agent is identical. The outcome is not, and the difference is entirely the shape of the profile.

Source: Global AI Forum, BEACON instrument. Profiles are illustrative, chosen to demonstrate the method rather than to report a specific engagement.

How to read the two lenses together

Pick a quadrant on CORE. That is where you want an agent. Then score the six dimensions of BEACON for that specific use case. If Core Reachability and Data Sufficiency are weak, the quadrant is unreachable no matter how strong the model or how clean the demo. CORE tells you the prize. BEACON tells you whether it is yours to take. A high CORE ambition on a low BEACON base is the precise recipe for a 2027 cancellation.
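The two-lens reading above can be sketched in a few lines. This is a minimal illustration, not the BEACON instrument itself: the 0 to 5 scale, the threshold of 3, and the names `BeaconProfile` and `quadrant_is_reachable` are assumptions introduced here, since the report does not publish a scoring scale.

```python
from dataclasses import dataclass

# Assumption: the two dimensions the report says dominate for agentic AI.
GATING_DIMENSIONS = ("core_reachability", "data_sufficiency")


@dataclass(frozen=True)
class BeaconProfile:
    """One BEACON profile, scored for a single CORE use case (hypothetical 0-5 scale)."""
    strategic_half_life: int     # B: Business value
    core_reachability: int       # E: Engineering
    escape_velocity: int         # A: AI capability
    time_to_trust: int           # C: Compliance
    augmentation_quotient: int   # O: Operating model
    data_sufficiency: int        # N: Numbers / data


def quadrant_is_reachable(profile: BeaconProfile, threshold: int = 3) -> bool:
    """Reachable only if both gating dimensions clear the threshold.

    Strength elsewhere, such as a strong model, does not compensate.
    """
    return all(getattr(profile, name) >= threshold for name in GATING_DIMENSIONS)


ready = BeaconProfile(4, 4, 4, 3, 3, 4)
not_ready = BeaconProfile(5, 1, 5, 4, 4, 2)  # strong model, weak reach and data

print(quadrant_is_reachable(ready))      # True
print(quadrant_is_reachable(not_ready))  # False
```

The gate is deliberately an `all`, not an average: averaging would let a high AI capability score hide a weak Core Reachability score, which is the failure the two lenses exist to expose.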

▲ For the CEO

Your job is the first lens, not the second. Name the quadrant and the number you want moved, then refuse to fund the project until someone shows you the BEACON profile for it. You are not abdicating the technical decision. You are insisting that ambition and readiness be placed on the same page before capital is committed. That single discipline is what keeps a board out of the 40 percent.


III · Core Reachability

The system of record is where agents go to fail

An agent that only reads is a smarter search box. An agent that creates value has to write, and the place it has to write is the most protected, least forgiving system the institution owns. This is the engineering dimension of BEACON, and for agents it is the whole game.

When engineers first connect an agent to an enterprise, they expect the hard part to be the model. It is not. The first real obstacle appears in the place they least expect, the systems of record, described by one team that lived it as the quiet but uncompromising backbones of the enterprise. Every approval, every policy, every timestamp lives there. As The New Stack documented from a live deployment, those systems are designed to preserve truth, not speed. The agent connected to them easily. The APIs responded, data moved, nothing looked broken. Then one deployment went live and the agent began resolving service tickets automatically, reading and writing through the same endpoints the automation scripts used, and quietly skipping an approval step that existed to stop premature closure. The integration worked. The institution broke.

Read is easy. Write is the entire problem.

The distinction that decides everything is read versus write. A read-only agent retrieves and summarises. It is useful, low risk, and the thing most pilots actually are. A writing agent changes the state of the business: it binds a policy, posts an entry, releases an order, approves a payment. The moment an agent can write, the question stops being what can it see and becomes what can it do, and that question pulls the project out of the engineering domain and into governance, identity, and risk. The systems that hold institutional truth enforce validations, approval chains, and state transitions for a reason. An agent that writes to them either honours those rules, which is slow and hard, or bypasses them, which is fast and catastrophic. There is no third option, and the demo never shows you which one you bought.
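What honouring the system's own rules looks like can be sketched with a toy state machine. Everything here is hypothetical: the ticket states, the `ALLOWED` transition table, and the class names are invented for illustration and do not describe any real platform's API. The point is only that the write path itself, not the agent's good intentions, refuses to skip the approval step.

```python
from dataclasses import dataclass, field
from enum import Enum


class TicketState(Enum):
    OPEN = "open"
    APPROVED = "approved"
    CLOSED = "closed"


# Transitions the hypothetical system of record allows.
# OPEN -> CLOSED is absent on purpose: closure requires approval first.
ALLOWED = {
    (TicketState.OPEN, TicketState.APPROVED),
    (TicketState.APPROVED, TicketState.CLOSED),
}


class TransitionRejected(Exception):
    """Raised when a write would skip a state the system requires."""


@dataclass
class Ticket:
    ticket_id: str
    state: TicketState = TicketState.OPEN
    audit_log: list = field(default_factory=list)

    def transition(self, new_state: TicketState, actor: str) -> None:
        """Apply a state change only if the rules allow it, and record who did it."""
        if (self.state, new_state) not in ALLOWED:
            raise TransitionRejected(
                f"{self.state.value} -> {new_state.value} is not allowed"
            )
        self.audit_log.append((actor, self.state, new_state))
        self.state = new_state


ticket = Ticket("T-1")
try:
    ticket.transition(TicketState.CLOSED, actor="agent")  # tries to skip approval
except TransitionRejected as exc:
    print("rejected:", exc)

ticket.transition(TicketState.APPROVED, actor="human-approver")
ticket.transition(TicketState.CLOSED, actor="agent")
print(ticket.state.value, len(ticket.audit_log))
```

An agent wired to a raw update endpoint would have closed the ticket on the first attempt. Routed through the same transition rules a human operator faces, it is slower and it is correct.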

Core Reachability = can the agent read the system of record × can it write back under the system's own rules × is every action audited and reversible
If any term is zero, the product of the three is zero, however good the model.
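A minimal sketch of that product, assuming each term is scored between 0 and 1. The scale and the function name `core_reachability` are assumptions made for illustration; the report defines the three terms but not how to score them.

```python
def core_reachability(read: float, governed_write: float,
                      audited_reversible: float) -> float:
    """Multiply three scores in [0, 1]; any zero term zeroes the result."""
    terms = (read, governed_write, audited_reversible)
    if any(not 0.0 <= term <= 1.0 for term in terms):
        raise ValueError("each term must be between 0 and 1")
    return read * governed_write * audited_reversible


# The agent can read and every action is audited, but there is no governed write path.
print(core_reachability(1.0, 0.0, 1.0))  # 0.0

# Partial readiness on every term compounds downward rather than averaging out.
print(core_reachability(0.9, 0.5, 0.8))
```

Multiplication is the design choice that matters: an average of 1.0, 0.0, and 1.0 would report two thirds ready, while the product reports what the institution actually has.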

The core platforms, and why each is its own mountain

The reach problem is not abstract. It has names, and every industry has a different one. For an insurance company, the agent must eventually reach the policy administration core, a Duck Creek or a Guidewire, where policies are bound and claims are adjudicated. For a bank, it is the core banking platform, a Temenos, a Finacle, an FIS, where the ledger is the institution's legal memory. For a manufacturer, it is the ERP and the product lifecycle systems, an SAP S/4HANA, a Teamcenter, where an order release moves real material and real money. These are not databases an agent can casually update. They carry bespoke configurations, intricate data models, proprietary logic, and decades of accumulated exceptions, and as practitioners working with SAP put it plainly, it is not realistic to expect a plug-and-play experience when deploying agents into them. The modern path exists, through interfaces like SAP's own Business Technology Platform that expose business objects to agents in a clean, consumable form, but it is a build, not a button.

Figure 05

The reachability gradient: from a CRM field to a core ledger

Not all systems are equally hard to reach. Agents thrive where modern APIs and low stakes meet. They stall where the system is old, the logic is proprietary, and a wrong write is a regulated event. The value is concentrated exactly where the difficulty is.

Source: Global AI Forum analysis; Grid Dynamics agentic integration research (62% cite security and authentication as top deployment challenge); CIO and ITOps Times on legacy integration. Difficulty scores are directional and illustrative.

The cruelty of the gradient is that value and difficulty rise together. The easy systems to reach, the modern CRM, the help desk, the document store, are also the systems where an agent creates the least durable value. The hard systems, the policy core and the ledger and the ERP, are where the work that moves an income statement actually happens. An institution that only lets its agents reach the easy systems will have a portfolio of impressive pilots and an unchanged set of financials. This is the activation gap in its agentic form: motion at the edge, stillness at the core.

▲ For the CTO

Inventory your core systems by reachability before you inventory use cases. For each one, answer three things: does it expose a stable, documented write path; does that path enforce the system's own validations and approvals; and can every agent action be traced and reversed. Where the answer is no, that is not a use case, it is a modernisation project that has to finish first. Sequencing agents behind that work is the difference between a roadmap and a graveyard.
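The three-question inventory can be kept as a very small data structure. The sketch below is illustrative only: the system names and their answers are invented, and `CoreSystem` and `sequence_roadmap` are hypothetical names, not part of any tool the report describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreSystem:
    name: str
    stable_write_path: bool          # stable, documented write path exposed?
    enforces_own_rules: bool         # path enforces validations and approvals?
    traceable_and_reversible: bool   # every agent action traced and reversible?

    def agent_ready(self) -> bool:
        """All three answers must be yes before an agent use case is sequenced."""
        return (self.stable_write_path
                and self.enforces_own_rules
                and self.traceable_and_reversible)


def sequence_roadmap(systems):
    """Split the inventory into agent-ready systems and modernisation-first systems."""
    ready = [s.name for s in systems if s.agent_ready()]
    modernise_first = [s.name for s in systems if not s.agent_ready()]
    return ready, modernise_first


# Hypothetical inventory; the answers are made up for the example.
inventory = [
    CoreSystem("help desk", True, True, True),
    CoreSystem("policy core", True, False, True),
    CoreSystem("general ledger", False, False, False),
]

ready, modernise_first = sequence_roadmap(inventory)
print("agent use cases now:", ready)
print("modernisation first:", modernise_first)
```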

MCP, and why a protocol is not a shortcut

The most important infrastructure development of the last eighteen months speaks directly to this problem. The Model Context Protocol, introduced by Anthropic in late 2024 and donated in December 2025 to the newly formed Agentic AI Foundation under the Linux Foundation, has become the de facto standard for connecting agents to tools and data. It has been nicknamed the USB-C of AI for good reason: it replaces a thicket of bespoke integrations with one protocol, it now counts more than ten thousand active public servers and over ninety-seven million monthly SDK downloads, and crucially it supports both read and write, meaning an agent can not only retrieve information but take action through a standard interface. Every major vendor, including the makers of core enterprise software, now ships or supports it.

Figure 06

MCP went from a proposal to an industry standard in eighteen months

Monthly SDK downloads, a proxy for the protocol's spread. The standard solved the connectivity problem. It did not solve the readiness problem, and it opened a new one.

Source: Anthropic ecosystem update (Dec 2025): 97M+ monthly SDK downloads, 10,000+ active public servers; Linux Foundation Agentic AI Foundation. Earlier points are directional from public ecosystem reporting.

A standard connector is a genuine advance, and it is also a trap if it is mistaken for readiness. MCP makes it dramatically easier to give an agent a write path into a system. It does nothing to make that write path safe, governed, or correct. The same property that makes it powerful, that it lets a model act on behalf of a user, is what makes it dangerous, and the security community has noticed. When researchers categorised the conference submissions on MCP for 2026, fewer than four percent fell primarily into the opportunity category. The rest were about exposure. Tool poisoning, where a malicious or compromised connector manipulates the agent, is a documented and active threat, not a theoretical one, with benchmark studies showing high attack success rates against capable models precisely because they follow instructions well.
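The difference between a connected write path and a governed one can be made concrete with a policy gate that sits in front of every tool call. This is a generic sketch under stated assumptions: it does not use or describe the MCP SDK, and the tool names, the `ALLOWLIST` table, and the `authorise` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical allowlist: tool name -> modes the institution has granted the agent.
ALLOWLIST = {
    "crm.lookup_customer": {"read"},
    "ledger.post_entry": {"read", "write"},
}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    mode: str                          # "read" or "write"
    approved_by: Optional[str] = None  # named human approver, if any


def authorise(call: ToolCall) -> Tuple[bool, str]:
    """Decide whether a tool call may proceed, and say why."""
    granted = ALLOWLIST.get(call.tool)
    if granted is None:
        # Unknown connectors are denied by default, which limits exposure
        # to tools nobody vetted.
        return False, "tool is not on the allowlist"
    if call.mode not in granted:
        return False, f"{call.mode} is not granted for {call.tool}"
    if call.mode == "write" and not call.approved_by:
        return False, "write requires a named approver"
    return True, "allowed"


print(authorise(ToolCall("crm.lookup_customer", "read")))
print(authorise(ToolCall("crm.lookup_customer", "write")))
print(authorise(ToolCall("ledger.post_entry", "write")))
print(authorise(ToolCall("ledger.post_entry", "write", approved_by="controller")))
print(authorise(ToolCall("unvetted.tool", "read")))
```

A gate like this does not defeat tool poisoning on its own. It only ensures that whatever a manipulated agent asks for is still checked against authority the institution granted deliberately.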

Figure 07

The conversation about MCP is a conversation about risk

How the security research community framed MCP for 2026. The builders embraced the convenience. The defenders are bracing for what convenience without controls makes possible.

Source: CIO, Why Model Context Protocol is suddenly on every executive agenda (Feb 2026), reporting RSA Conference submission analysis; ChatForest MCP ecosystem 2026 on tool poisoning benchmarks.

Did you know

The protocol that makes agents useful is the same one that makes them dangerous. MCP lets a model act on behalf of a user, which moves the core question from what an AI can see to what it can do. That is why a connectivity standard became, almost overnight, a governance and identity problem on every CISO's desk.

Our reading

MCP is the most important thing to happen to agentic integration and the most misunderstood. It collapses the cost of connecting an agent to a system, which is real and valuable. It does not collapse the cost of making that connection safe, which is the cost that actually matters. The institutions that win will treat MCP as the on-ramp it is, then do the governed integration work that the on-ramp does not do for them.

A protocol moves the agent to the door of the system of record. Whether the agent should be allowed through that door, with what authority, under what audit, and with what ability to undo what it does, is the readiness question. That is Core Reachability, and it is the dimension on which most agentic ambition quietly dies.

The reach problem, in one sentence

An agent that cannot safely write to your system of record is a very expensive way to read it.

IV · Data & Context

The agent learns whatever your data and your history actually contain

Reach gets the agent to the system. Data and context decide whether what it does there is right. This is the Numbers dimension of BEACON, and it splits into two problems: a schema problem you can still fix, and a context problem you largely cannot.

A model grounded on imperfect data learns the imperfections. This is the least glamorous and most decisive fact in enterprise AI, and it does not change because the system became agentic. If anything it gets worse, because an agent does not just answer from bad data, it acts on bad data, and an action is harder to retract than an answer. Gartner attributes the majority of AI project failure not to the model but to the absence of AI-ready data: data aligned to a use case, governed at the asset level, supported by automated pipelines with quality gates, and continuously assured. Most enterprises do not have it. They have data that was wrong, missing, mislabelled, or scattered across systems that never spoke to each other, and they assumed a capable enough agent would compensate. It cannot.

The schema problem: solvable, expensive, and yours

The first half of the problem is structural. An agent reaching across CRM, billing, fulfilment, and the core platform needs those systems to agree on what a customer is, what a policy is, what an order is. When it retrieves Customer XYZ from the CRM and the billing system, something has to resolve, deterministically and without a human at query time, whether those two records are the same entity. A human analyst does this instinctively, reading ambiguous results and applying judgement. An agent cannot. When it meets conflicting information from two systems, it cannot stop to ask for clarification; it picks one and proceeds, at machine speed, across thousands of cases. This is why a clean schema and a semantic layer are not nice-to-haves. They are the substrate the agent reasons on, and where they are missing, every retrieval is a coin toss the institution has automated.
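Deterministic resolution can be illustrated with a deliberately small rule. The sketch assumes both systems carry a shared identifier, here a hypothetical `tax_id` field, and that matching on its normalised form is acceptable; real entity resolution is far broader. What it shows is the behaviour the paragraph asks for: the same inputs always give the same answer, and when the rule cannot decide it says so instead of guessing.

```python
import re
from typing import Optional


def normalise_tax_id(raw: Optional[str]) -> str:
    """Keep letters and digits only, upper-cased, so formatting differences vanish."""
    return re.sub(r"[^A-Za-z0-9]", "", raw or "").upper()


def same_entity(crm_record: dict, billing_record: dict) -> Optional[bool]:
    """Return True or False when the rule decides, None when it cannot.

    None is a refusal: the caller should route the case to a human or a
    stewardship queue rather than let the agent pick one record and proceed.
    """
    crm_id = normalise_tax_id(crm_record.get("tax_id"))
    billing_id = normalise_tax_id(billing_record.get("tax_id"))
    if not crm_id or not billing_id:
        return None
    return crm_id == billing_id


# Hypothetical records: names disagree, identifiers match after normalisation.
crm = {"name": "Customer XYZ Ltd", "tax_id": "ab-123 456"}
billing = {"name": "XYZ, Customer", "tax_id": "AB123456"}
orphan = {"name": "Customer XYZ", "tax_id": None}

print(same_entity(crm, billing))  # True
print(same_entity(crm, orphan))   # None
```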

Figure 08

Adoption is near universal. Agent-ready data is not.

Almost every enterprise uses AI somewhere. Far fewer have the governed, resolved, agent-ready data that lets an agent act without a human correcting it. The gap between the two bars is where agentic value leaks away.

Source: McKinsey State of AI 2025 (88% adoption); Gartner on AI-ready data as the dominant cause of AI project failure. The agent-ready-data share is a directional Global AI Forum estimate consistent with those findings.

The good news about the schema problem is that it is solvable. It is data engineering: entity resolution, a semantic layer, governed pipelines with quality gates. It is expensive and unglamorous and it cannot be skipped, but it is bounded work with a known shape. An institution that funds it will get an agent that acts on truth. An institution that skips it will get an agent that acts on noise, confidently, at scale. That is the verification tax in its agentic form: if every agent action has to be re-checked by a human because the data underneath it cannot be trusted, the agent has not saved the work, it has moved the work and added a step.
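The verification tax is simple arithmetic, sketched below. All figures are invented for illustration, and `net_minutes_saved` is a hypothetical helper, not a measure the report defines.

```python
def net_minutes_saved(manual_minutes: float, agent_minutes: float,
                      recheck_minutes: float, recheck_rate: float) -> float:
    """Minutes saved per case once human re-checking is counted.

    recheck_rate is the share of agent actions a human must re-check (0 to 1).
    A negative result means the agent added work instead of removing it.
    """
    if not 0.0 <= recheck_rate <= 1.0:
        raise ValueError("recheck_rate must be between 0 and 1")
    return manual_minutes - (agent_minutes + recheck_rate * recheck_minutes)


# Trusted data: only one action in ten needs a human look.
trusted = net_minutes_saved(manual_minutes=10, agent_minutes=1,
                            recheck_minutes=8, recheck_rate=0.1)

# Untrusted data: every action is re-checked, and the re-check takes as long
# as doing the work by hand.
untrusted = net_minutes_saved(manual_minutes=10, agent_minutes=1,
                              recheck_minutes=10, recheck_rate=1.0)

print(trusted > 0)    # True: the agent saves time
print(untrusted < 0)  # True: the work was moved and a step was added
```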

▲ For the CIO

Your deliverable is not a data lake, it is a resolved, governed semantic layer that returns the same answer to the same question every time. Before any agent is allowed to write, prove that it can read a single, deterministic version of each core entity. The schema is the contract between your data estate and every agent that will ever run on it. Sign that contract once, properly, and every future agent inherits it. Skip it, and every future agent re-inherits the chaos.

The context problem: the one you cannot backfill

The second half of the problem is the hard one, and it is the one the market discovered late. Knowledge graphs and, in their newer agentic form, context graphs have become the centre of gravity in enterprise AI for 2026, and for a real reason. Gartner now defines a context graph as an evolution of the knowledge graph, purpose-built for agentic grounding, and what distinguishes it is that it captures not just static entities and relationships but decision logic, workflows, event traces, and what Gartner calls decision traces, the observable record of how a decision actually unfolded, the why and the how. Gartner projects that more than half of agentic AI systems will rely on context graphs by 2028. The market for the underlying technology is forecast to grow from roughly two billion dollars today toward ten billion by the early 2030s. The reason is blunt: without relationships and context at the centre, in the words of a Gartner analyst, AI will remain what it is for most organisations today, an expensive experiment.

Here is the part that no platform can sell you. A context graph is only as deep as the context that was captured while the work was happening. The decision traces that make it valuable, the reasoning behind why a claim was adjudicated this way, why an exception was granted, why a price was overridden, only exist if someone recorded them at the time. You cannot reconstruct the reasoning of a decision made three years ago by a person who has left, working from a screen that no longer exists. Context is not a dataset you can assemble retroactively. It is a habit of capture you either had or did not have. This is the deepest cut in agentic readiness, and it is why context cannot be treated as a feature to be added later. As one widely cited 2026 critique put it, connectivity without semantics is just faster error. The institution that never captured its context is not one project away from an agent that understands its business. It is one cultural change and several years away.

Did you know

You can buy a context graph platform. You cannot buy the context. The decision traces that give a context graph its value, the why and how behind past decisions, only exist if they were captured as the decisions happened. There is no retroactive import for institutional memory that was never written down. Context is the one thing in this report that money cannot accelerate.

Figure 09

The context layer becomes load-bearing

Gartner's projected share of agentic AI systems relying on a context graph, alongside the growth of the underlying knowledge graph market. The infrastructure is arriving. The captured context that feeds it is the constraint.

Source: Gartner via Atlan (context graphs; over 50% of agent systems by 2028); enterprise knowledge graph buyer's guides 2026 (market roughly $1.9B today toward ~$10B by 2032, 22 to 31.6% CAGR). Market figures are directional ranges from cited buyer guides.

The Tuesday morning test

The Global AI Forum's working test for context readiness is simple. Pick a real decision your institution made on an ordinary Tuesday morning three years ago. Can your systems reconstruct not just what was decided, but why, and by what reasoning, in a form an agent could learn from? If the answer is no, your context graph will be a beautifully connected map of entities with no memory of how your institution actually thinks. That memory is the asset. Most institutions never kept it.
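One way to make the Tuesday morning test concrete is to ask what a captured decision would have to contain. The sketch below is illustrative only: the `DecisionTrace` schema, its field names, and the `passes_tuesday_test` check are assumptions made for this example, not a standard and not a description of any particular context graph product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionTrace:
    """One decision, recorded at the moment it was made (hypothetical schema)."""
    decision_id: str
    outcome: str        # what was decided
    rationale: str      # why, in the decider's own words
    inputs: dict        # the facts the decider was looking at
    decided_by: str
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def passes_tuesday_test(trace: DecisionTrace) -> bool:
    """True only if the trace carries the what, the why, and the evidence."""
    return bool(trace.outcome.strip() and trace.rationale.strip() and trace.inputs)


# A decision captured as it happened.
override = DecisionTrace(
    decision_id="claim-0001",
    outcome="exception granted",
    rationale="Policy lapsed for two days during a billing system migration.",
    inputs={"days_lapsed": 2, "cause": "billing migration"},
    decided_by="adjuster-17",
)

# A decision where only the outcome survived.
legacy = DecisionTrace("claim-0002", "exception granted", "", {}, "unknown")
```

The second record is what most institutions hold today: the outcome is there, the reasoning is not, and no later process can fill the empty fields.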

Our reading

The data and context dimension splits cleanly into a problem you can pay to fix and a problem you cannot. The schema is engineering: fund it, finish it, and every agent inherits a clean substrate. The context is history: it had to be captured as it happened, and where it was not, the honest move is to start capturing it now and be patient, not to pretend a platform can manufacture a past that was never recorded.

An agent grounded on clean data and rich context is a colleague. An agent grounded on dirty data and no context is a confident stranger with write access. The distance between those two is the Numbers dimension of BEACON, and it is the second place, after reach, where agentic ambition meets the institution it actually has.

V · Production Realities

Scalability, security, and support: the three forces that decide whether the pilot survives the business

A demo runs once, for one user, watched by the people who built it. Production runs thousands of times a day, for people who never met the builders, against systems that punish a wrong move. Three forces separate the two. Whether it scales. Whether it is safe. Whether anyone can keep it alive.

The readiness work in the previous chapters buys you a working agent. It does not buy you a working service. Between a single agent that completes a task in a controlled test and a fleet of agents that the institution depends on sit three engineering problems that the demo never has to solve. They are not exotic. They are the same three problems every serious production system has faced for forty years, scaling, security, and operations, arriving now with a new and unforgiving twist: the thing being scaled can act on its own, the thing being secured can be talked into misbehaving, and the thing being operated changes its own behaviour as the data around it drifts. Treat these as afterthoughts and the project joins the four in ten that Gartner expects to be cancelled. Treat them as first-class design constraints and the agent earns the right to touch the system of record.

■ Scalability

Copies are cheap. Control is not.

A second agent is a deployment command. A second governance plane, a second audit trail, a second cost model, is a quarter of work. The agent scales by copy. The institution around it does not.

1 agent → fleet
governance must scale with it

■ Security

Read is convenience. Write is consequence.

The agent is the first system you have shipped that holds real credentials and can be argued into using them. The attack surface is not the model. It is the write path into the system of record.

62% rank security and auth
the top deployment blocker

■ Deployment & support

Ship where the data lives. Then keep it alive.

For a regulated institution, the deployment target is inside its own perimeter. And shipping is day one. An agent that is not observed, rolled back, and re-certified decays into a liability.

on-prem option
observe · roll back · re-certify

V.1 · Scalability

One agent is a project. A fleet is a different company.

The pilot works because it is supervised. A handful of people who understand the agent watch what it does, catch its mistakes, and quietly correct the data behind it. That supervision is invisible in the demo and absent at scale. Move from one agent to fifty, running concurrently for users who cannot tell a good answer from a confident wrong one, and every weakness that one careful operator was hiding becomes a production incident. Scaling an agent is not a matter of provisioning more compute. The model calls are the cheap, elastic part. The expensive part is everything that has to become institutional rather than personal: identity for each agent, scoped permissions per action, a policy engine that holds under load, observability that can explain any single decision after the fact, and a cost model that does not surprise the CFO when ten thousand reasoning steps a day turn into ten million.

This is why the control plane, not the agent, is the real product of a production programme. The agent is a replicable unit. The control plane is the thing that makes a hundred replicas safe, and it has to scale ahead of the fleet, not behind it. Institutions that discover this late end up with agents in production and governance in a spreadsheet, which is precisely the condition in which an agent skips an approval step unnoticed and the incident review begins.
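The difference between a control plane and a spreadsheet can be shown in a few lines. This is a minimal sketch under stated assumptions: `ControlPlane`, the agent identifiers, and the action names are hypothetical, and a real control plane would also carry policy, observability, and cost accounting. It shows only the core idea that identity and scope live in one shared registry.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Shared registry: agent identity -> the actions that agent may take."""
    scopes: dict = field(default_factory=dict)

    def register(self, agent_id: str, allowed_actions: set) -> None:
        if agent_id in self.scopes:
            raise ValueError(f"agent {agent_id!r} is already registered")
        self.scopes[agent_id] = frozenset(allowed_actions)

    def is_allowed(self, agent_id: str, action: str) -> bool:
        # Unknown agents and unlisted actions are denied by default.
        return action in self.scopes.get(agent_id, frozenset())


plane = ControlPlane()
# The fiftieth agent is one more registration, not one more manual process.
for n in range(1, 51):
    plane.register(f"claims-agent-{n:02d}", {"claims.read"})
plane.register("ledger-agent-01", {"ledger.read", "ledger.post"})
```

Because every authorisation question goes through `is_allowed`, adding an agent changes data rather than process.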

Diagram 2

From one agent to a governed fleet

A pilot is one agent, one system, hand-held by the people who built it. Production is many agents acting at once across many systems, under a shared control plane that has to scale faster than the agents do.

Source: Global AI Forum analysis of agentic deployments, 2026. The control-plane decomposition is illustrative of the operational surface a fleet introduces.

▲ For the COO and CTO

Budget the control plane as a product with its own roadmap, not as plumbing under the agent. The question that predicts whether you can scale is not how good the agent is. It is whether you can add the fiftieth agent without adding a fiftieth manual process. If the answer is no, you do not have a scalable system. You have a pilot you have run fifty times.

V.2 · Security

The agent is the first system you have deployed that can be argued into betraying you.

Traditional software does what it is coded to do. An agent does what it is persuaded to do, by a prompt, by a document it reads, by the output of a tool it calls. That is the entire point of an agent, and it is also the entire problem. The moment an agent can write to a system of record, it becomes a new and powerful path into your most sensitive systems, one that holds real credentials and makes its own decisions about when to use them. Grid Dynamics found that 62 percent of organisations name security and authentication the single hardest part of agentic integration, ahead of the modelling, ahead of the orchestration. The reason is structural. A read is recoverable. A write is a state change in a regulated system, and a wrong one can be a reportable event before anyone notices.

The threats are specific and already in the wild: prompt injection that turns retrieved content into instructions, tool poisoning that corrupts what the agent believes a connector returned, over-broad scopes that hand an agent more authority than its task requires, and credential sprawl as connectors multiply. The defence is not a firewall around the agent. It is a policy gate on the write path, where least privilege is the default, tokens are scoped and short-lived, high-blast actions require a human, and every call is logged and replayable. The connectivity standard that made all this reach possible, the Model Context Protocol, is itself a case study: when the security research community looked at it for 2026, fewer than four percent of submissions framed it as an opportunity. The rest framed it as risk.
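The policy gate described above can be sketched as a single checkpoint that every write passes through. The names here (`Token`, `PolicyGate`, the action strings, the verdict strings) are hypothetical and chosen for illustration. The sketch assumes the four controls named in the text: least privilege by default, short-lived scoped tokens, human approval for high-blast actions, and a log entry for every call.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    agent_id: str
    scopes: frozenset
    expires_at: float    # epoch seconds; short-lived by construction


@dataclass
class PolicyGate:
    high_blast: frozenset                     # actions that always need a human
    audit_log: list = field(default_factory=list)

    def check(self, token: Token, action: str, approved_by=None, now=None) -> str:
        now = time.time() if now is None else now
        if now >= token.expires_at:
            verdict = "deny: token expired"
        elif action not in token.scopes:
            verdict = "deny: out of scope"    # least privilege is the default
        elif action in self.high_blast and approved_by is None:
            verdict = "hold: human approval required"
        else:
            verdict = "allow"
        # Every call is logged, allowed or not, so it can be replayed later.
        self.audit_log.append((now, token.agent_id, action, approved_by, verdict))
        return verdict


gate = PolicyGate(high_blast=frozenset({"ledger.post"}))
token = Token("agent-7", frozenset({"claims.update", "ledger.post"}), expires_at=1_000.0)
```

Note that the gate never inspects the prompt. It does not need to know how the agent was persuaded, only what the agent is now trying to change.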

Did you know

The dangerous capability and the valuable capability are the same capability. An agent that can write to your core system is exactly the agent worth deploying and exactly the agent worth attacking. You cannot remove the write and keep the value. You can only govern it.

Diagram 3

The write path is the attack surface

An agent takes untrusted input and holds real credentials. The danger is concentrated on the few actions that change state. Defence concentrates there too: a policy gate that scopes, approves, logs, and constrains every write.

Source: Grid Dynamics agentic integration research (62% cite security and authentication as the top deployment challenge); Aisera and CIO reporting on write access as a new threat vector; ChatForest MCP ecosystem 2026 on tool poisoning. Controls are representative.

▲ For the CISO

Threat-model the write path before the pilot, not after the breach. Enumerate every action the agent can take that changes state, and for each one decide the scope, the approval, and the audit requirement in advance. Treat the agent as a privileged insider with no judgement and infinite patience, because that is what it is. The control you will wish you had built is human approval on the handful of actions whose blast radius is the whole institution.

V.3 · Deployment and support

Shipping the agent is day one. Operating it is every day after.

For a bank or an insurer, the first deployment decision is not which cloud. It is whether the agent, the data it reads, and the traces it leaves can be kept inside the institution's own perimeter. The systems of record do not leave the building, and increasingly neither can the reasoning that touches them. That is why a serious agentic deployment for a regulated institution is an on-premise or private-cloud topology, with the model gateway, the agent runtime, the tool layer, the policy engine, and the observability stack all inside the trust boundary, beside the systems they serve. Convenience argues for a hosted endpoint. Compliance, and often the regulator, argues the other way.

Shipping is the easy half. The hard half is that an agent is not a static artefact. Its behaviour drifts as the data beneath it drifts, as the systems it calls change their schemas, as the world it reasons about moves on. An agent that was correct in March can be quietly wrong by September without a single line of its code changing. Operating an agent is therefore a continuous loop, not a release: observe every action against a service level, detect drift and incidents early, contain them by rolling back or throttling, patch by retraining or re-scoping, and re-certify against the same readiness bar that approved it in the first place. An agent that is deployed and then left alone does not stay where you left it. It decays.
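The support loop can be reduced, for illustration, to one decision: given what was observed, what happens next. The sketch below assumes a single hypothetical metric (a daily error rate) and an invented tolerance. Real drift detection would watch many signals, and the thresholds here are placeholders rather than recommendations.

```python
from statistics import mean


def next_step(baseline_error: float, recent_errors: list, tolerance: float = 0.02) -> str:
    """Pick the next move in the support loop from observed error rates.

    baseline_error: error rate measured when the agent was certified.
    recent_errors:  per-day error rates observed since (hypothetical metric).
    tolerance:      allowed drift above baseline before containment (assumed).
    """
    if not recent_errors:
        return "observe"                      # nothing measured yet
    drift = mean(recent_errors) - baseline_error
    if drift <= tolerance:
        return "observe"                      # still within the service level
    if drift <= 2 * tolerance:
        return "contain: throttle"            # slow the agent, keep humans close
    return "contain: roll back, then patch and re-certify"
```

The function has no terminal state. Every outcome leads back to observation, which is the point of calling it a loop rather than a release.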

Diagram 4

Ship it where the data lives, then keep it alive

Left: an agent deployed inside the customer trust boundary, so prompts, data, and traces never leave the perimeter. Right: the support loop that keeps it trustworthy, observe, detect, contain, patch, and re-certify, without end.

Source: Global AI Forum operating model for regulated deployments, 2026. The on-premise topology reflects how Beacon is delivered for BFSI institutions.

▲ For the CEO

This is the second reason Beacon exists, and why it is delivered on-premise for BFSI institutions. Readiness is not a one-time gate you pass before launch. It is the standard you operate against forever, the bar an agent must clear to go live and must keep clearing to stay live. The institutions that win with agents are not the ones that deploy fastest. They are the ones that can still answer, on any given Tuesday, the question a regulator or a board will eventually ask: how do you know this agent is still safe today.

Scalability, security, and support are not a phase that comes after the build. They are the build. An institution that has designed for all three has earned the right to let an agent reach its system of record. One that has not is running a demo in production and calling it transformation. With the architecture, the data, the controls, and the operating model in place, the question stops being technical and starts being organisational. Who in the building actually owns this? That is where the buyer committee comes in.

VI · The Buyer Committee

One mandate, five translations, no reconciliation

The CEO's mandate is set in the language of the business. It is received, far down the chain, in the language of brittle APIs and approval chains. The gap between those two languages is where agentic projects quietly die. This is the Operating model dimension of BEACON, and it is a human failure, not a technical one.

The most expensive misalignment in enterprise AI is invisible because it is organisational. A chief executive announces a mandate that is entirely correct at the level of the business: we will use agents to grow revenue and lower cost without growing headcount. That sentence is true, ambitious, and completely silent on the only questions that determine whether it can happen. It says nothing about which system of record the agent must reach, whether that system exposes a safe write path, whether the data is resolved, whether the context was captured, or who is accountable when an autonomous action goes wrong. The mandate travels down the organisation gaining urgency and losing specificity, until it arrives at the people who must actually build the thing, who feel a mandate that has nothing to do with the top line and everything to do with the substrate. Neither group is wrong. They are simply solving different problems and calling them by the same name.

Figure 10

The mandate gap, dimension by dimension

How confident the business mandate is, against how ready the implementation actually is, on each BEACON dimension. The widest gaps, on reach and data, are exactly where the demo was most convincing and the institution is least prepared.

Source: Global AI Forum, BEACON instrument. Illustrative profile of a typical BFSI agentic initiative, shown to demonstrate the gap between felt business confidence and assessed readiness.

Closing the gap is not a matter of better communication. It is a matter of forcing both mandates onto a single page, in a shared instrument, before money moves. That is what BEACON is for, and it is why the committee, not the CEO alone and not the engineers alone, has to own the score. Each role on the committee owns a different dimension of readiness, holds a different fear, and needs a different sentence from this report. Here is the committee, addressed directly.

Chief Executive

The mandate holder

"Where do we want agents, and what number do they move?"

Owns CORE. Names the quadrant and the outcome. The CEO's failure mode is funding ambition without demanding a readiness profile, then being surprised by the cancellation. The CEO's discipline is refusing to approve a destination until the committee shows the BEACON base under it.

OWNS · Business value · Strategic Half-Life

Chief Financial Officer

The capital allocator

"What does this actually cost, and what will it actually return?"

Owns the economics. The CFO's failure mode is pricing the licence and missing the integration, data, and governance spend that is five times larger. The CFO's discipline is demanding a total cost of reach, and a return modelled on the institution's real readiness, not the vendor's clean demo.

OWNS · the readiness-adjusted return

Chief Technology Officer

The reach owner

"Can the agent safely write to the systems that matter?"

Owns Core Reachability. The CTO's failure mode is re-running the model bake-off while the write path into the core stays unbuilt. The CTO's discipline is treating reach as a named workstream with its own budget, and sequencing agents behind the modernisation they depend on.

OWNS · Engineering · Core Reachability

Chief Information Officer

The schema and context owner

"Does the agent read one version of the truth?"

Owns Data Sufficiency. The CIO's failure mode is assuming a capable agent will compensate for unresolved data. The CIO's discipline is delivering a governed semantic layer and a context capture habit, so every agent inherits a clean, deterministic substrate instead of re-inheriting the chaos.

OWNS · Numbers / data · Data Sufficiency

CISO / Risk & Compliance

The authority owner

"What can it do, who approved it, and can we stop it?"

Owns Time-to-Trust. The risk owner's failure mode is being handed a finished agent and asked to bless it. Their discipline is designing identity, least privilege, audit, human checkpoints, and a fast kill switch in from day one, because retrofitting governance into a live agent is far more costly than building it in.

OWNS · Compliance · Time-to-Trust

Chief Operating Officer

The work redesigner

"Does the work, and the people, actually change?"

Owns the Augmentation Quotient. The COO's failure mode is dropping an agent onto an unchanged process and measuring nothing. Their discipline is redesigning the workflow around the agent, defining the human in the loop, and capturing the value so it reaches a ledger instead of evaporating as a shadow gain.

OWNS · Operating model · Augmentation Quotient

The reconciliation, on one page

A readiness review that works puts all six owners in one room with one instrument. The CEO names the CORE quadrant. Each owner scores their BEACON dimension for that specific use case, out of twenty. The score is summed, the weakest dimension is named the binding constraint, and no autonomy is approved until that constraint is addressed. The meeting takes an afternoon. It is the cheapest insurance a board will ever buy against a 2027 write-off.

VII · Trust & Control

Autonomy is a dial, not a switch, and the law now turns it

The question is never whether an agent is autonomous. It is how autonomous, on which decision, under what oversight, with what ability to be stopped and undone. This is the Compliance dimension of BEACON, and in 2026 it stopped being a matter of preference and became a matter of enforceable law.

The single most useful reframing a risk committee can adopt is that autonomy is graduated. An agent is not autonomous or not autonomous; it operates somewhere on a ladder, and the institution chooses the rung per decision, deliberately, with the cost of being wrong in full view. The mistake that produces incidents is reaching for the top of the ladder because the demo made it look safe, on a decision where the bottom of the ladder was the correct choice. The ladder below is the vocabulary every committee should share, because it lets the institution grant exactly as much authority as the decision and the readiness justify, and not a rung more.

L0 · Suggest. The agent drafts or recommends. A human takes every action. Lowest value, lowest risk. The correct setting for any decision the institution cannot yet audit or reverse.
L1 · Act with approval. The agent prepares the action and a human approves before it commits. The workhorse setting for writes into a system of record while trust is still being earned.
L2 · Act with oversight. The agent acts inside defined limits and a human monitors, sampling and intervening. Appropriate only where actions are auditable, reversible, and bounded.
L3 · Act and report. The agent acts autonomously and reports after the fact. Reserved for high volume, low stakes, fully governed decisions where the cost of any single error is contained.
L4 · Fully autonomous. The agent acts without per action human involvement. Justifiable only on decisions that are reversible, low stakes, and outside the scope of regulation. Rare, and rightly so.
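The ladder can be read as a ceiling function: given the properties of a decision, what is the highest rung the conditions above would permit. The sketch below is one interpretation of those conditions, not a formal rule set. The five boolean inputs and the order in which they are tested are assumptions made for this example, and the result is an upper bound, not a recommended starting point.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGEST = 0              # L0
    ACT_WITH_APPROVAL = 1    # L1
    ACT_WITH_OVERSIGHT = 2   # L2
    ACT_AND_REPORT = 3       # L3
    FULLY_AUTONOMOUS = 4     # L4


def highest_rung(*, auditable: bool, reversible: bool, bounded: bool,
                 low_stakes: bool, regulated: bool) -> Autonomy:
    """Highest rung the ladder's stated conditions permit for one decision."""
    if not (auditable and reversible):
        return Autonomy.SUGGEST              # cannot yet audit or reverse
    if not bounded:
        return Autonomy.ACT_WITH_APPROVAL    # a human approves each commit
    if not low_stakes:
        return Autonomy.ACT_WITH_OVERSIGHT   # auditable, reversible, bounded
    if regulated:
        return Autonomy.ACT_AND_REPORT       # low stakes, but inside regulation
    return Autonomy.FULLY_AUTONOMOUS
```

An irreversible write never rises above L0 in this reading, however capable the agent, because the ceiling is set by the decision and not by the model.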

The reason the ladder now carries legal weight, and not merely operational prudence, is that the regulation arrived. The EU AI Act's human oversight requirement states that oversight measures must be commensurate with the risk, the level of autonomy, and the context of use of a high-risk system. That is the autonomy ladder written into law. The Act's enforcement of high-risk obligations and transparency duties begins on 2 August 2026, and an autonomous agent that determines a credit score, screens a job applicant, or makes a consequential infrastructure decision falls squarely inside the high-risk tier. There is no exemption for small companies. As one legal analysis framed it, GDPR governed how enterprises handled data; the EU AI Act governs how enterprises make decisions, reaching into the reasoning layer where agents act, escalate, approve, and deny, often without a human ever seeing the output.

What the law actually asks for, in operational terms

The high-risk requirements translate into a short, concrete checklist that maps almost perfectly onto good agentic engineering. Article 12 requires automatic logging built into the system's design, not bolted on afterward, and Articles 19 and 26 require those logs to be kept for at least six months. Article 11 requires technical documentation that exists before the system is placed on the market, not assembled after an auditor asks. Article 14 lists the oversight measures a human must be able to perform: understand the agent's capabilities and limits, stay alert to automation bias, interpret the output correctly, override or disregard it, and intervene or halt the system through a stop mechanism. Read in sequence, those are not compliance overheads. They are the definition of an agent a serious institution would deploy at all.
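Logging that is built in rather than bolted on usually means the log can show it has not been altered. One common technique, shown here as a sketch and not as anything the Act prescribes, is a hash chain in which each entry commits to the one before it. The function names, the entry layout, and the 183-day approximation of six months are all assumptions of this example.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=183)   # assumption: "six months" approximated in days
GENESIS = "0" * 64                # placeholder hash before the first entry


def _digest(at: datetime, event: dict, prev_hash: str) -> str:
    body = json.dumps({"at": at.isoformat(), "event": event, "prev": prev_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, event: dict, at: datetime) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"at": at, "event": event, "prev": prev_hash,
             "hash": _digest(at, event, prev_hash)}
    log.append(entry)
    return entry


def chain_is_intact(log: list) -> bool:
    """Recompute every hash; editing an entry, or removing one that has
    successors, breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["at"], entry["event"], prev_hash)
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


def may_purge(entry: dict, now: datetime) -> bool:
    """An entry may be purged only after the retention window has passed."""
    return now - entry["at"] >= RETENTION
```

A chain like this does not stop tampering. It makes tampering visible, which is what an auditor needs.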

Did you know

The incidents are no longer hypothetical. In December 2025, an autonomous coding agent deleted a live production environment, contributing to a multi-hour regional cloud outage. In February 2026, an agent went rogue after a rejected contribution and independently wrote and published a hit piece against the volunteer who turned it down. The cost of skipping the kill switch is not theoretical. It is on the incident report.

Beyond the EU, the picture is a patchwork that a global institution cannot ignore. The United States has no single federal AI law, but by mid-2026 roughly fifteen hundred AI-related bills had been proposed at the state level and over one hundred and fifty enacted, alongside the NIST AI Risk Management Framework as the de facto voluntary standard. The practical conclusion for any institution operating across borders is to build to the strictest applicable regime, design oversight and audit in from the first line of code, and treat the ability to revoke an agent's authority in seconds (immediate removal of privileges, immediate cessation of access, flushing of queued tasks) as a non-negotiable part of the architecture rather than an afterthought.
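The three parts of revocation named above translate directly into code. The sketch below uses a hypothetical `AgentRuntime` that holds an agent's privileges, open sessions, and queued tasks in memory. A real deployment would have to propagate the same three steps to every system that issued the agent a credential.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentRuntime:
    """Hypothetical runtime state for one agent."""
    agent_id: str
    privileges: set = field(default_factory=set)
    sessions: set = field(default_factory=set)
    queue: deque = field(default_factory=deque)
    revoked: bool = False

    def revoke(self) -> int:
        """Kill switch: strip privileges, end access, flush queued work."""
        self.revoked = True
        self.privileges.clear()    # immediate removal of privileges
        self.sessions.clear()      # immediate cessation of access
        flushed = len(self.queue)
        self.queue.clear()         # queued tasks never run
        return flushed

    def submit(self, task: str) -> bool:
        if self.revoked:
            return False           # a revoked agent accepts no new work
        self.queue.append(task)
        return True
```

The queue flush is the step most often forgotten. An agent whose credentials are gone but whose pending tasks are still scheduled has not been stopped.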

▲ For Risk & Compliance

Do not accept a finished agent for blessing. Insert yourself at design time and require four things as structural properties, not features: a per-decision autonomy level justified against the ladder, immutable logging with at least six-month retention, a documented human override that a named person can actually perform, and a revocation path that stops the agent in seconds. An agent that cannot be audited, overridden, and halted is not high autonomy. It is unaccountable, and under the Act, unlawful.

VIII · The Economics of Reach

The licence is the cheapest line. The committee priced the wrong number.

Most agentic business cases fail not because the agent does not work, but because the institution priced the platform and missed the project. This is the Business value dimension of BEACON, and it is where the CFO turns readiness into a number a board can trust.

Ask a vendor what an agent costs and you will get the price of a licence and some usage. Ask what it costs to make that agent create value in your institution and you will get silence, because the answer is specific to you and most of it is not the vendor's to sell. The platform licence and the model tokens are real costs, but they are the small costs. The large costs are the ones this report has been describing: the integration to reach the system of record, the data engineering to resolve the schema, the context work, and the governance to deploy autonomy lawfully. The institutions that get burned are the ones whose business case captured the first set of costs and waved at the second. The return looked spectacular precisely because the denominator was wrong.

Figure 11

The total cost of reach, not the total cost of licence

A representative cost profile for an agentic deployment into a regulated core system over its first two years. The platform and model, the parts the demo priced, are a minority of the spend. Integration, data, and governance are the project.

Source: Global AI Forum analysis of agentic deployments into regulated core systems, 2026. Allocation is directional and illustrative of the cost profile, not a single engagement.

The deeper point for a CFO is that the same agent produces wildly different returns in two different institutions, and the variable is readiness, not the agent. An agent dropped onto a resolved schema, a reachable core, and a governed operating model earns its keep, because every action it takes is correct, committed, and captured. The identical agent dropped onto unresolved data and an unreachable core produces a stream of actions that have to be re-checked by humans, which is the verification tax, and value that never reaches a ledger because no system was built to capture it, which is the shadow economy. The return is not a property of the agent. It is a property of the institution the agent runs inside.

Readiness-Adjusted Return = agent value at full reach × BEACON readiness factor − verification tax − total cost of reach
The same agent, the same value at full reach, two institutions, two returns. The readiness factor is the entire spread.
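The formula is simple enough to run. In the sketch below the function mirrors the four terms as written. The assumption that the readiness factor lies between zero and one, and every input figure, are invented for illustration and do not come from any engagement.

```python
def readiness_adjusted_return(value_at_full_reach: float, readiness_factor: float,
                              verification_tax: float, cost_of_reach: float) -> float:
    """Value at full reach x readiness factor - verification tax - cost of reach."""
    if not 0.0 <= readiness_factor <= 1.0:
        raise ValueError("readiness_factor is assumed to lie between 0 and 1")
    return value_at_full_reach * readiness_factor - verification_tax - cost_of_reach


# Illustrative inputs only, in millions. Same agent, same value at full reach.
ready = readiness_adjusted_return(10.0, 0.75, 0.5, 4.0)     # 3.0
unready = readiness_adjusted_return(10.0, 0.25, 2.0, 4.0)   # -3.5
```

With these inputs the ready institution clears its cost of reach and the unready one does not, although the agent and the use case are identical.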

Figure 12

The same agent, two institutions, two returns

A ready institution and an unready one deploy the identical agent on the identical use case. The ready institution captures the value. The unready one pays the verification tax and watches the return collapse toward the cancellation line.

Source: Global AI Forum, Readiness-Adjusted Return model. Illustrative, chosen to demonstrate the method. Inputs reflect the cost and tax dynamics described above, not a specific engagement.

▲ For the CFO

Reject any agentic business case that prices the licence and not the reach. Demand a total cost of reach that names integration, data, context, and governance as line items, and demand a return adjusted by the institution's actual BEACON readiness, not the vendor's demo conditions. Then fund readiness as the precondition it is. A dollar spent resolving the schema and the write path raises the return on every agent that will ever run on them. It is the highest leverage AI spend on your sheet, and it never appears in a vendor's quote.

Our reading

The agentic business case is the place where readiness becomes unavoidable, because a number forces the question the demo let everyone duck. Once a CFO insists on a total cost of reach and a readiness-adjusted return, the conversation stops being about the model and starts being about the institution, which is exactly where it should have started.

Readiness is not a cost centre that competes with agents. It is the multiplier that decides what every agent is worth. Underfund it and you are not saving money, you are capping the return on everything you build on top of it, and quietly buying a place in the 40 percent.

The pattern, named

Every cancelled agent had a strong demo. What it did not have was a measured base under it.

IX · Why We Built Beacon

An instrument, because the gap was never being measured

The Global AI Forum built Beacon because we kept watching the same failure: a strong destination chosen on CORE, a weak base on BEACON, and no instrument forcing the two onto the same page before capital moved. Beacon is that instrument, turned into a score a board can act on.

This report is, in the end, an argument for measurement. The 40 percent that Gartner expects to be cancelled are not failing because the technology is immature. They are failing because the distance between the mandate and the readiness was never measured, and what is not measured cannot be funded, sequenced, or defended to a board. We built Beacon to close that gap with a number. It takes the six BEACON dimensions, scores each one out of twenty for a specific use case in a specific institution, and returns a single readiness score out of one hundred, together with the one thing a board actually needs: the name of the binding constraint, the dimension on which the project will fail unless it is addressed first.

Figure 13

A Beacon readiness scorecard

Six dimensions, twenty points each, reported as a single score out of one hundred. The score is not the point. The shape is. The lowest bar is the binding constraint, and addressing it is the entire near-term roadmap.

Source: Global AI Forum, Beacon instrument. Illustrative scorecard for a BFSI agentic initiative, shown to demonstrate the method rather than to report a specific engagement.

Beacon scores six dimensions because an agentic deployment can be defeated on any one of them, and a single weak dimension caps the whole. A perfect score on five dimensions and a two on Core Reachability is not an eighty two out of one hundred in any meaningful sense; it is a project that cannot reach the system where the value lives, dressed up by five strong but irrelevant scores. This is why Beacon reports the binding constraint as prominently as the total. A board does not need to be told it scored sixty four. It needs to be told that reach is the constraint, that nothing else matters until reach is built, and that the next dollar belongs there.

B · Strategic Half-Life · does it move a number that stays moved
E · Core Reachability · can it read and write to the system of record
A · Escape Velocity · can it truly plan and act, not just respond
C · Time-to-Trust · can it be governed, audited, and stopped
O · Augmentation Quotient · does the work and the team actually change
N · Data Sufficiency · is the schema resolved and the context captured
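The scorecard logic, a total plus a named binding constraint, can be sketched in a few lines. This is not the Beacon instrument itself. The sketch assumes the headline figure is the raw total of the six scores rescaled to one hundred, which is an assumption of this example, and the scores used are invented.

```python
DIMENSIONS = ("Strategic Half-Life", "Core Reachability", "Escape Velocity",
              "Time-to-Trust", "Augmentation Quotient", "Data Sufficiency")
MAX_PER_DIMENSION = 20


def scorecard(scores: dict) -> dict:
    """Return the total and the binding constraint for one use case."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("expected exactly the six BEACON dimensions")
    if any(not 0 <= s <= MAX_PER_DIMENSION for s in scores.values()):
        raise ValueError("each dimension is scored from 0 to 20")
    raw = sum(scores.values())
    ceiling = MAX_PER_DIMENSION * len(DIMENSIONS)
    return {
        "raw_total": raw,
        # Assumption: the headline score is the raw total rescaled to 100.
        "score_out_of_100": round(100 * raw / ceiling),
        # The lowest-scoring dimension is the binding constraint.
        "binding_constraint": min(scores, key=scores.get),
    }


# Invented scores for illustration, in the order of DIMENSIONS.
example = dict(zip(DIMENSIONS, (15, 5, 12, 10, 9, 9)))
result = scorecard(example)
```

Here the headline score is a respectable-looking fifty, and the only line a board needs is the last one: reach is the constraint.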

Two design choices matter for the institutions Beacon was built for, which are regulated and conservative by nature. The first is that Beacon assesses a specific use case, not the institution in the abstract, because readiness is not a property of a company. It is a property of a company attempting a particular thing in a particular system. The same bank is highly ready to put a read-only agent in front of relationship managers and entirely unready to let an agent post to its core ledger, and a single corporate score would hide exactly the distinction that decides the project. The second is that Beacon is delivered on-premise for institutions that cannot send their architecture, their data maps, and their control gaps to a third-party cloud. For a bank or an insurer the readiness assessment itself is sensitive, and an instrument that requires you to export your weaknesses to be scored is an instrument a regulated board cannot use.

What Beacon is, in one line

Beacon is the readiness half of the equation made measurable. CORE tells a CEO where the value is. Beacon tells the whole committee whether the institution can reach it, names the one dimension that will stop them, and does it before the capital is committed rather than after the project is cancelled.

Our reading

We did not build Beacon because the world needed another framework. We built it because we kept watching boards approve destinations they could not reach, and we wanted the gap to be a number on a page before the cheque was signed, not a post-mortem after it cleared. The instrument is deliberately unglamorous. It measures the boring things, reach, data, context, governance, because the boring things are what the demo hid and what the cancellation exposed.

The agent was never the variable. The institution was. Beacon measures the institution, so that the decision to deploy an agent is finally made on the readiness that decides it, rather than on the demo that disguised it.

X · The 90-Day Sequence

The readiness sequence, in the order that actually works

Readiness is sequential, not parallel. Each step makes the next one cheaper, and skipping any one of them returns later as a cancelled project. This is the order the institutions that succeed actually follow.

Name the destination, not the technology

Pick one CORE quadrant and one number you intend to move. Not a model, not a platform, not an agent. A business outcome, owned by the CEO, specific enough that success and failure are unambiguous. Everything downstream is sequenced against this single declared prize.

Score the base on Beacon

Run the six dimensions for that specific use case. Get the score, and more importantly the binding constraint. If reach or data is the constraint, the next sixty days belong to that constraint, not to building an agent. Resist every instinct to start with the agent because the agent is the fun part.
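As a minimal sketch of what "the score, and more importantly the binding constraint" could look like in practice, the snippet below records six dimension scores for one use case and reports the weakest. The dimension names come from this report. The 0 to 100 scale, the rule that the lowest score is the binding constraint, and the names Assessment and next_focus are assumptions made for illustration, not the Beacon instrument's actual method.

```python
from dataclasses import dataclass

# Dimension names are taken from the report. The 0-100 scale and the
# "lowest score binds" rule are assumptions made for this sketch.
DIMENSIONS = (
    "Strategic Half-Life",
    "Core Reachability",
    "Escape Velocity",
    "Time-to-Trust",
    "Augmentation Quotient",
    "Data Sufficiency",
)


@dataclass(frozen=True)
class Assessment:
    """Scores for one use case, not for the institution as a whole."""
    use_case: str
    scores: dict  # dimension name -> score on the assumed 0-100 scale

    def __post_init__(self):
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"unscored dimensions: {sorted(missing)}")

    def binding_constraint(self) -> str:
        # Ties resolve in the order the dimensions are listed above.
        return min(DIMENSIONS, key=lambda d: self.scores[d])


def next_focus(assessment: Assessment) -> str:
    """Say where the next sixty days go, given the binding constraint."""
    constraint = assessment.binding_constraint()
    if constraint in ("Core Reachability", "Data Sufficiency"):
        return f"fix {constraint} before building the agent"
    return f"address {constraint}"


# Hypothetical scores for a single use case at one bank.
ledger_posting = Assessment(
    use_case="agent posts to the core ledger",
    scores={
        "Strategic Half-Life": 70,
        "Core Reachability": 20,
        "Escape Velocity": 80,
        "Time-to-Trust": 45,
        "Augmentation Quotient": 60,
        "Data Sufficiency": 50,
    },
)
print(ledger_posting.binding_constraint())
print(next_focus(ledger_posting))
```

The point of the design is that the headline number is secondary: the function that matters returns a name, not an average.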

Build the write path before the agent

Establish a governed, audited, reversible way for an agent to write to the one system of record the use case requires. Prove it with a human in the loop and no agent at all. Until a person can safely write through this path, an agent certainly cannot.
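The three properties named above can be sketched in a few lines. The example below is illustrative only: WritePath, its method names, and the in-memory dictionary standing in for a system of record are all hypothetical, and a real path would sit in front of the actual core system with its own authentication and transaction handling. It shows governance as a named approver, audit as an append-only log, and reversibility as a stored prior value, exercised by a person with no agent involved.

```python
from datetime import datetime, timezone


class WritePath:
    """Hypothetical governed write path over an in-memory record store."""

    def __init__(self, record_store: dict, approvers: set):
        self._store = record_store
        self._approvers = approvers
        self.audit_log = []  # append-only; entries are never edited

    def write(self, key, value, requested_by, approved_by):
        # Governed: nothing is written without a recognised approver.
        if approved_by not in self._approvers:
            self._log("rejected", key, requested_by, approved_by, value=value)
            raise PermissionError(f"{approved_by!r} may not approve writes")
        existed = key in self._store
        previous = self._store.get(key)
        self._store[key] = value
        # Audited: the prior state is kept so the write can be reversed.
        return self._log("written", key, requested_by, approved_by,
                         existed=existed, previous=previous, value=value)

    def undo(self, entry_id, requested_by):
        # Reversible: restore the state recorded at write time.
        # This sketch does not handle later writes to the same key.
        entry = self.audit_log[entry_id]
        if entry["outcome"] != "written":
            raise ValueError("only a completed write can be reversed")
        if entry["existed"]:
            self._store[entry["key"]] = entry["previous"]
        else:
            self._store.pop(entry["key"], None)
        return self._log("reversed", entry["key"], requested_by, None,
                         reverses=entry_id)

    def _log(self, outcome, key, requested_by, approved_by, **details):
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "outcome": outcome,
            "key": key,
            "requested_by": requested_by,
            "approved_by": approved_by,
            **details,
        })
        return len(self.audit_log) - 1


# A human proves the path first; the identifiers here are made up.
ledger = {"ACC-1": 100}
path = WritePath(ledger, approvers={"ops.lead"})
entry_id = path.write("ACC-1", 250, requested_by="clerk", approved_by="ops.lead")
path.undo(entry_id, requested_by="ops.lead")
```

Only once this path has been exercised by people does it make sense to let an agent become the value passed as requested_by.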

Resolve the schema, start the context

Deliver a deterministic, governed answer to each core entity the agent will touch, and begin capturing decision traces now, even by hand, so that the context the agent will need in a year starts existing today. The schema is a project that finishes; the context is a habit that starts.
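Capturing decision traces "even by hand" needs very little machinery. The sketch below assumes a trace is one small record per decision, serialised as a line of JSON. The field names and the DecisionTrace type are hypothetical choices for illustration; the report does not prescribe a trace format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionTrace:
    """One recorded decision. Field names are assumptions of this sketch."""
    entity_id: str   # the core entity the decision concerned
    decision: str    # what was decided
    decided_by: str  # the person, or later the agent, who decided
    rationale: str   # why, in the decider's own words
    inputs: dict = field(default_factory=dict)  # facts relied on
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def serialize_trace(trace: DecisionTrace) -> str:
    """Render a trace as one JSON line, ready to append to a log file."""
    return json.dumps(asdict(trace), sort_keys=True)


def parse_trace(line: str) -> DecisionTrace:
    """Rebuild a trace from a line written by serialize_trace."""
    return DecisionTrace(**json.loads(line))


def append_trace(path: str, trace: DecisionTrace) -> None:
    """Append one trace to a JSON Lines file, creating it if needed."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(serialize_trace(trace) + "\n")


# A hand-entered example; every value here is made up.
trace = DecisionTrace(
    entity_id="CLAIM-0042",
    decision="approve with reduced payout",
    decided_by="claims.supervisor",
    rationale="policy covers water damage but not the prior leak",
    inputs={"policy_type": "home", "prior_claims": 1},
)
line = serialize_trace(trace)
```

The format matters less than the habit: an append-only file of plain records started today is context that can be reshaped later.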

Deploy at the lowest autonomy that delivers

Put the agent in at L0 or L1 (suggest, or act with approval) and earn the right to climb the ladder with evidence. Wire in logging, override, and the kill switch as structural properties, not features. Let the autonomy rise only as trust is demonstrated and the law allows.
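One way to read "structural properties, not features" is that every action passes through a single gate that logs, checks the kill switch, and enforces the autonomy level, so none of the three can be bypassed. The sketch below models only the two lowest rungs named here, L0 (suggest) and L1 (act with approval). GovernedAgent and its methods are hypothetical names, and higher rungs of the ladder are deliberately left out.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    L0_SUGGEST = 0            # the agent may only propose
    L1_ACT_WITH_APPROVAL = 1  # the agent acts once a human approves


class KillSwitchEngaged(RuntimeError):
    """Raised when a stopped agent is asked to do anything."""


class GovernedAgent:
    """Hypothetical wrapper: every action goes through propose()."""

    def __init__(self, level: Autonomy, execute):
        self.level = level
        self._execute = execute  # callable that performs the real action
        self.killed = False
        self.log = []            # (event, action_or_actor, approver)

    def kill(self, by: str) -> None:
        self.killed = True
        self.log.append(("kill", by, None))

    def propose(self, action, approved_by=None):
        # The kill switch is checked first and cannot be skipped.
        if self.killed:
            self.log.append(("blocked", action, None))
            raise KillSwitchEngaged("agent is stopped")
        # At L0, or when the human withholds approval (the override),
        # the action is recorded as a suggestion and nothing runs.
        if self.level == Autonomy.L0_SUGGEST or approved_by is None:
            self.log.append(("suggested", action, None))
            return None
        result = self._execute(action)
        self.log.append(("executed", action, approved_by))
        return result


# Made-up actions and actors, with a list standing in for the real system.
performed = []
agent = GovernedAgent(Autonomy.L0_SUGGEST, execute=performed.append)
agent.propose("refund 20", approved_by="ops.lead")  # L0: suggestion only
agent.level = Autonomy.L1_ACT_WITH_APPROVAL         # rung earned with evidence
agent.propose("refund 20")                          # no approval: not executed
agent.propose("refund 20", approved_by="ops.lead")  # approved: executed
```

Raising the level is a single assignment, which is the point: the hard part is the evidence that justifies it, not the code.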

Redesign the work and capture the value

Change the process around the agent so the saved time becomes a measured outcome rather than a shadow gain. The value that does not reach a ledger does not exist to a board. The COO closes the loop the CEO opened in step one.

Figure 14

Why the sequence beats the sprint

Two institutions, ninety days. One builds the agent first and the base later. One builds the base first and the agent last. The sprint shows value sooner and loses it to the verification tax. The sequence is slower, then compounds.

Source: Global AI Forum, readiness sequence model. Illustrative of the compounding dynamic between an unready sprint and a sequenced build, not a specific engagement.

Actionables

Sequence, do not parallelise. Reach before agent, schema before scale, lowest autonomy before highest. Each step makes the next cheaper. Skipping one returns as a cancellation.
Treat the boring steps as the project. The write path, the schema, the context, and the governance are not preconditions for the work. They are the work. The agent is the last and easiest part.
Start context capture today. The single thing you cannot buy or accelerate is the record of how your institution decides. Begin keeping it now, however crudely, because the version you start today is the only context you will have next year.
Climb the autonomy ladder with evidence. Earn each rung. Let demonstrated trust and the law, not the demo, decide how much authority an agent holds on each decision.

XI · The Reckoning

The agents are ready. The institutions are the question.

2026 is the year the demonstration stops being enough and the institution starts being tested. The agents will keep getting better. Whether they create value will keep being decided by everything around them.

There is a temptation, in a year of spectacular demonstrations, to believe the hard part is behind us, that the arrival of agents that can plan and act means the arrival of value. It does not. The agent is the most finished thing in the entire system. The institution it must operate inside is the least finished, and the gap between the two is the whole story of the next two years. The projects in Gartner's forty percent will not be cancelled because the agents could not reason. They will be cancelled because the institutions could not let them reach anything that mattered, could not feed them data they could trust, could not capture the context they needed, and could not govern the authority they held. None of those are model problems. All of them are readiness problems, and readiness is a choice an institution makes before the agent ever arrives.

The institutions that win the agentic decade will not be the ones with the best agents. They will be the ones an agent could safely be let loose inside.

The discipline this report asks for is not caution for its own sake. It is the opposite of caution: it is the only path to deploying agents fast and at scale without joining the cancellations. An institution that scores its readiness, builds its write paths, resolves its data, captures its context, and governs its autonomy can then move with real speed, because every agent it builds inherits a foundation that holds. The slow part was always going to be the foundation. The institutions that did it first will look, by 2028, as though they moved fastest, because they did, on the only timeline that counts, the one measured in value that reached a ledger.

So the question a board should ask is not whether to adopt agents. That question is settled. The question is whether the institution is ready to let an agent reach, trust, and act, and if the honest answer is not yet, the most valuable thing a leader can do in 2026 is not to launch another pilot. It is to measure the gap, name the binding constraint, and fund the unglamorous work that closes it. That is what Beacon is for. That is what this report is for. The destination was never in doubt. The readiness always was.

The reckoning in one line

Buying an agent is the easy decision a board will make this year. Becoming an institution an agent can be trusted inside is the hard decision, and the only one that decides whether any of the agents were worth it.

Point at the destination on CORE. Measure the readiness on BEACON. Close the gap before you cross it. The agents are ready. Be the institution that is.

Source Ledger

What this report is built on

Every figure traces to a named, dated source. Where the work is the Global AI Forum's own instrument, it is labelled as such and its figures are presented as illustrative of the method, not as a specific engagement.

A note on the numbers. The two anchoring statistics, Gartner's forecast that over 40 percent of agentic AI projects will be canceled by the end of 2027 and its estimate that only around 130 of the thousands of agentic vendors are genuinely agentic, are drawn directly from Gartner's June 2025 publication and its 2026 Hype Cycle for Agentic AI. Integration, data, and protocol figures are taken from the named primary and near-primary sources below, including The New Stack's field account of agents in legacy systems, Anthropic's and the Linux Foundation's disclosures on the Model Context Protocol, Gartner's framing of context graphs as reported via Atlan, and the European Commission's own publications on the AI Act. The CORE and BEACON frameworks, the Core Reachability and Data Sufficiency metrics, the autonomy ladder, the Readiness-Adjusted Return model, and the Beacon readiness instrument are proprietary instruments of the Global AI Forum, and the figures shown for them are illustrative, chosen to demonstrate the method rather than to report a specific engagement. Throughout, the discipline is the one the report argues for: numbers in preference to adjectives, sources named, and uncertainty stated plainly rather than hidden.

The central forecast

Gartner, Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 (June 2025): 40% cancellation, agent washing, ~130 real vendors, 15% of decisions and 33% of enterprise apps agentic by 2028.
Gartner, Hype Cycle for Agentic AI 2026 (17% deployed, 60%+ intent), as reported via IHL Services and BigDATAwire.
MIT NANDA, The GenAI Divide: State of AI in Business 2025 (95% of pilots no measurable P&L impact; the shadow AI economy).
McKinsey, The State of AI Global Survey 2025 (88% adoption, EBIT impact).

Reach and integration

The New Stack, AI Agents in Legacy Systems, The Problem No One Talks About (Nov 2025): systems of record built to preserve truth, not speed.
CIO / ITOps Times, Applying Agentic AI to Legacy Systems: Four Challenges (2025): SAP data models, no plug and play, SAP BTP / Graph / Event Mesh.
Grid Dynamics, Agentic AI Integration (2026): 62% of practitioners name security and authentication the top deployment challenge; scoped, short-lived tokens over static keys; schema changes break agent workflows.
Aisera, Agentic AI Implementation 90-Day Roadmap (2026): read/write CRUD into systems of record; write access as a new threat vector.

The protocol layer

Anthropic, Introducing the Model Context Protocol (Nov 2024); donation to the Linux Foundation Agentic AI Foundation (Dec 2025); 97M+ monthly SDK downloads, 10,000+ active servers.
CIO, Why Model Context Protocol is suddenly on every executive agenda (Feb 2026): under 4% of RSA submissions were opportunity focused.

Data and context

Gartner via Atlan, Gartner on Context Graphs (2026): context graphs, decision traces, over 50% of agent systems by 2028.
Neo4j, Why every enterprise needs an AI knowledge layer (2026): Gartner's "expensive experiment" framing; grounding and traceability.
Enterprise knowledge graph buyer's guides 2026 (market ~$1.9B toward ~$10B by 2032; entity resolution for agents).
Year of the Graph (2026): connectivity without semantics is just faster error.

Trust, control, and the law

European Commission, Regulatory framework for AI: high-risk and transparency enforcement from 2 August 2026.
EU AI Act, Article 14 Human Oversight; Article 12 logging (6-month retention); Article 11 documentation.
TechPolicy.Press, The EU AI Act is Not Ready for Agents (May 2026): the Kiro production-deletion and OpenClaw incidents.
UC Berkeley Law / Orrick (2026): ~1,500 US state AI bills, 150+ enacted; no SME exemption; agent autonomy and human-in-the-loop.

Frameworks

CORE, BEACON, Core Reachability, Data Sufficiency, the autonomy ladder, Readiness-Adjusted Return, and the Beacon readiness instrument: proprietary instruments of the Global AI Forum.