Global AI Forum Edition 01 · 2026
The Enterprise Field Guide

The Buyer's
Atlas

How a CEO, CFO, CTO, CIO, or compliance lead actually chooses a foundation model. Every major model on the board. Seven parameters. One decision you can defend in a boardroom.

A long read · ~30 min
Mapped to task, function & industry
Updated 27 June 2026
Start here

There is no best model.
There is only the right one.

The most expensive mistake in enterprise AI is not picking the wrong vendor. It is believing one model should win every job. In 2026 the frontier is a portfolio, not a podium. The leaders trade places by the week. A model that writes the cleanest board memo may be the wrong one to run a ten-hour migration. A model that aces a reasoning benchmark may quietly leak data your regulator will ask about.

This guide does one thing well. It hands the buyer a vocabulary. Seven parameters that decide everything. A full atlas of the models that matter. Five lenses, one for each seat at the table. And a way to walk into the room with a choice you can explain in plain words and still defend under audit.

The one rule to keep

Pick by elimination, not reputation. Strike every model that fails a non-negotiable first: data residency, latency ceiling, cost cap. Then match what survives to the intelligence the task actually needs. Most teams run three to five models at once and route each job to the one that leads it.
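The rule is easy to state and easy to skip under deadline, so here is a minimal Python sketch of it. It assumes a hypothetical `Candidate` record and three example non-negotiables: self-hosting, a p95 latency ceiling, and a per-task cost cap. The scores are placeholders for your own evaluations, not vendor figures.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    self_hostable: bool       # can the weights run inside your own infrastructure?
    p95_latency_ms: float     # measured on your own traffic, not taken from a spec sheet
    cost_per_task_usd: float  # blended cost of one typical task
    task_score: float         # your own eval score for the primary job, 0 to 100


def shortlist(candidates, *, must_self_host, max_latency_ms, max_cost_usd, top_n=3):
    """Strike every candidate that fails a non-negotiable, then rank the survivors."""
    survivors = [
        c for c in candidates
        if (c.self_hostable or not must_self_host)
        and c.p95_latency_ms <= max_latency_ms
        and c.cost_per_task_usd <= max_cost_usd
    ]
    # Capability only matters now: best task score first, cheaper wins a tie.
    survivors.sort(key=lambda c: (-c.task_score, c.cost_per_task_usd))
    return [c.name for c in survivors[:top_n]]


# Hypothetical pool; names and numbers are illustrative only.
pool = [
    Candidate("frontier-api", False, 2400, 0.90, 92),
    Candidate("value-api", False, 600, 0.05, 78),
    Candidate("open-self-host", True, 900, 0.02, 74),
]
print(shortlist(pool, must_self_host=True, max_latency_ms=1500, max_cost_usd=0.50))
# ['open-self-host']
```

Note the order of operations: the highest-scoring model never reaches the ranking step if it fails a rule, which is exactly the point.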

8+
Model families now compete, up from 3 in 2023
100×
Price gap between the cheapest and priciest mainstream model
~5%
Of sessions where a top model's safety layer reroutes the request
The Vocabulary

Seven parameters
that decide it all.

Forget the leaderboards for a moment. Every model choice in the enterprise comes down to seven dials. Learn these and you can read any spec sheet, cut through any sales deck, and ask the one question the vendor hoped you would not.

01
Capability tier · How smart, for this kind of work

Capability is not one number. A model can sit at the frontier for coding and mid-pack for long multimodal reasoning. The honest way to read it is per task type: agentic coding, deep reasoning, writing and tone, multimodal, multilingual. The benchmarks that matter in 2026 are SWE-Bench Pro and Terminal-Bench for coding agents, GPQA Diamond and FrontierMath for reasoning, and human-preference arenas for writing.

In plain words: Do not ask "is it smart." Ask "is it smart at the one thing I will make it do all day." A top-of-class coder can be an average lawyer.
Coding · Reasoning · Writing · Multimodal · Multilingual
02
Cost & token economics · What it really bills, not the sticker

Price is quoted per million input and output tokens, and the spread is enormous: from roughly ten cents per million on a value model to ten dollars on a top tier, a hundred-fold gap. But the sticker lies. Frontier models fan a single prompt out into dozens of internal calls, so one request can bill like fifty. Output tokens typically cost three to six times as much as input. Long-context requests often jump to a higher rate above a threshold.

In plain words: The bill is set by how the model works, not by the headline rate. Budget it, cap it, and route only the hard, high-value work to the expensive tier. Many teams have been blindsided by exactly this.
Input/output split · Fan-out · Long-context surcharge · Caching
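A minimal Python sketch of why the sticker lies. The two prices are the only inputs taken from this guide (the $5 / $30 frontier rate in the atlas); the fan-out count, the cache discount, and the long-context threshold and multiplier are hypothetical defaults, since each provider sets its own. Check the provider's pricing page before trusting any of them.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    price_in_per_m: float,              # sticker USD per 1M input tokens
    price_out_per_m: float,             # sticker USD per 1M output tokens
    fan_out: int = 1,                   # internal calls one user prompt triggers (assumed)
    cached_fraction: float = 0.0,       # share of input served from a prompt cache (assumed)
    cache_discount: float = 0.9,        # cached input billed at 10% of sticker (assumed)
    long_ctx_threshold: int = 200_000,  # hypothetical surcharge threshold
    long_ctx_multiplier: float = 2.0,   # hypothetical surcharge, applied to the whole call
) -> float:
    """Estimate what one user-visible request really bills."""
    surcharge = long_ctx_multiplier if input_tokens > long_ctx_threshold else 1.0
    billable_in = input_tokens * (1 - cached_fraction * cache_discount)
    per_call = (billable_in * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000
    return per_call * surcharge * fan_out


print(request_cost_usd(20_000, 2_000, 5, 30))              # about 0.16: the sticker view
print(request_cost_usd(20_000, 2_000, 5, 30, fan_out=50))  # about 8.0: same prompt, fanned out
```

Same prompt, same rate card, fifty times the bill. That is the gap a spend-per-task budget catches and a per-token budget misses.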
03
Context window · How much it can hold at once

The context window is how much text a model considers in a single request, measured in tokens, each roughly three-quarters of a word. In 2026 a million tokens is table stakes: most frontier models hold around 1M, and some reach far higher. That lets you drop an entire codebase, a full contract set, or a research corpus into one pass. But watch the gap between the advertised window and the effective window. Providers differ sharply in how well a model actually uses the back half of a long prompt.

In plain words: A big window means fewer "it forgot what I told it" moments. But a model that holds a million tokens and only reads the first hundred thousand well is selling you shelf space it does not use.
1M standard · Effective vs advertised · Output cap
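The three-quarters-of-a-word rule is enough for a first fit check. A small Python sketch, assuming you reserve some room for the answer (the 8,000-token reservation is an arbitrary example) and that `effective_fraction` comes from your own long-prompt tests, since no provider publishes it in this form.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rule of thumb from the text: one token is roughly three-quarters of a word."""
    return round(word_count / words_per_token)


def fits_in_window(word_count: int, window_tokens: int,
                   reserved_output_tokens: int = 8_000,
                   effective_fraction: float = 1.0) -> bool:
    """True if the prompt plus the reserved output fits in the usable part of the window.

    effective_fraction is an assumption to measure yourself: set it below 1.0 if your
    tests show the model degrades in the back half of long prompts.
    """
    usable = int(window_tokens * effective_fraction)
    return estimate_tokens(word_count) + reserved_output_tokens <= usable


# A 600,000-word contract set is roughly 800,000 tokens.
print(fits_in_window(600_000, 1_000_000))                           # True
print(fits_in_window(600_000, 1_000_000, effective_fraction=0.5))   # False
```

The second call is the shelf-space problem in one line: the advertised window fits the corpus, the effective one does not.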
04
Deployment & data residency · Where your data physically goes

The first fork in the road is proprietary versus open weight. Proprietary models reach peak capability through an API, with no infrastructure to run, but your data leaves the building and you depend on the provider's roadmap and uptime. Open-weight models can be downloaded and run on your own hardware, giving full control, privacy, and no per-token fees, in exchange for the cost and burden of running them. For regulated work in healthcare, government, and financial services, self-hosting is now a legitimate path, not a capability sacrifice.

In plain words: If a regulator can ask "where did that data go," you need an answer before you need a benchmark. Open weights on your own servers is the cleanest answer. Hosted API is the fastest start.
Open weight · Proprietary API · Private cloud · Air-gapped
05
Latency & throughput · How fast, and how much at once

Two numbers matter: time to first token, which is how responsive it feels, and output throughput, how fast it finishes. A customer-facing assistant lives or dies on the first; a nightly batch job on the second. The trick is that the smartest model is rarely the fastest. Reasoning models that "think" before answering pay for depth with delay. For high-volume, latency-sensitive work, a smaller distilled model often wins on experience even though it loses on paper.

In plain words: For a chat box, fast and good-enough beats slow and brilliant. For an overnight pipeline, nobody is watching the clock. Match the speed to who is waiting.
Time to first token · Throughput · Reasoning delay
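The two numbers combine into a simple estimate: response time is roughly time to first token plus output length divided by throughput. A Python sketch with hypothetical speed profiles (the figures are invented for illustration; measure your own) shows how the same pair of models can swap places depending on who is waiting.

```python
from dataclasses import dataclass


@dataclass
class SpeedProfile:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # output throughput once streaming starts


def response_time_s(p: SpeedProfile, output_tokens: int) -> float:
    """Approximate wall-clock time for one response: wait for the first token, then stream."""
    return p.ttft_s + output_tokens / p.tokens_per_s


def pick_for(profiles: dict, output_tokens: int, user_waiting: bool) -> str:
    """A person waiting cares about the first token; a batch job cares about total time."""
    if user_waiting:
        return min(profiles, key=lambda name: profiles[name].ttft_s)
    return min(profiles, key=lambda name: response_time_s(profiles[name], output_tokens))


# Hypothetical profiles, not measurements of any real model.
profiles = {
    "small-distilled": SpeedProfile(ttft_s=0.3, tokens_per_s=150.0),
    "batch-large": SpeedProfile(ttft_s=5.0, tokens_per_s=400.0),
}
print(pick_for(profiles, 300, user_waiting=True))      # small-distilled
print(pick_for(profiles, 10_000, user_waiting=False))  # batch-large
```

The chat case ignores total time on purpose: for a short answer, the wait for the first token dominates what the user feels.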
06
Agentic & tool use · Can it act, not just answer

The defining shift of 2026 is from assistant to agent. The question is no longer "can it write the answer" but "can it run the whole job": call tools, hit your systems, chain hundreds of steps, and keep working through ambiguity without a human touching each one. The leaders now sustain multi-hour autonomous runs and hundreds of tool calls in a single chain. Persistent memory is the new differentiator: models that use notes, logs, and stored context across a task that spans days.

In plain words: An assistant drafts the email. An agent finds the contact, drafts it, checks the calendar, and books the meeting while you review the result, not the process. If you want work done, not just words, this is the dial.
Long-horizon runs · Tool chaining · Persistent memory · Computer use
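Under the hood, an agent is a loop: ask the model for the next action, run the tool, keep the result where the next step can read it, and stop when the job is done or the step budget runs out. A toy Python sketch of the meeting-booking example; the tools and the scripted "model" are hypothetical stand-ins, not any vendor's agent API.

```python
from typing import Callable

# Hypothetical tools standing in for your own systems (directory, mail, calendar).
TOOLS: dict[str, Callable[[dict], str]] = {
    "find_contact": lambda args: f"{args['name']}@example.com",
    "draft_email": lambda args: f"Draft to {args['email']}: meeting request",
    "check_calendar": lambda args: "Tue 10:00 free",
    "book_meeting": lambda args: f"booked {args['slot']} with {args['email']}",
}


def run_agent(next_action: Callable[[dict], dict], notes: dict, max_steps: int = 20) -> dict:
    """Ask for an action, run the tool, record the result, repeat until done.

    next_action stands in for a model call. It reads the persistent notes and returns
    {"tool": name, "args": {...}} or {"done": True}.
    """
    for step in range(max_steps):
        action = next_action(notes)
        if action.get("done"):
            return notes
        result = TOOLS[action["tool"]](action["args"])
        notes.setdefault("log", []).append((step, action["tool"], result))
        notes[action["tool"]] = result  # memory the next step can read
    raise RuntimeError("step budget exhausted before the job finished")


def scripted_model(notes: dict) -> dict:
    """A fake 'model' that works through the job one step at a time."""
    if "find_contact" not in notes:
        return {"tool": "find_contact", "args": {"name": "dana"}}
    email = notes["find_contact"]
    if "draft_email" not in notes:
        return {"tool": "draft_email", "args": {"email": email}}
    if "check_calendar" not in notes:
        return {"tool": "check_calendar", "args": {}}
    if "book_meeting" not in notes:
        slot = notes["check_calendar"].rsplit(" ", 1)[0]  # "Tue 10:00 free" -> "Tue 10:00"
        return {"tool": "book_meeting", "args": {"slot": slot, "email": email}}
    return {"done": True}


notes = run_agent(scripted_model, {})
print(notes["book_meeting"])  # booked Tue 10:00 with dana@example.com
```

The step cap is the piece a buyer should ask about: a multi-hour run needs a budget on steps, time, and spend, or a confused agent simply keeps going.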
07
Safety, governance & lock-in · What the compliance lead asks

The least glamorous parameter quietly decides the most. Does the model retain your prompts, and for how long? The most capable tiers are starting to require 30-day data retention with no zero-retention option, even for enterprises that previously negotiated one. Do safety classifiers reroute or refuse some requests, and how often? Can you fine-tune, or are you frozen on the provider's roadmap? Per-token API pricing is a form of lock-in; open weights are insurance against a provider changing prices or deprecating a model you depend on.

In plain words: Read the data-retention clause before the benchmark table. The strongest model in the world is the wrong choice if its terms break your audit. This is the slide that gets a deal killed in legal.
Data retention · Classifier reroute · Fine-tune rights · Vendor lock-in
Strike every model that fails a non-negotiable first. Then compete on capability. A team that scores models before eliminating them ends up buying the cleverest model that fails its own rules.
The Atlas

Every model
worth knowing.

The board, laid out. Eight families, five tiers, and the open-weight challengers rewriting the price floor. Figures are mid-2026 and move fast; treat them as a map, not a contract.

Tiers: Mythos-class · Frontier · Value / fast · Open weight

Side by side

The numbers, on one screen.

No single model leads every column. That is the whole point.

SWE-Bench Pro

The coding exam

It hands a model real, unsolved bug reports from actual open-source software and checks whether its fix makes the project's own test suite pass. A score of 80% means it correctly resolved 80 of 100 real engineering tickets, the closest thing to "can it do a junior engineer's day job." The catch: scores swing hard with the scaffold around the model. The same model scores differently inside a purpose-built coding harness than in a raw setup, so read coding numbers as directional, never absolute.

GPQA Diamond

The reasoning exam

"Graduate-level Google-Proof Q&A." PhD-level science questions written so you cannot simply search the answer. It measures genuine multi-step reasoning, not recall or memorized facts. The catch: the frontier now clusters in the mid-90s, which means the test is nearly saturated. When every leader scores 94 to 95, the benchmark has stopped telling you who is actually better. Treat a near-perfect GPQA as table stakes, not a tiebreaker.

Other names you will see: Terminal-Bench (can it operate a real command line), FrontierMath (the hardest unsolved math, still far from saturated), GDPval (economically valuable knowledge work), and human-preference arenas for writing and tone. No single score tells the whole story, which is exactly why the table below has more than one column.

Prices are USD per 1M tokens (input / output).

Model | Maker | Released | Tier | Coding (SWE-Bench Pro) | Reasoning (GPQA) | Context | Price in / out | Best at
GPT-5.6 Sol | OpenAI | Jun 26 2026 | Frontier | leader* | ~95% | 1.5M | $5 / $30 | Agentic coding, cyber, biology
GPT-5.6 Terra | OpenAI | Jun 26 2026 | Frontier | strong | high-90s | 1.5M | $2.50 / $15 | GPT-5.5 class at half the cost
GPT-5.6 Luna | OpenAI | Jun 26 2026 | Value | good | high | 1.5M | $1 / $6 | Fast, cheap, high-volume
Claude Fable 5 | Anthropic | Jun 9 2026 | Mythos | ~80% | ~94% | 1M | $10 / $50 | Long-horizon agents, hardest work
Claude Opus 4.8 | Anthropic | May 28 2026 | Frontier | ~69% | ~94% | 1M | $5 / $25 | Coding, high-stakes writing
Claude Sonnet 4.6 | Anthropic | Mar 2026 | Value | ~58% | high-80s | 1M | $3 / $15 | Near-Opus quality at value price
GPT-5.5 | OpenAI | Apr 23 2026 | Frontier | ~59% | ~95% | 1M | $5 / $30 | All-round knowledge work, research
Gemini 3.1 Pro | Google | early 2026 | Frontier | ~54% | ~94% | 1M+ | $2 / $12 | Multimodal, long context, value
Gemini 3.5 Flash | Google | 2026 | Value | good | high | 1M | $1.50 / $9 | Best price-per-intelligence
Grok 4.3 | xAI | Apr 17 2026 | Frontier | ~55% | competitive | 2M | $2 / $15 | Live data, real-time web/X search
DeepSeek V4-Pro | DeepSeek | Apr 24 2026 | Open | ~58% | strong | 1M | $0.27 / $1.10 | Frontier-ish quality, lowest cost
DeepSeek V4-Flash | DeepSeek | Apr 24 2026 | Open | good | solid | 1M | $0.14 / $0.28 | Cheapest 1M-context model
Llama 4 | Meta | 2025 | Open | good | solid | 10M* | self-host | Self-host, data never leaves
GLM-5.2 | Z.AI | Jun 16 2026 | Open | ~58% | ~91% | 200K | self-host | Open-weight reasoning leader
Qwen 3.7 Max | Alibaba | 2026 | Open | strong | high | 256K | $1.25 / $3.75 | Cheapest top-10 reasoner, math
Kimi K2.7 | Moonshot | Jun 12 2026 | Open | ~59% | solid | 256K | self-host | Long tool-call chains, agents
MiniMax M3 | MiniMax | 2026 | Open | ~59% | solid | 1M | $0.60 | Cheapest self-host frontier coder
Mistral Large 3 | Mistral | Dec 2025 | Open | good | solid | 256K | $0.50 / $1.50 | EU sovereign, Apache 2.0, on-prem
Command A+ | Cohere | May 20 2026 | Open | fair | strong | 256K | self-host | Enterprise RAG & search, citations
Amazon Nova 2 Pro | Amazon | 2026 | Value | fair | solid | 300K | low | Native to AWS Bedrock, video
Sarvam 105B (Indus) | Sarvam AI | Feb 2026 | Open | fair | solid | 128K | self-host | 22 Indian languages, sovereign

*GPT-5.6 Sol leads Terminal-Bench 2.1 (command-line agentic coding) at 91.9% in Ultra mode, edging Claude Mythos 5; its SWE-Bench Pro figure was not broken out at preview. Sol, Terra, and Luna launched on June 26 2026 under a US-government-coordinated limited preview, with broad availability expected within weeks. Llama 4 Scout advertises up to 10M tokens. Coding figures use SWE-Bench Pro where available; scaffolding changes scores materially, so read them as directional. Pricing, benchmarks, and dates were verified in late June 2026 and change frequently; confirm them against provider docs before production.

Five seats, five questions

The same choice
looks different
from each chair.

A foundation model is bought by a committee that does not share a vocabulary. Here is what each seat is really asking, and the model traits that answer it.

CEO
Does this move the business?
  • Frame as a portfolio, not a vendor bet
  • Ask for spend-per-task, not per-token
  • Insist on a routing strategy in the plan
  • Avoid single-model lock-in to one provider
CFO
What does it really cost?
  • Model fan-out before signing any cap
  • Route volume to value tiers, hard work up
  • Open weights remove per-token lock-in
  • Watch long-context price jumps
CTO
Will it ship and scale?
  • Capability is per task type, not global
  • Agentic runs & tool chains for real work
  • Benchmark scores depend on scaffold
  • Latency for users, throughput for batch
CIO
Does it fit the estate?
  • Cloud alignment: Bedrock, Vertex, Foundry
  • One gateway to route across models
  • Plan for model deprecation cycles
  • Blend hosted + self-host by workload
Compliance
Can I defend it in audit?
  • Read the data-retention clause first
  • Top tiers may force 30-day retention
  • Know the classifier reroute rate
  • Self-host for data residency mandates
Map the model to the job

Task first.
Brand last.

The fastest way to a defensible choice is to start from the workflow and work backwards. A few common enterprise jobs and where the strength sits today.

Function / workflow | What it needs most | Lead choices today | Value alternative
Software engineering & migrations | Agentic coding, long runs | GPT-5.6 Sol, Claude Fable 5 | DeepSeek V4-Pro, Sonnet 4.6
Financial modeling & analysis | Step-by-step reasoning | GPT-5.6 Sol, Opus 4.8 | Gemini 3.1 Pro
Legal redlines & contract review | Long context, careful tone | Claude Opus 4.8, Fable 5 | Gemini 3.1 Pro (1M)
Customer support at scale | Low latency, low cost | Gemini 3.5 Flash, Haiku 4.5 | DeepSeek V4-Flash
Market & competitive research | Multi-step, live data | GPT-5.6, Grok 4.3 | Gemini 3.1 Pro + search
Board materials & long-form writing | Prose rhythm, subtext | Claude Opus 4.8 | GPT-5.5, Sonnet 4.6
Document-heavy / multimodal ops | Vision, video, audio | Gemini 3.1 Pro | Gemini 3.5 Flash
High-volume first drafts | Cheap, fast, good-enough | DeepSeek V4-Flash | GPT-5.4 mini, Haiku 4.5
Regulated / air-gapped workloads | Data never leaves | Llama 4, Qwen 3.5 (self-host) | GLM-5.2, DeepSeek (self-host)
Indian-language & sovereign service | 22 languages, local infra | Sarvam 105B (Indus) | Krutrim, BharatGen Param 2
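The table translates almost directly into a routing rule. A minimal Python sketch of a hypothetical gateway lookup, using a subset of the rows above and taking the first name where a cell lists two; the model names are labels only, and the code calls no provider API.

```python
# Hypothetical routing table: job -> (lead choice, value alternative), from the rows above.
ROUTES = {
    "software_engineering": ("GPT-5.6 Sol", "DeepSeek V4-Pro"),
    "financial_analysis": ("GPT-5.6 Sol", "Gemini 3.1 Pro"),
    "legal_review": ("Claude Opus 4.8", "Gemini 3.1 Pro"),
    "customer_support": ("Gemini 3.5 Flash", "DeepSeek V4-Flash"),
    "board_writing": ("Claude Opus 4.8", "GPT-5.5"),
    "regulated_airgapped": ("Llama 4 (self-host)", "GLM-5.2 (self-host)"),
}


def route(job: str, high_value: bool) -> str:
    """Send hard, high-value work to the lead choice and routine work to the value tier."""
    if job not in ROUTES:
        # Refuse rather than guess: an unmapped job should be added on purpose.
        raise ValueError(f"no route for {job!r}; add it to ROUTES first")
    lead, value = ROUTES[job]
    return lead if high_value else value


print(route("legal_review", high_value=True))       # Claude Opus 4.8
print(route("customer_support", high_value=False))  # DeepSeek V4-Flash
```

Keeping this table in gateway configuration rather than code makes swapping a model after the next leaderboard shuffle a one-line change.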
Beyond text

Models don't only write.
They draw, film,
and speak.

A text model is one purchase. The enterprise that stops there misses three more markets, each a real buying decision with its own leaders, prices, and compliance traps. Here is the full board for image, video, and voice, the models behind the products your teams are already signing up for. Entries were verified mid-2026; this corner moves faster than any other in the guide.

Image generation & vision
GPT Image 2
OpenAI · replaced DALL-E
Best overall default. World knowledge and complex-prompt fidelity; multilingual text in images.
Best at: realistic publishing, complex prompts
Imagen 4
Google · Vertex AI
Photorealism leader, especially human faces and natural scenes. Google-native workflow.
Best at: photorealistic humans & nature
Nano Banana Pro
Google · Gemini Image
Strong editing and character consistency across generations. Reliable Google infrastructure.
Best at: editing, consistent characters
Midjourney V8
Midjourney
The aesthetic-quality king. Distinctive, art-directed look. No real API; web and Discord.
Best at: stylized, artistic imagery
FLUX.2 (open weight)
Black Forest Labs
Open-weight champion. Top-tier photorealism, skin and lighting; unmatched fine-tune ecosystem.
Best at: photorealism, custom fine-tunes
Seedream 4.5 (CN)
ByteDance
Renders text better than almost anything, native 4K, excels at product and commercial looks.
Best at: product shots, text, 4K
Ideogram 3
Ideogram
The typography specialist. If you need readable text, logos, or posters, it is unmatched.
Best at: text in images, logos, posters
Adobe Firefly 4 (commercially safe)
Adobe
Commercially safe, trained on licensed data, Photoshop-native. The brand-workflow choice.
Best at: commercial safety, brand workflow
Recraft V4
Recraft
Brand-asset powerhouse: vectors, batch style consistency, logo integration. MCP support.
Best at: vectors, brand asset systems
Qwen Image 2 (open weight)
Alibaba
Open-source value, custom LoRA training. Strong multilingual and Asian-script rendering.
Best at: open-weight value, custom training
Video generation
Veo 3.1
Google · Flow
Best all-rounder. The only model doing one-pass 48kHz lip-synced dialogue. 4K, cinematic.
Best at: spoken dialogue, cinematic clips
Kling 3.0 (CN)
Kuaishou
Native 4K, up to 2-min clips, AI Director shot control. Best hand rendering. Data in China.
Best at: 4K social, long clips, value
Seedance 2.0 (CN)
ByteDance
Tops the arena with audio. Flexible input: images, clips, audio per generation. Safe default.
Best at: top quality, flexible inputs
Runway Gen-4.5
Runway
The production workstation. Motion brush, camera control, reference-driven consistency.
Best at: creative control, ad workflows
Luma Ray3
Luma AI
The only HDR option, color-managed pipelines. Atmospheric, environment-heavy image-to-video.
Best at: HDR, cinematic mood
Hailuo 2.3 (CN)
MiniMax
Most output per dollar. Expressive human motion and faces. Note an active IP lawsuit.
Best at: cheap volume, human subjects
Sora 2 (sunsetting)
OpenAI
Best physics simulation, but the app has been retired and the API shuts down in September 2026. Do not build anything new on it.
Best at: physics realism (migrate off)
Wan 2.7 (open weight)
Alibaba
The serious open-weight video slot. Self-host for custom pipelines and full data control.
Best at: self-hosted, custom pipelines
PixVerse V4.5
PixVerse
The anime and stylized specialist. Handles non-photoreal styles others cannot.
Best at: anime, stylized motion
HeyGen / Synthesia
Avatar tools
Avatar-first: talking heads, corporate training, multilingual lip-sync localization.
Best at: avatars, training, localization
Voice, speech & retrieval
ElevenLabs
ElevenLabs
The production standard for text-to-speech and voice agents. Broadest language and voice library.
Best at: production TTS, voice agents
Voxtral TTS (open weight)
Mistral · 4B
First credible open TTS at production quality. Beats ElevenLabs in most blind tests. Voice cloning from 3s.
Best at: open-weight voice, cloning
Cohere Transcribe
Cohere
Enterprise-grade speech-to-text. Pairs with Cohere's RAG stack for call and meeting workflows.
Best at: enterprise transcription
Saaras V3 (open weight)
Sarvam · India
Speech across many Indian languages. The voice layer for South Asian sovereign deployments.
Best at: Indian-language speech
Cohere Embed & Rerank
Cohere · retrieval
Not generators but the plumbing of enterprise search: they decide which documents the model even sees.
Best at: RAG accuracy, search relevance
Tiny Aya (open weight)
Cohere · 70+ langs
3.35B open models in regional variants, runs offline on a laptop. Strongest small multilingual story.
Best at: offline, edge, 70+ languages

The two compliance traps in generative media

Data residency: the strongest video models from Kling, Hailuo, and Seedance process your prompts and assets on servers in China. That is fine for personal creative work, but a problem for client work under NDA or sensitive brand content.

IP indemnification: most paid plans grant commercial rights, but only Adobe Firefly will legally cover you if an output is claimed to infringe. For brand-facing work, that distinction decides the vendor.

The third option

Buy, self-host,
or build your own.

Most guides present a binary: rent a proprietary API, or self-host an open model. There is a third path, new and most relevant to regulated industries: train a frontier-grade model on your own data. Mistral's Forge platform supports the full training lifecycle (pre-training, post-training, and reinforcement learning) on a company's internal datasets, going well beyond fine-tuning. An insurer, for example, can train a model from scratch on its own claims and contracts. Early adopters include ASML, Ericsson, and the European Space Agency. Cohere's Model Vault, for its part, deploys inside your own private cloud so sensitive data never leaves the network.

The build-vs-buy ladder

Prompt the model and you change nothing. RAG feeds it your documents at query time. Fine-tune nudges a small slice of the weights toward your domain. Full custom training (Forge-style) builds the model around your data from the ground up. Cost and control rise at every rung. Most enterprises never need the top rung, but the ones with proprietary data and a hard residency mandate increasingly do, and it is no longer science fiction to reach for it.
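The ladder's logic is "climb only as far as the requirements force you." A minimal Python sketch; the four yes/no questions are a simplification assumed for illustration, and a real decision also weighs cost, data volume, and team skills.

```python
from enum import IntEnum


class Rung(IntEnum):
    """The build-vs-buy ladder; each higher rung costs more and gives more control."""
    PROMPT = 0
    RAG = 1
    FINE_TUNE = 2
    CUSTOM_TRAINING = 3


def lowest_rung(*, needs_private_docs: bool, needs_domain_behavior: bool,
                proprietary_data_is_core: bool, hard_residency_mandate: bool) -> Rung:
    """Return the cheapest rung that still meets the stated requirements."""
    if proprietary_data_is_core and hard_residency_mandate:
        return Rung.CUSTOM_TRAINING  # the top rung, per the ladder's own caveat
    if needs_domain_behavior:
        return Rung.FINE_TUNE        # nudge weights toward the domain
    if needs_private_docs:
        return Rung.RAG              # feed documents at query time, weights untouched
    return Rung.PROMPT               # change nothing


print(lowest_rung(needs_private_docs=True, needs_domain_behavior=False,
                  proprietary_data_is_core=False, hard_residency_mandate=True).name)  # RAG
```

A residency mandate alone does not force the top rung here: a self-hosted model with RAG already keeps the data in-house.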

The Decision Compass

Answer four questions.
Get a defensible pick.

Not a verdict, a starting shortlist you can take into the room. It eliminates on your non-negotiables first, exactly as you should.

Build your shortlist


1 · The primary job
Coding & agents
Reasoning & analysis
Writing & comms
High-volume support
Multimodal / documents
2 · Can data leave your infrastructure?
Yes, API is fine
No, must self-host
3 · Budget posture
Strict, cost rules
Moderate
Pay for the best
4 · Language & region focus
Global / English
Indian languages
Needs live web data
The compass strikes models that fail your non-negotiables, then ranks what survives against your primary job.
The new top tier

What is this
whole Mythos?

Mythos is a class, not a model. In April 2026 Anthropic introduced a tier that sits above its Opus line, with capabilities it judged too strong to put in everyone's hands at once. The first member, Claude Mythos Preview, went out to roughly fifty vetted cyber-defenders and infrastructure providers through a program called Project Glasswing, run in collaboration with the US government. It was never offered to the public.

Then on June 9, the tier reached everyone, through a clever split. Mythos and Fable are the same underlying model. The difference is the wrapper. Mythos 5 is the raw model with safeguards lifted in some areas, still reserved for Glasswing partners and trusted defenders. Fable 5 is that same model made safe for general use: safety classifiers watch for high-risk requests in cybersecurity, biology, chemistry, and model distillation, and quietly reroute those to the safer Opus 4.8. Anthropic expects that to touch under 5% of sessions.
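Because rerouting happens quietly, the only way to know your own rate is to log it. A minimal Python sketch, assuming your gateway records for each session which model was requested and which one actually served it. The field names and model labels are hypothetical; confirm in the provider's docs whether and how the serving model is reported.

```python
from collections import Counter


def reroute_report(sessions: list[dict]) -> dict:
    """Share of sessions served by a different model than the one requested.

    Each session is assumed to look like {"requested": "...", "served_by": "..."}.
    """
    if not sessions:
        return {"rate": 0.0, "by_fallback": {}}
    rerouted = [s for s in sessions if s["served_by"] != s["requested"]]
    return {
        "rate": len(rerouted) / len(sessions),
        "by_fallback": dict(Counter(s["served_by"] for s in rerouted)),
    }


# Hypothetical log: 97 sessions answered as requested, 3 rerouted to the fallback.
log = [{"requested": "fable-5", "served_by": "fable-5"}] * 97
log += [{"requested": "fable-5", "served_by": "opus-4.8"}] * 3
print(reroute_report(log))  # {'rate': 0.03, 'by_fallback': {'opus-4.8': 3}}
```

Compare the measured rate with the vendor's stated expectation (under 5% for Fable). A figure well above it is worth raising before renewal, and every rerouted session is one your audit trail should attribute to the fallback model.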

Mythos is the raw model.
Fable is its version made safe for the public.

The name tells the strategy. A myth is the dangerous original. A fable is the version with a moral attached, the one you can hand to anyone. For a buyer, the practical fact is that Anthropic's lineup now spans five tiers, and that creates a real routing decision rather than a single default.

Haiku
Fast and cheap. High-volume, latency-sensitive work where good-enough wins.
value · fastest
Sonnet
The everyday workhorse. Near-flagship quality at a fraction of flagship price.
$3 / $15
Opus
Complex work that does not need long-horizon stamina. Also the safety fallback for Fable.
$5 / $25
Fable
Mythos-class, public. The longer and harder the task, the larger its lead grows. Built-in classifiers.
$10 / $50
Mythos
Same model, safeguards lifted. Glasswing partners only. Strongest cyber capability of any model.
restricted

The compliance footnote that matters

Mythos-class traffic carries a mandatory 30-day data retention policy, with no zero-retention option, even for enterprises that previously negotiated one. Anthropic says the data is not used for training, only to catch novel jailbreaks and reduce false positives. If you hold a zero-retention agreement for regulatory reasons, it does not apply to Fable or Mythos. Factor it into your data review before routing anything sensitive through the top tier.
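In practice this becomes a pre-routing check. A minimal Python sketch, assuming you record each tier's effective retention under your own contract. The 0-day entries below assume a negotiated zero-retention agreement; the 30-day Fable entry reflects the policy described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierTerms:
    name: str
    retention_days: int  # effective retention under YOUR contract; 0 means zero retention


def routable_tiers(tiers, sensitive: bool, max_retention_days: int = 0) -> list[str]:
    """Drop tiers whose retention breaks policy before any sensitive prompt is routed."""
    if not sensitive:
        return [t.name for t in tiers]
    return [t.name for t in tiers if t.retention_days <= max_retention_days]


terms = [TierTerms("Sonnet", 0), TierTerms("Opus", 0), TierTerms("Fable", 30)]
print(routable_tiers(terms, sensitive=True))   # ['Sonnet', 'Opus']
print(routable_tiers(terms, sensitive=False))  # ['Sonnet', 'Opus', 'Fable']
```

Running the check in the gateway, rather than relying on each team to remember the clause, is what turns the footnote into a control.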

There is one more wrinkle a buyer should track. Access to Mythos-class models has become entangled with export policy. The same US authority that gates advanced chips has issued directives touching this tier, and access has been adjusted in response to export-control rules. The lesson is not the specific directive. It is that the most powerful models now sit close enough to national security that geopolitics, not just price, can decide what you are allowed to run.

This is now a pattern, not a one-off

On June 26, 2026, OpenAI launched its next frontier family, GPT-5.6 Sol, Terra, and Luna, and shipped it the same way: a limited preview to roughly 20 government-cleared partners, while US agencies run a security review of up to 30 days under a June executive order. Sol tops Terminal-Bench, the command-line agentic-coding benchmark, edging Claude Mythos 5, and carries a 1.5M-token context window. The takeaway for a buyer is structural, not about one lab: the very top tier from both leaders now ships through a government access gate first. EU, UK, India, and APAC teams could not touch GPT-5.6 on normal tiers at launch. Plan procurement around the broadly available tiers (Terra, Opus, Fable) and treat frontier access as a roadmap item, not a given.

What comes next

The map is
about to redraw.

If Mythos and Fable show where the frontier is heading, four forces will decide who reaches it, and from where. The next models will not just be smarter. They will be sovereign, specialized, and shaped as much by policy as by research.

Force 01

Tiers above the tier

Mythos opened a class above Opus. Expect every major lab to ship a "too powerful for default release" tier, gated by safeguards, trusted-access programs, and government partnership. The frontier becomes a velvet rope, and capability is metered by trust earned, not money paid.

Force 02

The price floor falls out

Open weights now deliver near-frontier quality at a fraction of the cost. A task that bills fifteen dollars on a flagship can cost cents on an open model. This does not just save money. It changes what is economically worth automating, which pulls AI into industries that could never justify the price before.

Force 03

Specialized over general

The next wave is vertical. Smaller models tuned for one domain, one language, one regulatory context, beating giant generalists on their home turf. Procurement stops asking "which model is best" and starts asking "which model is best at my Tuesday-morning job."

Force 04

Sovereignty as strategy

Nations and regions are building their own models so intelligence does not arrive only through someone else's API. Europe's answer is Mistral: Apache 2.0 open weights plus a Forge platform to train private models, backed by 13,800 GPUs near Paris. Alongside it, Cohere's merger with Germany's Aleph Alpha has produced a transatlantic sovereign stack. The contest is shifting from raw IQ to ecosystem control: who owns the infrastructure, who sets the default rails, whose chips you depend on. Geopolitics is now a model parameter.

The world map

Eight regions.
Every nation wants
its own model.

The frontier is no longer one zip code in California. In February 2026 over a hundred nations signed the Bangkok Declaration committing to AI sovereignty. The reasons repeat everywhere: a global model does not natively understand local dialects, legal frameworks, or culture; sending citizen data to a foreign API raises law and security questions; and intelligence is now infrastructure, like electricity, that nations do not want to rent forever. Here is the world a buyer actually chooses from.

North America

The frontier

Sets the ceiling. English-first.

Still the capability peak: the Mythos and Opus tiers, GPT-5.5, Gemini 3.1. The strategy is closed, API-first, and roadmap-driven. Canada plays a quieter "Switzerland of AI" role: talent-rich and neutral, home to Cohere's enterprise-RAG stack.

Anthropic Fable/OpusOpenAI GPT-5.5Google Gemini 3.1Meta Llama 4Cohere Command (CA)
China

The open surge

Open weights, frugal cost.

It is the most crowded open-weight ecosystem on earth, and it reset the global price floor. DeepSeek crossed 80% on some coding benchmarks with downloadable weights; GLM, Qwen, and Kimi trade the open-source lead month to month. Strong on Asian languages and cultural nuance. The catch for buyers: hosted APIs route through servers in China, so regulated work means self-hosting.

DeepSeek V4Alibaba Qwen 3.7Z.AI GLM-5.2Moonshot Kimi K2.7Tencent HunyuanXiaomi MiMo
Europe

Regulated & sovereign

Open weights, EU data law.

The regulation-first bloc, shaped by the EU AI Act. France's Mistral is the clear champion: Apache 2.0 open weights, a Forge platform to train private models, and 13,800 GPUs going live near Paris. Cohere's merger with Germany's Aleph Alpha created a transatlantic sovereign stack; Switzerland proved a fully in-house national model is feasible.

Mistral Large 3 (FR)Aleph Alpha (DE)Cohere EU stackSwiss national LLM
Middle East

Compute as oil

Arabic-first, capital-rich.

The Gulf is buying its way to the frontier, treating compute as the new oil. Abu Dhabi's Falcon scales to 180B with permissive licensing; Jais delivers strong bilingual Arabic-English with dialect switching; Saudi Arabia's ALLaM powers the national HUMAIN Chat assistant with deep Arabic cultural nuance.

Falcon (UAE)Jais bilingualALLaM (Saudi)
South Asia

The frugal stack

22 languages, public rails.

India is not waiting for the frontier to be handed down. Sarvam open-sourced a 30B and a 105B model built on government compute under the IndiaAI Mission; the flagship ships as the Indus chatbot, fluent in 22 Indian languages, and Sarvam is a founding member of NVIDIA's Nemotron coalition. The edge is not raw scale but frugal architecture, plus Digital Public Infrastructure that reaches a billion people through rails that already exist.

Sarvam Indus 105BKrutrim (Ola)BharatGen Param 2CoRover BharatGPT
East Asia

Hardware & language

Korean, Japanese fluency.

Backed by a Korean government sovereign-AI fund, SK Telecom's A.X processes Korean a third more efficiently than Western models; Upstage's Solar Pro packs frontier performance into a compact 31B. Japan focuses on Japanese-language fluency through models like Fujitsu's Takane and lightweight on-prem options.

SK Telecom A.X (KR)Upstage Solar ProFujitsu Takane (JP)
Latin America

A public good

Spanish & Portuguese.

Launched February 2026 by Chile's CENIA with 60+ institutions across 15 countries, Latam-GPT is the region's first collaborative model, around 50B parameters trained on 8TB of Spanish, Portuguese, and regional data: Buenos Aires court rulings, Colombian textbooks, Peruvian library records. Built for $550K, it is a public good for citizen services and education, with indigenous languages planned.

Latam-GPT (Chile/CENIA)15-country coalition
Africa & SE Asia

Lightweight inclusion

Low-compute, local languages.

Inclusion-first and built for constraint. Africa's InkubaLM is a compact 0.4B model spanning Hausa, Swahili, isiXhosa, isiZulu, and Yoruba; Kenya's UlizaLlama delivers Swahili health services. Singapore's SEA-LION covers Southeast Asian languages. These prove a model does not need to be huge to matter where no giant ever bothered to look.

InkubaLM (Africa)UlizaLlama SwahiliSEA-LION (SG)
Sources: provider disclosures, regional launches & the Bangkok Declaration, 2026

What this means for a buyer

For broad capability today, the North American frontier still leads and is available now. But for legal analysis in a regional language, government service automation, healthcare in vernacular tongues, or any workload that cannot sit on foreign cloud, a sovereign or regional model is no longer a compromise. The smartest 2026 architecture blends them: a global frontier model for the hardest reasoning, a regional model for language and residency, routed by the job.

The next great models will not all speak English first. Some are being built to speak to billions who were never spoken to before.
A buyer's horizon

What to watch, and when

NOW · 2026

Routing is the skill

The teams that win are not the ones with the single smartest model. They are the ones who route each job to the model that leads it, and cap spend with a gateway. Build that muscle first.
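A gateway that both routes and caps can be very small. A minimal Python sketch of one hypothetical policy: hard work goes to the premium model only while its monthly budget has room, and everything else, including overflow, goes to the value model. Model names, the cap, and the cost estimates are placeholders.

```python
class SpendCapGateway:
    """Hypothetical gateway: premium tier for hard work until its budget runs out."""

    def __init__(self, premium_model: str, value_model: str, premium_cap_usd: float):
        self.premium_model = premium_model
        self.value_model = value_model
        self.premium_cap_usd = premium_cap_usd
        self.premium_spent_usd = 0.0

    def choose(self, high_value: bool, est_premium_cost_usd: float) -> str:
        """Route up only if the job is hard and its estimated cost still fits under the cap."""
        fits = self.premium_spent_usd + est_premium_cost_usd <= self.premium_cap_usd
        if high_value and fits:
            self.premium_spent_usd += est_premium_cost_usd
            return self.premium_model
        return self.value_model


gw = SpendCapGateway("premium-model", "value-model", premium_cap_usd=1.0)
print([gw.choose(True, 0.4) for _ in range(3)])  # ['premium-model', 'premium-model', 'value-model']
```

A real deployment would also reset the counter each billing period and reconcile estimates against actual invoices; both are left out of this sketch.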

NEAR · 12 months

Tiers split further

Expect more "above-the-flagship" classes gated by trusted access, and more value models that erase the quality gap on routine work. The middle gets crowded; the top gets exclusive.

MID · 2027

Sovereign goes mainstream

National and regional models reach enterprise-grade for language and residency workloads. Procurement checklists start asking where the model was trained and whose hardware it ran on.

FAR · beyond

Policy is a parameter

Export controls, retention mandates, and security gating decide access to the very top as much as capability or price. The compliance lead becomes the most important seat in the room.

Do not buy the smartest model.
Buy the one you can route, afford, and defend, and keep the freedom to change your mind.

Global AI Forum · The Buyer's Atlas · Edition 01