TL;DR: The frontier went quiet in May after April's sprint — but architecture innovation took center stage. SubQ launched the first commercial subquadratic LLM with a 12M-token context window. OpenAI rolled out GPT-5.5 Instant as the new ChatGPT default. DeepSeek V4 Pro dropped pricing by 75%. Grok 3 hiked 10x. ZAYA1-8B became the first AMD-trained MoE.

The Big Picture: A Breather After April's Sprint

April 2026 was the most competitive month in LLM history. Five labs — OpenAI, Anthropic, DeepSeek, Moonshot, and Xiaomi — all dropped models scoring above 50 on the Intelligence Index. GPT-5.5 broke 60. Opus 4.7 landed. The ceiling moved up three points.

8.0 / 10

Weekly LLM Review 2026

🛡️ AI Tool · Updated 2026

May is different. The frontier leaderboard hasn't changed. Instead, the action shifted sideways into architecture innovation, pricing wars, and product defaults.

This Week's Model Releases

Date	Model	Developer	Type	License
May 5	GPT-5.5 Instant	OpenAI	Text + Reasoning	Proprietary
May 5	SubQ 1M-Preview	Subquadratic	Subquadratic LLM	Proprietary (API)
May 6	ZAYA1-8B	Zyphra	Text + MoE	Apache 2.0
May 6	Grok 4.3	xAI	Text + Reasoning	Proprietary
May 8	Gemini 3.1 Flash Lite	Google	Text + Vision	Proprietary (API)
May 14	Qwen3 Coder Next	Alibaba	Code	Apache 2.0
May 14	MiniMax M2.7 Highspeed	MiniMax	Text	Proprietary (API)

SubQ 1M-Preview: Architecture Takes the Stage

The most significant release this week isn't from a name you know. Subquadratic (SubQ) launched with $29M in seed funding and a bold claim: their model isn't a transformer. Using sparse, subquadratic attention end-to-end, SubQ 1M-Preview ships with a native 12 million token context window — the largest of any commercially available LLM.

Standard transformer attention scales at O(n²). Double the context, quadruple the cost. SubQ's approach claims roughly 1/5 the cost of frontier models on long-context tasks and up to 52x faster attention at scale. If this holds in production, it's not just a new model — it's a new architecture category.

The model is available via API and priced competitively for long-context workloads like repo-wide code analysis, multi-document research, and legal document review. The 12M-token window is real — no quality degradation caveats at the upper end.

GPT-5.5 Instant: The New Default

OpenAI quietly made GPT-5.5 Instant the new default in ChatGPT on May 5. This lightweight, low-latency variant of GPT-5.5 aims for faster responses and reduced hallucinations compared to the standard GPT-5.5. It's not a frontier breakthrough — the Intelligence Index didn't move — but it's what millions of ChatGPT users will experience daily. A faster, cheaper default that still benefits from the GPT-5.5 architecture is a meaningful product improvement even if it doesn't make headlines.

ZAYA1-8B: AMD-Trained Open MoE

Zyphra released ZAYA1-8B on May 6 under Apache 2.0 — a Mixture of Experts model with 8 billion total parameters and approximately 760 million active parameters per token. What makes it notable: it was trained entirely on AMD hardware. This is one of the first major open-weight models that didn't touch NVIDIA GPUs during training, signaling growing maturity in the AMD AI ecosystem.

With Apache 2.0 licensing, ZAYA1-8B is free for any use including commercial deployment. It's designed for efficient self-hosting, with the MoE architecture providing strong per-token compute efficiency.

Pricing Earthquake: Winners and Losers

Model	Before	After (per 1M tokens)	Change
DeepSeek V4 Pro	$1.74/$3.48	$0.44/$0.87	-75%
Mistral Large 3	$2.00/$6.00	$0.50/$1.50	-75%
Claude Haiku 4.5	$0.80/$4.00	$1.00/$5.00	+25% (upgraded)
Gemini 3.1 Pro	Renamed	$2.00/$12.00	Repriced
Grok 3	$3.00/$15.00	$30.00/$150.00	+900%

The biggest story this week is the Grok 3 price hike — a staggering 10x increase to $30/$150 per million tokens, making it more expensive than Claude Opus 4.7. Grok 3 Mini remains available at $3/$5 as a budget alternative.

On the other end, DeepSeek V4 Pro slashed prices by 75%, now at $0.44/$0.87 — competitive with budget-tier models while offering near-premium capabilities. Combined with DeepSeek V4 Flash at $0.14/$0.28, DeepSeek now owns the most competitive pricing spread in the market.

Mistral Large 3 also dropped 75%, from $2/$6 to $0.50/$1.50, strategically repositioning toward the budget segment. The overall LLM API market now has a 675x price range between the cheapest (Gemini 2.0 Flash at $0.10/$0.40) and most expensive (GPT-5.5 Pro at $30/$180) models.

What to Watch

Google's Gemini 2.0 Flash — currently the cheapest model at $0.10/$0.40 — is slated for deprecation by June 2026. Migration will be needed soon. The SubQ architecture will face real production scrutiny in the coming weeks — if it delivers on its efficiency claims, it could reshape how we think about long-context inference. And the labs that didn't ship in April (Anthropic past Opus 4.7, Google past Flash Lite, Meta, Mistral) are likely preparing their next moves.

For more on cost optimization across LLM providers, see our Claude Code cost guide and multi-provider failover build log.

Frequently Asked Questions

What is the cheapest LLM API right now?

Google's Gemini 2.0 Flash at $0.10/$0.40 per 1M tokens is the cheapest from a major provider. However, it's slated for deprecation by June 2026 — migrate soon.

Is SubQ better than GPT-5.5?

Not on raw intelligence. SubQ claims roughly 1/5 frontier capability with its current preview. Its advantage is context length (12M tokens) and cost efficiency — it's a different product for different use cases.

Why did Grok 3 prices increase so much?

xAI has not published a detailed rationale. The 10x increase likely reflects repositioning toward enterprise pricing or capacity management. Grok 3 Mini remains at $3/$5 as a fallback option.

Which models offer the best value right now?

DeepSeek V4 Pro at $0.44/$0.87 offers the best premium-to-price ratio. For budget workloads, Gemini 2.0 Flash ($0.10/$0.40) and DeepSeek V4 Flash ($0.14/$0.28) are the clear winners.

← Back to all posts

Weekly LLM Review — May 15, 2026: SubQ&#x27;s 12M Context, GPT-5.5 Instant, and a Pricing Earthquake

The Big Picture: A Breather After April's Sprint

Weekly LLM Review 2026

This Week's Model Releases

SubQ 1M-Preview: Architecture Takes the Stage

GPT-5.5 Instant: The New Default

ZAYA1-8B: AMD-Trained Open MoE

Pricing Earthquake: Winners and Losers

What to Watch

Frequently Asked Questions

What is the cheapest LLM API right now?

Is SubQ better than GPT-5.5?

Why did Grok 3 prices increase so much?

Which models offer the best value right now?

Related Posts

Claude Fable 5 Review 2026: Anthropic's Mythos-Class Model, Tested

Tabnine Review 2026: Privacy-First AI Coding Assistant for Enterprise

Stable Diffusion Review 2026: Open-Source AI Image Generator

You.com AI Search Review 2026 — Privacy-First AI Search That Actually Competes

Weekly LLM Review — May 15, 2026: SubQ's 12M Context, GPT-5.5 Instant, and a Pricing Earthquake