Gemma 4
Google's open model family — released 2 April 2026 under a fully permissive Apache 2.0 licence, with versions that run on everything from a phone to a workstation.
Gemma 4 specs at a glance
| Spec | Detail |
|---|---|
| Maker | Google |
| Released | 2 April 2026 |
| Licence | Apache 2.0 — fully permissive, open weights |
| Base architecture | Built on Gemini 3 |
| Sizes | E2B (2B) · E4B (4B) · 26B MoE · 31B dense |
| Modalities | Text + images (all sizes); audio (E2B, E4B) |
| Languages | 140+ |
| Best at | Local / self-hosted deployment, edge & mobile inference |
The four Gemma 4 model sizes
Gemma 4 is a family, not a single model. Each size targets a different deployment footprint, so you pick the model that fits your hardware rather than the other way round.
| Variant | Parameters | Runs on |
|---|---|---|
| E2B | 2B | Phones (text, image, audio) |
| E4B | 4B | Edge devices (text, image, audio) |
| 26B MoE | 26B total · ~4B active | Consumer GPUs |
| 31B dense | 31B | Workstations |
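A rough weight-memory estimate shows why each size maps onto that hardware tier. The sketch below uses the standard bits-per-parameter rule of thumb and ignores KV cache, activations and runtime overhead; the figures are illustrative estimates, not official requirements.

```python
# Back-of-envelope VRAM needed just to hold the weights of each
# Gemma 4 variant, at fp16 and at 4-bit quantisation. Ignores KV
# cache and runtime overhead; estimates only, not official specs.
GB = 1e9

def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB at a given precision."""
    return params_billions * 1e9 * bits_per_param / 8 / GB

for name, params in [("E2B", 2), ("E4B", 4), ("26B MoE", 26), ("31B dense", 31)]:
    fp16 = weight_footprint_gb(params, 16)
    int4 = weight_footprint_gb(params, 4)
    print(f"{name:>9}: ~{fp16:.0f} GB at fp16, ~{int4:.0f} GB at 4-bit")
```

At 4-bit, the 26B MoE's weights come to roughly 13 GB, which is why a single consumer GPU is plausible; note the MoE still needs all 26B parameters resident even though only ~4B are active per token.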
Key takeaway
The 26B Mixture-of-Experts variant delivers about 97% of the 31B dense model's performance while activating only ~4B parameters per inference — making frontier-level quality practical on a single consumer GPU.
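The arithmetic behind that claim is simple: per-token compute in a transformer scales roughly with the number of parameters actually activated, so the MoE's advantage can be estimated from the active-to-total ratio. A minimal sketch, using only the figures quoted above:

```python
# Estimate the 26B MoE's per-token cost from its active parameter
# count. Simplified model: assumes compute scales linearly with
# active parameters and ignores expert-routing overhead.
total_params = 26e9    # all experts combined
active_params = 4e9    # parameters used per token
dense_params = 31e9    # the dense sibling, for comparison

active_fraction = active_params / total_params   # share of MoE weights used per token
compute_vs_dense = active_params / dense_params  # cost relative to the 31B dense model

print(f"Active per token: {active_fraction:.0%} of the MoE's weights")
print(f"Per-token compute vs 31B dense: ~{compute_vs_dense:.0%}")
```

So the MoE does roughly an eighth of the dense model's per-token work while, per the figures above, keeping ~97% of its benchmark performance.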
The Apache 2.0 licence
The licence change is the headline. Gemma 4 is the first Google open model released under Apache 2.0 — a fully permissive licence with no monthly-active-user caps, no acceptable-use policy enforced by the model creator, and complete freedom to modify, redistribute and commercialise. Earlier Gemma releases carried Google's custom terms; Apache 2.0 removes the legal friction that kept some companies away, which makes Gemma 4 a genuine watershed for the open model ecosystem.
Gemma 4 benchmarks
Gemma 4 punches well above its weight: the 31B dense model competes with systems many times its size.
| Benchmark | Gemma 4 (31B dense) | What it measures |
|---|---|---|
| AIME 2026 | 89.2% | Competition mathematics |
| LiveCodeBench | 80.0% | Real-world coding tasks |
| MMLU Pro | 85.2% | Broad expert-level knowledge |
| Arena AI | #3 ranking | Human-preference leaderboard |
Architecture
Gemma 4 is built on the Gemini 3 architecture and uses a hybrid attention mechanism that alternates between local sliding-window attention (512-1024 tokens) and global full-context attention, plus per-layer embeddings for deeper representation. The 26B variant adds a Mixture-of-Experts optimisation, activating only ~4B of its parameters per token — the trick that lets a large model run cheaply.
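The local/global split can be illustrated with a toy visibility check: a local layer lets each token attend only to a recent causal window, while a global layer sees the whole prefix. The specific window value and the layer pattern comment below are illustrative assumptions; only the 512-1024 range comes from the description above.

```python
def sliding_window_allows(q: int, k: int, window: int) -> bool:
    """Local layer: query position q may attend to key position k only
    if k is causal (k <= q) and within the last `window` positions."""
    return k <= q and q - k < window

def global_allows(q: int, k: int) -> bool:
    """Global layer: plain causal attention over the full context."""
    return k <= q

# Illustrative window from the 512-1024 range quoted in the text;
# the actual per-layer pattern is not specified here.
WINDOW = 512

print(sliding_window_allows(1000, 900, WINDOW))  # True: 100 tokens back, inside window
print(sliding_window_allows(1000, 100, WINDOW))  # False: 900 tokens back, outside window
print(global_allows(1000, 100))                  # True: global layers see everything
```

Alternating the two keeps attention cost near-linear in sequence length on most layers while the occasional global layer preserves long-range information flow.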
Who should use Gemma 4
- Developers who want to self-host — open Apache 2.0 weights, no API fees, full control.
- Edge and mobile builders — E2B and E4B run on phones and edge hardware, with audio support.
- Cost-conscious teams with a single GPU — the 26B MoE variant brings near-frontier quality to consumer hardware.
- Multilingual products — 140+ language coverage is among the broadest of any open model.
If you need the absolute frontier of capability or native video reasoning, Google's hosted Gemini 3.1 Ultra is the stronger choice — Gemma 4 trades peak performance for portability and licence freedom.
How Gemma 4 compares
- Gemini 3.1 Ultra — Google's hosted flagship
- DeepSeek V4 — the other major open-weights family
- GPT-5.5 ("Spud")
Frequently asked questions
When was Gemma 4 released?
Google released Gemma 4 on 2 April 2026. It is Google's first fully permissive open model, shipped under the Apache 2.0 licence.
What sizes does Gemma 4 come in?
Gemma 4 comes in four sizes: E2B (2B parameters, for phones), E4B (4B, for edge devices), a 26B Mixture-of-Experts variant (about 4B active, for consumer GPUs), and a 31B dense model for workstations.
Is Gemma 4 free for commercial use?
Yes. Gemma 4 is released under the Apache 2.0 licence with no monthly active user caps, no acceptable-use restrictions, and full freedom to modify, redistribute and commercialise.
How good is Gemma 4?
The 31B dense Gemma 4 model scores 89.2% on AIME 2026, 80.0% on LiveCodeBench and 85.2% on MMLU Pro, and ranks #3 on Arena AI — competitive with models many times its size.
Can Gemma 4 run on a single GPU?
Yes. Gemma 4's 26B Mixture-of-Experts variant activates only about 4B parameters per inference and fits on a single consumer GPU, while the smaller E2B and E4B models run on phones and edge devices.
How many languages does Gemma 4 support?
Gemma 4 supports over 140 languages, making it one of the most multilingual open models available.