4 model sizes:

  • E2B / E4B - Edge models, run on-device (a Raspberry Pi or your phone, <1.5GB RAM)

  • 26B A4B MoE - only 3.8B params active at inference

  • 31B Dense


Specs from the model card[1]

The "E" in E2B and E4B stands for "effective" parameters
The "E" in E2B and E4B stands for "effective" parameters

The "A" in 26B A4B stands for "active parameters
The "A" in 26B A4B stands for "active parameters

Gemma 4 benchmarks (31B unless noted)

Reasoning
• MMLU Pro (massive multitask language understanding): 85.2%
• GPQA Diamond (graduate-level science Q&A): 84.3%
• AIME 2026 (American math olympiad): 89.2%
• MMMLU (multilingual MMLU, 140+ languages): 88.4%

Coding
• LiveCodeBench v6 (real-world coding problems): 80.0%
• Codeforces ELO (competitive programming rating): 2150

Vision
• MMMU Pro (college-level multimodal reasoning): 76.9%
• MATH-Vision (math problems with visual context): 85.6%

Audio (E2B /E4B only)
• FLEURS WER: 0.08
• CoVoST (speech-to-text translation): 35.54


When to use which

Scenario

Best choice

Cloud, quality is the priority

31B Dense

Cloud, you're serving millions of requests and cost matters

26B MoE (only 3.8B params active at inference — cheaper per token)

On-device / mobile / edge

E2B or E4B

Audio transcription

E2B or E4B (only these two support audio)

All these models are released under the Apache 2.0 license.

Footnotes