TL;DR

Google DeepMind shipped Gemma 4 on April 2, 2026 under Apache 2.0 — four sizes from 2.3B (E2B, runs in <1.5GB RAM) to 31B dense. Pair it with Ollama + Continue in VS Code and you get chat, inline edit, and tab-autocomplete that together rival GitHub Copilot. Your code never leaves the machine, it works offline, and it costs $0 forever. Copilot Pro is $10/mo, Pro+ is $39/mo — and new signups have been paused since April 20, 2026. The timing writes itself.

Gemma 4 31B hits 1452 on Arena AI, topping open-model leaderboards

What's new

Gemma 4 is not a marginal upgrade. On coding benchmarks the 31B model jumps to 80.0% on LiveCodeBench v6, up from Gemma 3 27B's 29.1%. Codeforces Elo climbs from 110 to 2150, which lands in master territory on the Codeforces rating scale. Four sizes ship on day one:

  • E2B — 2.3B effective params, runs in <1.5GB via LiteRT with 2/4-bit weights. Phones, Raspberry Pi 5, old laptops.
  • E4B — 4.5B effective. Still edge-class, better quality.
  • 26B A4B — Mixture-of-Experts, 3.8B active. Fast inference on consumer GPUs.
  • 31B Dense — max quality. Fits on one 80GB H100 in bfloat16.

All sizes support a configurable thinking mode, native function calling, structured JSON output, and context windows up to 256K tokens — enough to load most repos. Day-0 integrations cover Ollama, llama.cpp, MLX, vLLM, LM Studio, LiteRT-LM, transformers.js.
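
Curious whether the structured-output claim holds up? A few lines against Ollama's local REST API will tell you. A minimal sketch, assuming Ollama is serving on its default port and the model was pulled under a gemma4 tag (run ollama list to confirm the real name):

```python
# Minimal sketch: ask a local Gemma 4 for structured JSON via Ollama's REST API.
# Assumes the daemon is on the default port and the model tag is "gemma4".
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma4",  # assumed tag; substitute whatever `ollama list` shows
        "prompt": "List three risks of using eval() in Python. "
                  "Reply as a JSON array of objects with keys 'risk' and 'mitigation'.",
        "format": "json",   # Ollama's constrained-JSON output mode
        "stream": False,
    },
    timeout=120,
)
print(json.loads(resp.json()["response"]))
```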

Why this rivals Copilot

Until now, the honest answer to "should I use Copilot?" was yes — the UX and context-awareness beat every local setup. Gemma 4 changes the math on four axes:

  • Price: Apache 2.0, free commercial use. Copilot costs $10–$39/user/month.
  • Privacy: code stays on the laptop. No prompts, diffs, or snippets uploaded to anyone.
  • Offline: works on planes, trains, conference wifi, or inside air-gapped enterprise networks.
  • Hardware floor: E2B in 1.5GB of RAM means even a five-year-old laptop runs it.

GitHub itself paused new signups for Copilot Pro, Pro+, and the student plan starting April 20, 2026. So for a lot of developers, the question right now isn't "Gemma 4 or Copilot" — it's "Gemma 4 or nothing."

Technical facts

The coding-relevant numbers (vs Gemma 3 27B baseline):

| Benchmark | 31B | 26B A4B | E4B | E2B | Gemma 3 27B |
| --- | --- | --- | --- | --- | --- |
| LiveCodeBench v6 | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% |
| Codeforces Elo | 2150 | 1718 | 940 | 633 | 110 |
| Arena AI (text) | 1452 | 1441 | 1365 | – | – |
| MMLU Pro | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% |
| τ2-bench (agentic) | 86.4% | 85.5% | 57.5% | 29.4% | 6.6% |

Gemma 4 benchmark results across MMLU, AIME, LiveCodeBench, Codeforces

Speed on real hardware: Raspberry Pi 5 CPU gets 133 prefill / 7.6 decode tokens/s. A Qualcomm Dragonwing IQ8 NPU hits 3,700 prefill / 31 decode tokens/s. LiteRT-LM runs 4,000 tokens across 2 skills in under 3 seconds on mobile.
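
Those throughput figures translate into felt latency as roughly prompt tokens over the prefill rate plus output tokens over the decode rate. A back-of-envelope sketch (the 2,048-token prompt and 100-token completion are assumed workloads, not published figures):

```python
# Rough latency model: time ≈ prompt/prefill_rate + output/decode_rate.
def latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Raspberry Pi 5 CPU: 133 prefill / 7.6 decode tokens/s
print(f"Pi 5 CPU:       {latency_s(2048, 100, 133, 7.6):.1f}s")  # ~28.6s
# Qualcomm Dragonwing IQ8 NPU: 3,700 prefill / 31 decode tokens/s
print(f"Dragonwing IQ8: {latency_s(2048, 100, 3700, 31):.1f}s")  # ~3.8s
```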

Gemma 4 vs GitHub Copilot

| | Gemma 4 (local) | GitHub Copilot |
| --- | --- | --- |
| Price | Free, Apache 2.0 | Pro $10 / Pro+ $39 / Business $19 / Enterprise $39 per month |
| Where code lives | Your machine | Microsoft / OpenAI servers |
| Offline | Yes | No |
| Min hardware | E2B: ~1.5GB RAM | Any internet-connected device |
| Context window | 128K (E2B/E4B), 256K (26B/31B) | Varies by tier |
| Autocomplete UX | Good via Continue; Copilot still faster & more context-aware across big repos | Best in class |
| Chat, refactor, explain | Competitive | Competitive |
| Agentic tool use | Native function calling + JSON output | Agent mode (premium) |
| Signup | Pull weights | Pro / Pro+ new signups paused since Apr 20, 2026 |

Honest take: Copilot still wins on raw completion speed and cross-workspace context awareness in monster repos. Where Gemma 4 closes the gap most is the chat-based flow: explaining code, refactoring, debugging from a stack trace, generating tests. That covers most of what people actually pay Copilot for.
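
That chat flow is a single HTTP call against Ollama's /api/chat endpoint. A sketch of the debug-from-a-stack-trace case, again assuming a gemma4 tag:

```python
# Paste a stack trace into a local chat turn and ask for a diagnosis.
import requests

trace = """Traceback (most recent call last):
  File "app.py", line 8, in main
    print(items[3])
IndexError: list index out of range"""

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma4",  # assumed tag
        "messages": [
            {"role": "user",
             "content": f"Explain this crash and suggest a fix:\n{trace}"},
        ],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```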

Use cases

  • Offline development — planes, trains, rural wifi, sensitive client sites.
  • Regulated industries — finance, healthcare, government, defense: code provably never leaves the device.
  • Indie hackers — drop a $10–$39/mo subscription. That's $120–$468 a year back in the pocket.
  • Old hardware — E2B on a 2019 laptop or Raspberry Pi 5 gives real autocomplete for the first time.
  • One-shot scaffolding — "Build me a landing page for X" prompts produce full HTML/CSS/JS in one go.
  • Local agents — native function calling + structured JSON output make it a clean LLM backend for custom agent frameworks (see the sketch after this list).
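
On that last point: Ollama exposes the model's function calling through a tools field on /api/chat, which is all a small agent loop needs. A minimal sketch; the gemma4 tag and the read_file tool are illustrative assumptions, not part of any shipped framework:

```python
# Let the model request a (hypothetical) read_file tool call.
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma4",  # assumed tag
        "messages": [{"role": "user", "content": "What does setup.py do here?"}],
        "tools": tools,
        "stream": False,
    },
    timeout=120,
).json()

# If the model chose to call the tool, dispatch it; Ollama returns the
# arguments already parsed as JSON.
for call in resp["message"].get("tool_calls", []):
    fn = call["function"]
    print(fn["name"], fn["arguments"])  # e.g. read_file {'path': 'setup.py'}
```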

Gemma 4 running locally on an iPad — fully offline AI assistant

Limitations & setup

It's not all sunshine. The smaller sizes trade away a lot of quality: E2B scores 44% on LiveCodeBench vs 80% for the 31B. If you want master-tier suggestions you need a real GPU (or a beefy Apple Silicon machine; 24GB of unified memory handles the 26B comfortably). The training cutoff is January 2025, so very new library APIs may be missing; supply them in the prompt, as in the sketch below.
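
Feeding fresh docs in is ordinary prompt plumbing. A sketch using the ollama Python client (pip install ollama); the docs path is a placeholder for whatever reference you want the model to follow:

```python
# Work around the January 2025 cutoff by pasting current API docs into the prompt.
import ollama

api_notes = open("docs/new_library_api.md").read()  # hypothetical doc dump

resp = ollama.generate(
    model="gemma4",  # assumed tag
    prompt=f"Reference docs:\n{api_notes}\n\n"
           "Using only the API described above, write a client that uploads a file.",
)
print(resp["response"])
```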

Setup is a one-time lift: install Ollama, run ollama pull gemma4, install the Continue extension in VS Code, and point it at the local model. The official VS Code integration guide covers the wiring in a few minutes.
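
Before touching the Continue config, a ten-second sanity check that Ollama is actually serving the model saves debugging later; a sketch:

```python
# List the models the local Ollama daemon is serving.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])  # a gemma4 tag should appear
```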

What's next

Google's AICore Developer Preview on Android hints at forward-compatibility with Gemini Nano 4, meaning the same on-device pipeline will upgrade later this year. The Gemma 4 Good Challenge on Kaggle will surface community-tuned variants — expect coding-specialist finetunes in the Gemmaverse within weeks.

For now the headline is simple: a free, open, on-device model just became good enough at code that paying monthly for the privilege is no longer the default choice.

Sources: blog.google, Google DeepMind, Gemma 4 model card, Hugging Face, GitHub Copilot plans.