TL;DR
Parallel published a walkthrough for a fully free local agent stack: Pi (Mario Zechner's minimal terminal coding harness) drives Gemma 4 on local Ollama, and the brand-new Parallel Search MCP gives the agent the open web — no API key required. The deliverable is `brief`, a one-file CLI that takes a topic and prints a morning-coffee summary with sources. $0 in API charges. Zero keys in your shell history.

What's new
Three pieces, none of which existed in this combination a few weeks ago, finally clicked together:
- Pi (`@mariozechner/pi-coding-agent`) — a deliberately tiny terminal coding harness with four built-in tools (`read`, `write`, `edit`, `bash`) and MCP support via a third-party `pi-mcp-adapter`.
- Gemma 4 on Ollama — Google DeepMind's Apache-2.0 family, with edge variants that fit a laptop and a 26B Mixture-of-Experts that activates only 3.8B parameters per token.
- Parallel Search MCP at https://search.parallel.ai/mcp — a free, anonymous endpoint exposing `web_search` and `web_fetch` to any MCP-aware client.
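Under MCP's HTTP transport, a tool invocation is a plain JSON-RPC 2.0 `tools/call` request. A minimal sketch of building that request body — the endpoint URL and tool names are from the post; the exact argument schema is an assumption, and a real client would first run the MCP initialize handshake and check `tools/list`:

```python
import json

MCP_ENDPOINT = "https://search.parallel.ai/mcp"  # from the post; no API key needed

def mcp_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server.
    POST this with Content-Type: application/json to MCP_ENDPOINT."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical argument shape -- confirm against the server's tools/list response
body = mcp_tool_call("web_search", {"query": "Gemma 4 launch"})
print(body)
```

The same helper covers `web_fetch` by swapping the tool name and passing a URL argument.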
Why it matters
For two years the working assumption was: serious agents need a frontier model and a paid search API. Both halves of that assumption just gave way. Gemma 4 is good enough at planning and summarization to be useful on a laptop, and the Parallel Search MCP removes the last credit card from the loop. That changes the unit economics for every cron job, daily brief, research script, and learn-by-doing tutorial — they collapse from cents per run to literally free.
Technical facts
| Component | What it is | Cost |
|---|---|---|
| Pi | Terminal harness, 4 built-in tools, MCP via adapter | $0 (npm, Apache-2.0) |
| Ollama | Local model runtime | $0 (open source) |
| gemma4:e4b | Edge variant for runtime summarization, fits ~8 GB RAM | $0 (Apache-2.0 weights) |
| gemma4:26b | 26B MoE / 3.8B active — used for code generation while building the CLI | $0 (Apache-2.0 weights) |
| Parallel Search MCP | web_search + web_fetch, anonymous, no key | $0 free tier |
One subtle architectural choice in the demo: the CLI code orchestrates the MCP call, then passes results to the LLM as plain text. The model never tool-calls at runtime. That sidesteps every quirk small open models still have around multi-step JSON tool loops — a pragmatic trick worth stealing.
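The pattern is easy to sketch: the CLI flattens search results into plain text, then asks the local model for a single completion. A sketch assuming Ollama's standard `/api/generate` REST endpoint; the result-dict keys (`title`, `snippet`, `url`) are hypothetical stand-ins for whatever the MCP server returns:

```python
import json
import urllib.request

def build_prompt(topic: str, results: list[dict]) -> str:
    """Flatten search results into plain text so the model never tool-calls."""
    sources = "\n".join(
        f"- {r['title']}: {r['snippet']} ({r['url']})" for r in results
    )
    return (
        f"Summarize the following search results about '{topic}' in one "
        f"morning-coffee paragraph, then list the source URLs.\n\n{sources}"
    )

def summarize(prompt: str, model: str = "gemma4:e4b") -> str:
    """Single non-streaming completion against a local Ollama server."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Demo of the prompt-building half (no network needed)
results = [{"title": "Gemma 4", "snippet": "New open weights.", "url": "https://example.com"}]
print(build_prompt("Gemma 4 launch", results))
```

Because the model only ever sees a text prompt, there is no JSON tool schema for it to mis-emit — the brittle part of the loop lives in ordinary CLI code.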
Comparison
| Capability | Hosted stack (Claude/GPT + Tavily/Exa) | This stack (Pi + Gemma 4 + Parallel MCP) |
|---|---|---|
| Inference cost | $/M tokens | $0, on local GPU/CPU |
| Search cost | $5–$30 per 1k queries typical | $0, no key on free tier |
| Privacy | Prompts leave the box | Prompt + history stay on the laptop |
| Offline mode | None | Inference offline; only search needs network |
| Setup | API keys + billing | brew install ollama && ollama pull gemma4:e4b |
Use cases
- Daily news brief — `brief "Gemma 4 launch"` returns a paragraph plus a source list. Drop it in cron.
- Privacy-sensitive drafting — legal, medical, internal product specs. The body never leaves the laptop; only the search query goes out.
- Cron-job summaries — release notes, competitor blogs, pricing pages. Cents-per-run becomes free.
- Teaching — a clean reference implementation for students learning how an agent harness, an LLM, and an MCP server actually wire together.
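The cron deployment mentioned above is a one-liner. A sketch of the crontab entry, assuming the CLI is installed on the PATH as `brief` and `~/briefs/` exists (note `%` must be escaped in crontab):

```shell
# crontab -e: run every weekday at 07:00, append to a dated log
0 7 * * 1-5 brief "Gemma 4 launch" >> "$HOME/briefs/$(date +\%F).txt" 2>&1
```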
Limitations & pricing
- Free-tier rate limits on Parallel Search MCP are unpublished but described as fit for “exploration and light use”. Hit the cap and you can drop in an `x-api-key` for higher allowances; OAuth is available via `/mcp-oauth`.
- Hardware: `gemma4:e4b` is happy on an 8 GB laptop. `gemma4:26b` wants ≥16 GB unified memory or a mid-range GPU.
- Tool-calling fidelity: small open models still trip on multi-step tool loops. The post sidesteps this by orchestrating from CLI code, not the LLM. Full agentic loops with Gemma 4 driving still need careful prompting.
- Search depth: free tier is for one-shot lookups, not Deep Research multi-hop. That's a paid Parallel tier.
What's next
The Pi adapter ecosystem is moving fast — expect filesystem, git, and GitHub MCP servers to slot into the same harness, plus tighter Gemma 4 fine-tunes that make the LLM, not the CLI, drive the loop. The bigger story: a credible fully free baseline now exists for hobbyist agents. The interesting question stops being “which API do I pay for” and becomes “what would I build if every run cost $0?”
Sources: parallel.ai, Parallel Search MCP announcement, Parallel docs, Ollama gemma4, @p0 on X.

