TL;DR

Parallel published a walkthrough for a fully free local agent stack: Pi (Mario Zechner's minimal terminal coding harness) drives Gemma 4 on local Ollama, and the brand-new Parallel Search MCP gives the agent the open web — with no API key required. The deliverable is a one-file CLI called brief that takes a topic and prints a morning-coffee summary with sources. $0 in API charges. Zero keys in your shell history.

Parallel Search MCP — the free web brain for the local agent

What's new

Three pieces, none of which existed in this combination a few weeks ago, finally clicked together:

  • Pi (@mariozechner/pi-coding-agent) — a deliberately tiny terminal coding harness with four built-in tools (read, write, edit, bash) and MCP support via a third-party pi-mcp-adapter.
  • Gemma 4 on Ollama — Google DeepMind's Apache-2.0 family, with edge variants that fit a laptop and a 26B Mixture-of-Experts that activates only 3.8B parameters per token.
  • Parallel Search MCP at https://search.parallel.ai/mcp — a free, anonymous endpoint exposing web_search and web_fetch to any MCP-aware client (connection sketch below).
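
How thin is "any MCP-aware client"? Thin enough that a standalone connection fits in a screenful. A minimal sketch, assuming the official TypeScript SDK (@modelcontextprotocol/sdk) and its streamable-HTTP transport; the tool names come from the announcement, so treat listTools as the source of truth:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Anonymous connection: no API key, no OAuth, just the public endpoint.
const client = new Client({ name: "brief", version: "0.1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://search.parallel.ai/mcp")),
);

// Discover the advertised tools (web_search and web_fetch, per the
// announcement) and their input schemas instead of hard-coding them.
const { tools } = await client.listTools();
for (const tool of tools) {
  console.log(`${tool.name}: ${tool.description ?? ""}`);
}

await client.close();
```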

Why it matters

For two years the working assumption was that serious agents need a frontier model and a paid search API. Both halves of that assumption just gave way. Gemma 4 is good enough at planning and summarization to be useful on a laptop, and the Parallel Search MCP removes the last credit card from the loop. That changes the unit economics for every cron job, daily brief, research script, and learn-by-doing tutorial — they collapse from cents-per-run to literally free.

Technical facts

Component | What it is | Cost
Pi | Terminal harness, 4 built-in tools, MCP via adapter | $0 (npm, Apache-2.0)
Ollama | Local model runtime | $0 (open source)
gemma4:e4b | Edge variant for runtime summarization, fits ~8 GB RAM | $0 (Apache-2.0 weights)
gemma4:26b | 26B MoE / 3.8B active; used for code generation while building the CLI | $0 (Apache-2.0 weights)
Parallel Search MCP | web_search + web_fetch, anonymous, no key | $0 (free tier)

One subtle architectural choice in the demo: the CLI code orchestrates the MCP call, then passes results to the LLM as plain text. The model never tool-calls at runtime. That sidesteps every quirk small open models still have around multi-step JSON tool loops — a pragmatic trick worth stealing.
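
Here's a sketch of that shape, with the caveats labeled: the web_search argument name (objective) is a placeholder to be replaced with whatever the server's listTools schema actually says, and the Ollama call uses the documented /api/generate endpoint:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function brief(topic: string): Promise<string> {
  // Step 1: the CLI, not the model, makes the MCP call.
  const client = new Client({ name: "brief", version: "0.1.0" });
  await client.connect(
    new StreamableHTTPClientTransport(new URL("https://search.parallel.ai/mcp")),
  );
  // "objective" is a placeholder argument name -- take the real schema
  // from the server's listTools() response.
  const result = await client.callTool({
    name: "web_search",
    arguments: { objective: topic },
  });
  await client.close();

  // Step 2: flatten the tool result into plain text. No JSON tool loop
  // ever reaches the model, so small-model tool-calling quirks can't bite.
  const evidence = ((result.content ?? []) as Array<{ type: string; text?: string }>)
    .filter((c) => c.type === "text" && c.text)
    .map((c) => c.text)
    .join("\n\n");

  // Step 3: one plain-text completion against local Ollama.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma4:e4b",
      prompt: `Summarize these search results about "${topic}" in one paragraph, then list the source URLs:\n\n${evidence}`,
      stream: false,
    }),
  });
  const { response } = await res.json();
  return response;
}

console.log(await brief(process.argv[2] ?? "Gemma 4 launch"));
```

Three steps, one model call, zero tool schemas in the prompt. That's the whole trick.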

Comparison

Capability | Hosted stack (Claude/GPT + Tavily/Exa) | This stack (Pi + Gemma 4 + Parallel MCP)
Inference cost | $/M tokens | $0, on local GPU/CPU
Search cost | $5–$30 per 1k queries typical | $0, no key on free tier
Privacy | Prompts leave the box | Prompt + history stay on the laptop
Offline mode | None | Inference offline; only search needs network
Setup | API keys + billing | brew install ollama && ollama pull gemma4:e4b

Use cases

  • Daily news brief — brief "Gemma 4 launch" returns a paragraph plus a source list. Drop it in cron.
  • Privacy-sensitive drafting — legal, medical, internal product specs. The body never leaves the laptop; only the search query goes out.
  • Cron-job summaries — release notes, competitor blogs, pricing pages. Cents-per-run becomes free.
  • Teaching — a clean reference implementation for students learning how an agent harness, an LLM, and an MCP server actually wire together.

Limitations & pricing

  • Free tier rate limits on Parallel Search MCP are unpublished but described as fit for “exploration and light use”. Hit the cap and you can drop in an x-api-key for higher allowances (see the sketch after this list); OAuth is available via /mcp-oauth.
  • Hardware: gemma4:e4b is happy on an 8 GB laptop. gemma4:26b wants ≥16 GB unified memory or a mid-range GPU.
  • Tool-calling fidelity: small open models still trip on multi-step tool loops. The post sidesteps this by orchestrating from CLI code, not the LLM. Full agentic loops with Gemma 4 driving still need careful prompting.
  • Search depth: free tier is for one-shot lookups, not Deep Research multi-hop. That's a paid Parallel tier.
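
A sketch of that upgrade path, assuming the TypeScript SDK's requestInit transport option forwards custom headers (the header name is from the post; PARALLEL_API_KEY is an illustrative env variable, keeping the key out of shell history):

```typescript
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Same endpoint as the anonymous tier; every request now carries the key.
// Omit the options object entirely and you are back on the free tier.
const transport = new StreamableHTTPClientTransport(
  new URL("https://search.parallel.ai/mcp"),
  {
    requestInit: {
      headers: { "x-api-key": process.env.PARALLEL_API_KEY ?? "" },
    },
  },
);
```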

What's next

The Pi adapter ecosystem is moving fast — expect filesystem, git, and GitHub MCP servers to slot into the same harness, plus tighter Gemma 4 fine-tunes that make the LLM, not the CLI, drive the loop. The bigger story: a credible fully free baseline now exists for hobbyist agents. The interesting question stops being “which API do I pay for” and becomes “what would I build if every run cost $0?”

Sources: parallel.ai, Parallel Search MCP announcement, Parallel docs, Ollama gemma4, @p0 on X.