TL;DR

Parallel published a walkthrough for a fully free local agent stack: Pi (Mario Zechner's minimal terminal coding harness) drives Gemma 4 on local Ollama, and the brand-new Parallel Search MCP gives the agent the open web — with no API key required. The deliverable is a one-file CLI called brief that takes a topic and prints a morning-coffee summary with sources. $0 in API charges. Zero keys in your shell history.

Parallel Search MCP — the free web brain for the local agent

What's new

Three pieces, none of which existed in this combination a few weeks ago, finally clicked together:

  • Pi (@mariozechner/pi-coding-agent) — a deliberately tiny terminal coding harness with four built-in tools (read, write, edit, bash) and MCP support via a third-party pi-mcp-adapter.
  • Gemma 4 on Ollama — Google DeepMind's Apache-2.0 family, with edge variants that fit a laptop and a 26B Mixture-of-Experts that activates only 3.8B parameters per token.
  • Parallel Search MCP at https://search.parallel.ai/mcp — a free, anonymous endpoint exposing web_search and web_fetch to any MCP-aware client (connection sketch below).
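
How thin is "any MCP-aware client"? Thin enough that a standalone connection fits in a screenful. A minimal sketch, assuming the official TypeScript SDK (@modelcontextprotocol/sdk) and its streamable-HTTP transport; the tool names come from the announcement, so treat listTools as the source of truth:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Anonymous connection: no API key, no OAuth, just the public endpoint.
const client = new Client({ name: "brief", version: "0.1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://search.parallel.ai/mcp")),
);

// Discover the advertised tools (web_search and web_fetch, per the
// announcement) and their input schemas instead of hard-coding them.
const { tools } = await client.listTools();
for (const tool of tools) {
  console.log(`${tool.name}: ${tool.description ?? ""}`);
}

await client.close();
```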

Why it matters

For two years the working assumption was that serious agents need a frontier model and a paid search API. Both halves of that assumption just gave way. Gemma 4 is good enough at planning and summarization to be useful on a laptop, and the Parallel Search MCP removes the last credit card from the loop. That changes the unit economics for every cron job, daily brief, research script, and learn-by-doing tutorial — they collapse from cents-per-run to literally free.

Technical facts

Component | What it is | Cost
Pi | Terminal harness, 4 built-in tools, MCP via adapter | $0 (npm, Apache-2.0)
Ollama | Local model runtime | $0 (open source)
gemma4:e4b | Edge variant for runtime summarization, fits ~8 GB RAM | $0 (Apache-2.0 weights)
gemma4:26b | 26B MoE / 3.8B active; used for code generation while building the CLI | $0 (Apache-2.0 weights)
Parallel Search MCP | web_search + web_fetch, anonymous, no key | $0 (free tier)

One subtle architectural choice in the demo: the CLI code orchestrates the MCP call, then passes results to the LLM as plain text. The model never tool-calls at runtime. That sidesteps every quirk small open models still have around multi-step JSON tool loops — a pragmatic trick worth stealing.
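
Here's a sketch of that shape, with the caveats labeled: the web_search argument name (objective) is a placeholder to be replaced with whatever the server's listTools schema actually says, and the Ollama call uses the documented /api/generate endpoint:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function brief(topic: string): Promise<string> {
  // Step 1: the CLI, not the model, makes the MCP call.
  const client = new Client({ name: "brief", version: "0.1.0" });
  await client.connect(
    new StreamableHTTPClientTransport(new URL("https://search.parallel.ai/mcp")),
  );
  // "objective" is a placeholder argument name -- take the real schema
  // from the server's listTools() response.
  const result = await client.callTool({
    name: "web_search",
    arguments: { objective: topic },
  });
  await client.close();

  // Step 2: flatten the tool result into plain text. No JSON tool loop
  // ever reaches the model, so small-model tool-calling quirks can't bite.
  const evidence = ((result.content ?? []) as Array<{ type: string; text?: string }>)
    .filter((c) => c.type === "text" && c.text)
    .map((c) => c.text)
    .join("\n\n");

  // Step 3: one plain-text completion against local Ollama.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma4:e4b",
      prompt: `Summarize these search results about "${topic}" in one paragraph, then list the source URLs:\n\n${evidence}`,
      stream: false,
    }),
  });
  const { response } = await res.json();
  return response;
}

console.log(await brief(process.argv[2] ?? "Gemma 4 launch"));
```

Three steps, one model call, zero tool schemas in the prompt. That's the whole trick.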

Comparison

Capability | Hosted stack (Claude/GPT + Tavily/Exa) | This stack (Pi + Gemma 4 + Parallel MCP)
Inference cost | $/M tokens | $0, on local GPU/CPU
Search cost | $5–$30 per 1k queries typical | $0, no key on free tier
Privacy | Prompts leave the box | Prompt + history stay on the laptop
Offline mode | None | Inference offline; only search needs network
Setup | API keys + billing | brew install ollama && ollama pull gemma4:e4b

Use cases

  • Daily news brief — brief "Gemma 4 launch" returns a paragraph plus a source list. Drop it in cron.
  • Privacy-sensitive drafting — legal, medical, internal product specs. The body never leaves the laptop; only the search query goes out.
  • Cron-job summaries — release notes, competitor blogs, pricing pages. Cents-per-run becomes free.
  • Teaching — a clean reference implementation for students learning how an agent harness, an LLM, and an MCP server actually wire together.

Limitations & pricing

  • Free tier rate limits on Parallel Search MCP are unpublished but described as fit for “exploration and light use”. Hit the cap and you can drop in an x-api-key for higher allowances (see the sketch after this list); OAuth is available via /mcp-oauth.
  • Hardware: gemma4:e4b is happy on an 8 GB laptop. gemma4:26b wants ≥16 GB unified memory or a mid-range GPU.
  • Tool-calling fidelity: small open models still trip on multi-step tool loops. The post sidesteps this by orchestrating from CLI code, not the LLM. Full agentic loops with Gemma 4 driving still need careful prompting.
  • Search depth: free tier is for one-shot lookups, not Deep Research multi-hop. That's a paid Parallel tier.
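
A sketch of that upgrade path, assuming the TypeScript SDK's requestInit transport option forwards custom headers (the header name is from the post; PARALLEL_API_KEY is an illustrative env variable, keeping the key out of shell history):

```typescript
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Same endpoint as the anonymous tier; every request now carries the key.
// Omit the options object entirely and you are back on the free tier.
const transport = new StreamableHTTPClientTransport(
  new URL("https://search.parallel.ai/mcp"),
  {
    requestInit: {
      headers: { "x-api-key": process.env.PARALLEL_API_KEY ?? "" },
    },
  },
);
```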

What's next

The Pi adapter ecosystem is moving fast — expect filesystem, git, and GitHub MCP servers to slot into the same harness, plus tighter Gemma 4 fine-tunes that make the LLM, not the CLI, drive the loop. The bigger story: a credible fully free baseline now exists for hobbyist agents. The interesting question stops being “which API do I pay for” and becomes “what would I build if every run cost $0?”

Sources: parallel.ai, Parallel Search MCP announcement, Parallel docs, Ollama gemma4, @p0 on X.