- Pipecat just added first-class support for Smallest AI.
- Lightning TTS (sub-100ms TTFA, 15 languages) and Pulse STT (sub-70ms TTFT, 36 languages) now plug straight into your voice agent pipeline — no custom adapter.
- Here's what that actually unlocks for latency-sensitive voice bots.
TL;DR
Smallest AI is now a natively supported provider in Pipecat — the open-source Python framework for real-time voice and multimodal AI agents. Two services drop in together: Lightning TTS (text-to-speech, <100ms time-to-first-audio) and Pulse STT (speech-to-text, <70ms time-to-first-transcript). Install with `uv add "pipecat-ai[smallest]"`, set `SMALLEST_API_KEY`, and you're wired into any existing Pipecat pipeline without writing a custom adapter.
What's new
Until this week, plugging Smallest AI into Pipecat meant rolling your own WebSocket adapter or living with one of the incumbent TTS/STT providers (Cartesia, ElevenLabs, Deepgram, Rime). That's over. The official Pipecat docs now ship a `SmallestTTSService` class with two model options:
- `lightning-v3.1` — the default, streamlined for production voice agents
- `lightning-v2` — exposes extra fine-tuning knobs: `consistency`, `similarity`, and `enhancement`, each on a 0–1 range
Pulse STT rides in on the same integration, meaning you can run the full STT → LLM → TTS loop on Smallest's stack if you want a single vendor for the non-LLM links.
Why it matters
Voice agents live and die on latency. The widely-cited budget for a natural-feeling conversation is roughly 800ms time-to-first-audio: 50ms VAD + 150ms STT + 400ms LLM first-token + 150ms TTS first chunk + 50ms network overhead. Miss that and the bot feels like a slow IVR.
Lightning TTS delivers first audio in under 100ms — that's 50ms of headroom against the TTS budget line. Pulse STT clocks sub-70ms TTFT, giving you another 80ms of slack in the STT allocation. That combined 130ms is exactly where you want it: bank it against the LLM, which is the hardest component to compress without sacrificing response quality.
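The headroom arithmetic above is easy to sanity-check in a throwaway sketch (budget and latency numbers taken from the article):

```python
# Per-stage latency budget for a natural-feeling voice turn, in ms,
# against the widely-cited ~800ms time-to-first-audio target.
budget = {
    "vad": 50,
    "stt": 150,
    "llm_first_token": 400,
    "tts_first_chunk": 150,
    "network": 50,
}
total_budget = sum(budget.values())  # 800ms end to end

# Claimed worst-case numbers for the Smallest AI services.
measured = {"stt": 70, "tts_first_chunk": 100}

# Headroom each service frees up against its budget line,
# and the total slack you can hand to the LLM leg.
headroom = {k: budget[k] - v for k, v in measured.items()}
slack_for_llm = sum(headroom.values())
```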
Technical facts
The numbers that matter when you're speccing a pipeline:
| Metric | Lightning TTS (v3.1) | Pulse STT |
|---|---|---|
| Latency | <100ms time-to-first-audio | <70ms time-to-first-transcript |
| Real-time factor | 0.01 (10s audio in ~100ms) | Streaming |
| Languages | 15 (EN, ES, HI, TA, FR, DE, IT, PT, SV, NL + 5 Indic) | 36, with auto-detect + code-switching |
| Audio formats | PCM, MP3, WAV, mulaw | — |
| Voice cloning | From 3s of source audio | — |
| Concurrency | 20+ streams per endpoint | Streaming-first |
| Transport | WebSocket + auto-reconnect | WebSocket streaming |
Pulse STT also handles speaker diarization, emotion recognition (happy/sad/angry/fear/disgust), profanity filtering, and word boosting out of the box — features you'd otherwise stitch from two or three services.
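If you fan requests out to a single endpoint, the 20-stream concurrency figure is worth enforcing client-side so bursts queue instead of erroring. A minimal asyncio sketch; the `synthesize` stub is a stand-in for a real TTS call, not a Smallest or Pipecat API:

```python
import asyncio

MAX_STREAMS = 20  # per-endpoint concurrency figure from the table above

async def synthesize(text: str) -> bytes:
    """Stand-in for a streaming TTS request; swap in a real client call."""
    await asyncio.sleep(0.01)
    return b"audio-for:" + text.encode()

async def synthesize_all(texts: list[str]) -> list[bytes]:
    sem = asyncio.Semaphore(MAX_STREAMS)  # cap in-flight streams

    async def bounded(text: str) -> bytes:
        async with sem:  # excess requests wait here instead of erroring
            return await synthesize(text)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(t) for t in texts))

results = asyncio.run(synthesize_all([f"utterance {i}" for i in range(50)]))
```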
Minimal Pipecat example
```python
import os

from pipecat.services.smallest import SmallestTTSService
from pipecat.transcriptions.language import Language

tts = SmallestTTSService(
    api_key=os.getenv("SMALLEST_API_KEY"),
    settings=SmallestTTSService.Settings(
        voice="sophia",
        language=Language.ES,
        speed=1.2,
    ),
)
```

That's it. Drop `tts` into your existing `Pipeline([...])`, keep your VAD, STT, LLM, and transport as-is, and the rest is identical to any other Pipecat TTS service. You can even hot-swap voice, speed, or language mid-call with a `TTSUpdateSettingsFrame` — no restart.
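The hot-swap pattern is easy to picture with stubs. Everything below except the `TTSUpdateSettingsFrame` name is a stand-in for illustration, not Pipecat's actual plumbing:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class TTSUpdateSettingsFrame:
    """Illustrative stand-in for Pipecat's settings-update frame."""
    settings: dict

class StubTTSService:
    """Minimal stand-in: applies settings frames mid-stream, no restart."""
    def __init__(self, settings: dict):
        self.settings = dict(settings)

    async def process_frame(self, frame) -> None:
        if isinstance(frame, TTSUpdateSettingsFrame):
            self.settings.update(frame.settings)  # takes effect on the next utterance

async def demo() -> dict:
    tts = StubTTSService({"voice": "sophia", "language": "es", "speed": 1.2})
    # Mid-call: switch voice and drop back to normal speed, pipeline stays up.
    await tts.process_frame(TTSUpdateSettingsFrame({"voice": "ana", "speed": 1.0}))
    return tts.settings

settings = asyncio.run(demo())
```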
How it stacks up
Smallest AI's published benchmarks have Lightning averaging a 4.14 MOS versus ElevenLabs at 3.83. In a blind listening test across 1,088 samples in English, Hindi, Spanish, and Tamil, listeners preferred Lightning v3.1 over OpenAI's gpt-4o-mini-tts 76.2% of the time.
Versus the incumbents you're likely already using in Pipecat:
- vs Cartesia / ElevenLabs: Smallest's edge is latency floor and Indic-language coverage (Hindi, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati). If you're building for the Indian market or any code-mixed English/Hindi call center, this is the first plug-and-play option in Pipecat that was designed for it.
- vs Deepgram / AssemblyAI for STT: Smallest's Pulse page claims lower WER than both, backed by their own benchmarks. Validate with your own audio before committing — accuracy is notoriously dataset-dependent — but sub-70ms TTFT is hard to beat on paper.
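Validating WER on your own audio is a one-function job once you have reference and hypothesis transcripts: word error rate is word-level edit distance divided by reference length.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (free if words match)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

Run it over a few hundred of your own utterances per provider and compare the averages; a single vendor-published number rarely transfers to your acoustic conditions.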
Who should care
- Teams already on Pipecat hitting latency ceilings with their current TTS/STT combo
- Voice agents serving India or multilingual audiences — code-mixing mid-sentence is a first-class feature, not a hack
- Anyone wanting to consolidate STT + TTS with a single vendor for billing, compliance, and support
- Solo devs on the free tier: $10 in credits is plenty to prototype a working agent end-to-end
Limitations & pricing
A few things worth knowing before you swap providers:
- Lightning v3.1 is the default but does not expose the `consistency`, `similarity`, or `enhancement` controls that v2 still offers. If you need those knobs, pin `lightning-v2` in your service config.
- Lightning v3.2 — with direct emotion and pace control — is on the roadmap, no firm date.
- Pulse STT pricing is not publicly listed; enterprise contact required for volume deals.
- Compliance is covered: SOC 2 Type II, HIPAA, PCI, GDPR, ISO, and a 99.99% uptime SLA on the enterprise tier.
- Pipecat's public STT docs for Smallest were still rolling out at announcement time — TTS is fully documented; for STT, check the latest Pipecat release notes if you need the Pulse service class.
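If you do need the v2 knobs, the pin might look like the sketch below. Treat the `model` parameter name and the placement of the knobs inside `Settings` as assumptions to verify against the Pipecat docs, not confirmed API:

```python
# Sketch only: `model` and the knob field names are assumed, check the docs.
import os

from pipecat.services.smallest import SmallestTTSService

tts = SmallestTTSService(
    api_key=os.getenv("SMALLEST_API_KEY"),
    model="lightning-v2",  # pin v2 to keep the tuning knobs
    settings=SmallestTTSService.Settings(
        voice="sophia",
        consistency=0.8,   # 0-1 range, per the v2 notes above
        similarity=0.7,
        enhancement=0.5,
    ),
)
```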
What's next
The interesting second-order effect here is that Pipecat's service roster is becoming less of a bottleneck for picking fast-moving TTS/STT vendors. Smallest's native landing is one more signal that voice-agent infrastructure has matured into a commodity layer — pick your latency target, pick your language coverage, swap providers in a single config line.
If you're running a production Pipecat pipeline, the cheapest experiment you can run this week is: clone your current agent, swap the TTS line to SmallestTTSService, and A/B measure TTFA over a few hundred real calls. The 50–130ms of latency headroom is real, and it compounds with whatever else you do to tighten the LLM leg.
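The TTFA half of that A/B needs nothing fancier than a timestamp before the request and one at the first chunk. A provider-agnostic sketch; `fake_tts_stream` is a stand-in for a real streaming TTS response:

```python
import time
from typing import Iterable, Iterator

def first_chunk_latency_ms(stream: Iterable[bytes]) -> float:
    """Time-to-first-audio: ms from iteration start to the first chunk."""
    start = time.perf_counter()
    for _chunk in stream:
        return (time.perf_counter() - start) * 1000.0
    return float("inf")  # stream produced no audio at all

def fake_tts_stream(ttfa_s: float = 0.08, chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a provider's stream, with an 80ms simulated TTFA."""
    time.sleep(ttfa_s)  # runs lazily, on the first next() call
    for _ in range(chunks):
        yield b"\x00" * 320  # one small audio chunk

ttfa = first_chunk_latency_ms(fake_tts_stream())
```

Log that per call for each provider and compare distributions, not just means; TTFA tails are what callers actually notice.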
Sources: Smallest AI announcement, Pipecat SmallestTTSService docs, Lightning TTS product page, Pulse STT product page, Voice assistant latency budget.
