TL;DR

Smallest AI is now a natively supported provider in Pipecat — the open-source Python framework for real-time voice and multimodal AI agents. Two services drop in together: Lightning TTS (text-to-speech, <100ms time-to-first-audio) and Pulse STT (speech-to-text, <70ms time-to-first-transcript). Install with uv add "pipecat-ai[smallest]", set SMALLEST_API_KEY, and you're wired into any existing Pipecat pipeline without writing a custom adapter.

What's new

Until this week, plugging Smallest AI into Pipecat meant rolling your own WebSocket adapter or living with one of the incumbent TTS/STT providers (Cartesia, ElevenLabs, Deepgram, Rime). That's over. The official Pipecat docs now ship a SmallestTTSService class with two model options:

  • lightning-v3.1 — the default, streamlined for production voice agents
  • lightning-v2 — exposes extra fine-tuning knobs: consistency, similarity, and enhancement on a 0–1 range

Pulse STT rides in on the same integration, meaning you can run the full STT → LLM → TTS loop on Smallest's stack if you want a single vendor for the non-LLM links.

Why it matters

Voice agents live and die on latency. The widely-cited budget for a natural-feeling conversation is roughly 800ms time-to-first-audio: 50ms VAD + 150ms STT + 400ms LLM first-token + 150ms TTS first chunk + 50ms network overhead. Miss that and the bot feels like a slow IVR.
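As a quick sanity check, the line items above really do sum to the 800ms target:

```python
# The ~800ms conversational budget, itemized per the breakdown above (ms).
budget_ms = {
    "vad": 50,
    "stt": 150,
    "llm_first_token": 400,
    "tts_first_chunk": 150,
    "network": 50,
}
total_ms = sum(budget_ms.values())
print(total_ms)  # 800
```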

Lightning TTS delivers first audio in under 100ms — that's 50ms of headroom against the TTS budget line. Pulse STT clocks sub-70ms TTFT, giving you another 80ms of slack in the STT allocation. That combined 130ms is exactly where you want it: bank it against the LLM, which is the hardest component to compress without sacrificing response quality.
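The headroom arithmetic, using the published figures against the budget lines above:

```python
# Slack freed against the 800ms budget (all values in ms).
stt_budget, stt_actual = 150, 70    # Pulse STT time-to-first-transcript
tts_budget, tts_actual = 150, 100   # Lightning TTS time-to-first-audio

headroom = (stt_budget - stt_actual) + (tts_budget - tts_actual)
llm_budget = 400 + headroom  # reallocate the slack to the LLM first-token leg
print(headroom, llm_budget)  # 130 530
```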

Technical facts

The numbers that matter when you're speccing a pipeline:

| Metric | Lightning TTS (v3.1) | Pulse STT |
| --- | --- | --- |
| Latency | <100ms time-to-first-audio | <70ms time-to-first-transcript |
| Real-time factor | 0.01 (10s audio in ~100ms) | Streaming |
| Languages | 15 (EN, ES, HI, TA, FR, DE, IT, PT, SV, NL + 5 Indic) | 36, with auto-detect + code-switching |
| Audio formats | PCM, MP3, WAV, mulaw | |
| Voice cloning | From 3s of source audio | |
| Concurrency | 20+ streams per endpoint | Streaming-first |
| Transport | WebSocket + auto-reconnect | WebSocket streaming |
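Real-time factor is simply synthesis time divided by audio duration; a one-liner makes the 0.01 figure concrete:

```python
def real_time_factor(synthesis_s: float, audio_s: float) -> float:
    # RTF < 1 means faster than real time; lower is better.
    return synthesis_s / audio_s

# Lightning's published figure: ~10s of audio synthesized in ~100ms.
print(real_time_factor(0.1, 10.0))  # 0.01
```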

Pulse STT also handles speaker diarization, emotion recognition (happy/sad/angry/fear/disgust), profanity filtering, and word boosting out of the box — features you'd otherwise stitch from two or three services.
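To see why bundling matters, here's a sketch of consuming a combined transcript result. The payload shape below is entirely hypothetical (field names are my assumptions, not Pulse's documented schema) — the point is that diarization and emotion arrive in one response rather than from two services:

```python
# Hypothetical Pulse-style result payload; field names are assumptions,
# not the documented schema.
result = {
    "transcript": "my card was charged twice",
    "speaker": "spk_1",
    "emotion": "angry",   # one of: happy/sad/angry/fear/disgust
    "words": [{"word": "card", "boosted": True}],
}

def flag_escalation(r: dict) -> bool:
    # Both signals come from the same STT call, so routing logic like
    # this needs no second vendor round-trip.
    return r["emotion"] in {"angry", "disgust"} and r["speaker"].startswith("spk_")

print(flag_escalation(result))  # True
```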

Minimal Pipecat example

import os

from pipecat.services.smallest import SmallestTTSService
from pipecat.transcriptions.language import Language

tts = SmallestTTSService(
    api_key=os.getenv("SMALLEST_API_KEY"),
    voice_id="sophia",
    params=SmallestTTSService.InputParams(
        language=Language.ES,
        speed=1.2,
    ),
)

That's it. Drop tts into your existing Pipeline([...]), keep your VAD, STT, LLM, and transport as-is, and the rest is identical to any other Pipecat TTS service. You can even hot-swap voice, speed, or language mid-call with a TTSUpdateSettingsFrame — no restart.
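A reasonable mental model for the mid-call update is an overlay: the frame carries only the fields you want to change, and the service merges them into its current settings. The classes below are a stand-in for illustration, not the real Pipecat types:

```python
from dataclasses import dataclass

@dataclass
class TTSSettings:
    # Stand-in for a TTS service's live settings (not a Pipecat class).
    voice: str = "sophia"
    language: str = "es"
    speed: float = 1.2

def apply_update(current: TTSSettings, update: dict) -> TTSSettings:
    # Mimics update-frame semantics: only the fields present in the
    # update are overwritten; everything else is left untouched.
    for key, value in update.items():
        setattr(current, key, value)
    return current

s = apply_update(TTSSettings(), {"speed": 0.9})
print(s.speed, s.voice)  # 0.9 sophia
```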

How it stacks up

Smallest AI's published benchmarks have Lightning averaging a 4.14 MOS versus ElevenLabs at 3.83. In a blind listening test across 1,088 samples in English, Hindi, Spanish, and Tamil, listeners preferred Lightning v3.1 over OpenAI's gpt-4o-mini-tts 76.2% of the time.

Versus the incumbents you're likely already using in Pipecat:

  • vs Cartesia / ElevenLabs: Smallest's edge is latency floor and Indic-language coverage (Hindi, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati). If you're building for the Indian market or any code-mixed English/Hindi call center, this is the first plug-and-play option in Pipecat that was designed for it.
  • vs Deepgram / AssemblyAI for STT: Smallest's Pulse page claims lower WER than both, backed by their own benchmarks. Validate with your own audio before committing — accuracy is notoriously dataset-dependent — but sub-70ms TTFT is hard to beat on paper.
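If you do run that validation, word error rate is cheap to compute yourself — it's word-level edit distance over reference length. A minimal implementation (my own sketch, not a Smallest or Pipecat utility):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("hello world", "hello word"))  # 0.5
```

Run it over a few hundred of your own production utterances per provider and compare distributions, not single numbers.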

Who should care

  • Teams already on Pipecat hitting latency ceilings with their current TTS/STT combo
  • Voice agents serving India or multilingual audiences — code-mixing mid-sentence is a first-class feature, not a hack
  • Anyone wanting to consolidate STT + TTS with a single vendor for billing, compliance, and support
  • Solo devs on the free tier: $10 in credits is plenty to prototype a working agent end-to-end

Limitations & pricing

A few things worth knowing before you swap providers:

  • Lightning v3.1 is the default but does not expose the consistency, similarity, or enhancement controls that v2 still offers. If you need those knobs, pin lightning-v2 in your service config.
  • Lightning v3.2 — with direct emotion and pace control — is on the roadmap, no firm date.
  • Pulse STT pricing is not publicly listed; enterprise contact required for volume deals.
  • Compliance and reliability are covered: SOC 2 Type II, HIPAA, PCI, GDPR, and ISO certifications, plus a 99.99% uptime SLA on the enterprise tier.
  • Pipecat's public STT docs for Smallest were still rolling out at announcement time — TTS is fully documented; for STT, check the latest Pipecat release notes if you need the Pulse service class.
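The three v2-only knobs mentioned above all live on a 0–1 range; if you pin lightning-v2, a small guard like this (my own helper, not part of any SDK) keeps the config honest:

```python
# The lightning-v2-only tuning knobs, each on a 0-1 range per the docs.
V2_KNOBS = {"consistency": 0.5, "similarity": 0.8, "enhancement": 1.0}

def validate_knobs(knobs: dict) -> dict:
    # Fail fast on out-of-range values before they reach the API.
    for name, value in knobs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return knobs

print(validate_knobs(V2_KNOBS))
```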

What's next

The interesting second-order effect here is that Pipecat's service roster is becoming less of a bottleneck for picking fast-moving TTS/STT vendors. Smallest's native landing is one more signal that voice-agent infrastructure has matured into a commodity layer — pick your latency target, pick your language coverage, swap providers in a single config line.

If you're running a production Pipecat pipeline, the cheapest experiment you can run this week is: clone your current agent, swap the TTS line to SmallestTTSService, and A/B measure TTFA over a few hundred real calls. The 50–130ms of latency headroom is real, and it compounds with whatever else you do to tighten the LLM leg.
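For the A/B measurement, compare percentiles rather than means — tail latency is what callers actually notice. A minimal summary helper (my own sketch):

```python
import statistics

def ttfa_summary(samples_ms: list) -> dict:
    """Median and p95 time-to-first-audio across one A/B arm."""
    ordered = sorted(samples_ms)
    # Nearest-rank p95; means hide the slow tail, percentiles don't.
    p95_idx = max(0, round(0.95 * (len(ordered) - 1)))
    return {
        "p50": statistics.median(ordered),
        "p95": ordered[p95_idx],
        "n": len(ordered),
    }

print(ttfa_summary([90, 95, 100, 110, 105, 98, 240]))
```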

Sources: Smallest AI announcement, Pipecat SmallestTTSService docs, Lightning TTS product page, Pulse STT product page, Voice assistant latency budget.