- ElevenLabs shipped a Voice Isolator Skill.
- Run `npx skills add elevenlabs/skills` and any Claude-compatible coding agent can strip background noise from audio via the Audio Isolation API — no boilerplate, no SDK wiring.
TL;DR
ElevenLabs Devs just introduced the Voice Isolator Skill — a drop-in capability for Claude Code, Cursor, and any agent runtime that speaks the open Agent Skills spec. One command — npx skills add elevenlabs/skills — and your agent knows how to call ElevenLabs' Audio Isolation API to strip background noise from any audio file: podcast, interview, voice note, meeting recording.

What's new
The Voice Isolator Skill is the newest entry in the elevenlabs/skills repo, which now covers text-to-speech, speech-to-text, Agents, sound effects, music, setup-api-key, and voice-isolator. Each one is a folder with a SKILL.md that teaches the agent exactly how to hit the matching ElevenLabs endpoint — auth, multipart uploads, supported formats, error handling — without the developer writing that glue.
The announcement came from @ElevenLabsDevs on X: “Give your agent what it needs to build voice isolation into your app. Remove background noise and unwanted sounds from any audio.”
Why it matters
Before skills, integrating Voice Isolator meant reading the API docs, writing a multipart upload wrapper, figuring out how to stream the response, and handling the 500 MB / 1-hour limits. With the skill installed, your agent reads the SKILL.md, picks the right endpoint, and writes the integration code for you — often first-try correct, because the instructions were authored by the vendor, not remembered from a stale training set.
It's the same shift that happened when package managers replaced “copy this snippet from the README” — except now the consumer is an LLM, and the “package” is a set of instructions written for that LLM. Context stays cheap too: only the name and description from SKILL.md's frontmatter sit in the agent's window by default. The full instructions load lazily, only when the agent actually needs them.
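The lazy-loading behaviour can be sketched in a few lines. This is a simplified illustration, not the Agent Skills reference implementation: it assumes SKILL.md opens with a `---`-delimited frontmatter block of flat `key: value` pairs, and the sample file contents and function names are made up for the example.

```python
from __future__ import annotations

# Hypothetical SKILL.md contents, invented for this example.
SAMPLE_SKILL_MD = """\
---
name: voice-isolator
description: Remove background noise from audio with the ElevenLabs Audio Isolation API.
---
# Voice Isolator

Full instructions live here and are only read on demand.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, instruction body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: everything is body
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat it all as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def context_stub(text: str) -> str:
    """What sits in the agent's window by default: name and description only."""
    meta, _body = split_skill(text)
    return f"{meta.get('name', '?')}: {meta.get('description', '')}"


def load_instructions(text: str) -> str:
    """What gets loaded lazily once the agent decides the skill is relevant."""
    _meta, body = split_skill(text)
    return body
```

Calling `context_stub(SAMPLE_SKILL_MD)` yields a one-line summary, while `load_instructions(SAMPLE_SKILL_MD)` returns the longer body that only needs to enter the context when the skill is actually used.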
For teams shipping audio features, that removes an entire class of “the model hallucinated a method name that doesn't exist” bugs. The skill is the source of truth the agent reads before writing code.
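For a sense of the hand-written glue described at the start of this section, here is a minimal sketch of a multipart upload wrapper using only the Python standard library. The endpoint URL `https://api.elevenlabs.io/v1/audio-isolation`, the `xi-api-key` header, and the `audio` form field are assumptions not stated in this article; check the current ElevenLabs API reference before relying on them. The helper names are hypothetical.

```python
import os
import urllib.request
import uuid

# Assumed endpoint; verify against the current ElevenLabs API reference.
ISOLATION_URL = "https://api.elevenlabs.io/v1/audio-isolation"


def build_isolation_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        # "audio" as the form field name is an assumption.
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        ISOLATION_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed auth header name
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def isolate_voice(in_path: str, out_path: str) -> None:
    """Upload a file and write the cleaned audio to out_path (performs a network call)."""
    api_key = os.environ["ELEVENLABS_API_KEY"]
    with open(in_path, "rb") as f:
        # Reads the whole input into memory; acceptable for a sketch.
        request = build_isolation_request(f.read(), os.path.basename(in_path), api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        # Copy the response in chunks instead of buffering it all at once.
        while chunk := response.read(64 * 1024):
            out.write(chunk)
```

Even this stripped-down version leaves out retries, error handling, and limit checks, which is the kind of detail the SKILL.md is meant to hand to the agent.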
Technical facts

| Property | Value |
|---|---|
| Install | npx skills add elevenlabs/skills |
| SDK methods | audioIsolation.convert and audioIsolation.stream |
| Audio formats | WAV, MP3, FLAC, OGG, AAC, AIFF, OPUS, M4A |
| Video formats | MP4, MOV, MKV, AVI, WMV, FLV, WEBM, MPEG, 3GPP |
| Max file size | 500 MB |
| Max duration | 1 hour |
| Cost | 1,000 characters / minute of audio |
| Auth | ELEVENLABS_API_KEY env var |
| SDKs | Python, JavaScript/TypeScript, REST |

Under the hood: neural speech separation trained to pull vocal signal out of mixed audio with minimal artifacts. The skill targets the same production API that powers elevenlabs.io/voice-isolator.
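The limits in the table lend themselves to a pre-flight check before any upload. The sketch below is one way to express it, under two assumptions: "500 MB" is read as decimal megabytes, and the format names in the table double as lowercase file extensions (real files may use variants such as `.mpg` or `.3gp`). The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import PurePath

MAX_BYTES = 500 * 1000 * 1000  # "500 MB", read as decimal megabytes (assumption)
MAX_SECONDS = 60 * 60          # 1 hour

# Format names from the table, assumed to double as lowercase file extensions.
SUPPORTED_EXTENSIONS = {
    "wav", "mp3", "flac", "ogg", "aac", "aiff", "opus", "m4a",          # audio
    "mp4", "mov", "mkv", "avi", "wmv", "flv", "webm", "mpeg", "3gpp",  # video
}


def preflight(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return human-readable problems; an empty list means the file looks acceptable."""
    problems: list[str] = []
    extension = PurePath(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"duration is {duration_seconds:.0f} s, limit is {MAX_SECONDS}")
    return problems
```

A caller would run `preflight(...)` first and only upload when the returned list is empty; the duration has to come from a separate probe of the file, which is out of scope here.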
Comparison
| Dimension | Voice Isolator Skill | Raw Audio Isolation API | Adobe Podcast / Krisp |
|---|---|---|---|
| Install surface | One command | Hand-written wrapper | UI or SDK |
| Agent-native | Yes | No | No |
| Programmatic | Yes | Yes | Partial |
| Streaming | Yes | Yes | Varies |
| Pricing model | 1k chars / min | Same | Subscription |

The skill's differentiator versus the raw API is zero boilerplate for agents. Versus a SaaS cleaner like Adobe Podcast, it is composable: you embed it into your own product rather than asking users to leave your app.
Use cases

- Podcast & interview pipelines — strip HVAC hum, traffic, keyboard clicks before publishing.
- Video post-production — clean dialogue tracks from location shoots with noisy backgrounds.
- UGC mobile apps — voice notes, voicemail, field recordings rendered listenable.
- Dataset prep — clean training audio before ASR, diarization, or fine-tuning.
- Meeting & call recordings — crisp speech in front of a summarization or search pipeline.
- Agentic workflows — “clean up this audio” becomes a native tool your Claude Code session can call.
Limitations & pricing
- Cost: 1,000 characters per minute of audio. A 1-hour file ≈ 60k characters (about $13.20 on a Creator tier at roughly $22 per 100k characters).
- Hard limits: 500 MB per file, 1 hour per request.
- Not a music stem splitter: optimized for speech-in-noise, not vocals-from-full-mix. Results on musical backing vary.
- API key required: the skill expects `ELEVENLABS_API_KEY` to be set.
- Quality depends on input SNR: very low-volume speech under loud noise will still come out degraded.
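The cost arithmetic is simple enough to script. This sketch assumes billing is strictly proportional to duration (the article does not say whether partial minutes are rounded) and uses the rough plan figures quoted above, $22 for 100k characters, as defaults; substitute your own plan's numbers.

```python
CHARS_PER_MINUTE = 1_000  # rate quoted in this article


def isolation_characters(duration_seconds: float) -> float:
    """Characters consumed, assuming strictly proportional billing (assumption)."""
    return duration_seconds / 60 * CHARS_PER_MINUTE


def estimated_cost_usd(
    duration_seconds: float,
    plan_price_usd: float = 22.0,      # rough Creator-tier price from the article
    plan_characters: int = 100_000,    # rough characters included at that price
) -> float:
    """Approximate dollar cost of isolating one file at the given plan rate."""
    return isolation_characters(duration_seconds) * plan_price_usd / plan_characters
```

For a one-hour file, `isolation_characters(3600)` gives 60,000 characters and `estimated_cost_usd(3600)` works out to about $13.20 at the default rate.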
What's next
Expect the elevenlabs/skills repo to keep filling in: a deeper Agents skill, more cookbooks, and streaming-first variants once latency gets interesting. The Audio Isolation API itself already exposes a stream endpoint, so real-time cleanup (think live podcast recorders or in-browser meeting tools) is a natural next demo.
More broadly, the Skills spec is becoming a distribution channel for API providers — the same way npm is for libraries. If you ship an API today, “publish a skill” is quickly becoming as expected as “publish a TypeScript SDK.” ElevenLabs is one of the first big vendors to treat it that way, and the shape of the elevenlabs/skills repo — one folder per capability, each with a focused SKILL.md — is a reasonable template for the rest of the industry to copy.
For developers: if you're already on ElevenLabs, installing the skill is a one-minute experiment. Your next audio feature might write itself.
Sources: elevenlabs/skills, ElevenLabs docs, @ElevenLabsDevs.
