
Overview

ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:
  • ElevenLabsTTSService (WebSocket) — Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
  • ElevenLabsHttpTTSService (HTTP) — Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.

  • ElevenLabs TTS API Reference — Complete API reference for all parameters and methods
  • Example Implementation — Complete example with WebSocket streaming
  • ElevenLabs Documentation — Official ElevenLabs TTS API documentation
  • Voice Library — Browse and clone voices from the community

Installation

uv add "pipecat-ai[elevenlabs]"

Prerequisites

  1. ElevenLabs Account: Sign up at ElevenLabs
  2. API Key: Generate an API key from your account dashboard
  3. Voice Selection: Choose voice IDs from the voice library
Set the following environment variable:
export ELEVENLABS_API_KEY=your_api_key

Configuration

ElevenLabsTTSService

api_key
str
required
ElevenLabs API key.
voice_id
str
required
deprecated
Voice ID from the voice library. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(voice=...) instead.
model
str
default:"eleven_turbo_v2_5"
deprecated
ElevenLabs model ID. Use a multilingual model variant (e.g. eleven_multilingual_v2) if you need non-English language support. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(model=...) instead.
url
str
default:"wss://api.elevenlabs.io"
WebSocket endpoint URL. Override for custom or proxied deployments.
sample_rate
int
default:"None"
Output audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
auto_mode
bool
default:"None"
Whether to enable ElevenLabs’ auto mode, which reduces latency by disabling server-side chunk scheduling and buffering. Recommended when sending complete sentences or phrases. When None (default), auto mode is automatically enabled for SENTENCE aggregation and disabled for TOKEN aggregation — because token streaming relies on the server-side chunk scheduler to accumulate enough text for natural-sounding synthesis.
text_aggregation_mode
TextAggregationMode
default:"TextAggregationMode.SENTENCE"
Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
aggregate_sentences
bool
default:"None"
deprecated
Deprecated in v0.0.104. Use text_aggregation_mode instead.
params
InputParams
default:"None"
deprecated
Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(...) instead.
settings
ElevenLabsTTSService.Settings
default:"None"
Runtime-configurable settings. See Settings below.
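The interaction between auto_mode and text_aggregation_mode described above can be sketched in plain Python (an illustrative helper, not part of the pipecat API — the enum here only mirrors pipecat's TextAggregationMode):

```python
from enum import Enum


class TextAggregationMode(Enum):
    """Mirrors pipecat's TextAggregationMode for illustration only."""
    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_auto_mode(auto_mode, aggregation_mode):
    """Return the effective auto_mode value.

    An explicit True/False wins; when auto_mode is None, enable it for
    SENTENCE aggregation (complete phrases benefit from lower latency)
    and disable it for TOKEN aggregation (token streams need the
    server-side chunk scheduler to accumulate enough text).
    """
    if auto_mode is not None:
        return auto_mode
    return aggregation_mode == TextAggregationMode.SENTENCE
```

An explicit auto_mode always overrides the aggregation-based default, so you can force it off even for sentence aggregation if you prefer the server-side scheduler.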

ElevenLabsHttpTTSService

The HTTP service accepts the same parameters as the WebSocket service, with these differences:
aiohttp_session
aiohttp.ClientSession
required
An aiohttp session for HTTP requests. You must create and manage this yourself.
base_url
str
default:"https://api.elevenlabs.io"
HTTP API base URL (instead of url for WebSocket).
enable_logging
bool
default:"None"
Whether to enable ElevenLabs server-side logging. Set to False for zero retention mode (enterprise only).
The HTTP service uses ElevenLabsHttpTTSSettings which also includes:
optimize_streaming_latency
int
default:"None"
Latency optimization level (0–4). Higher values reduce latency at the cost of quality.
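A sketch of validating this level before sending a request (a hypothetical helper, not part of pipecat — the service presumably passes the value through to the ElevenLabs API):

```python
def validate_latency_level(level):
    """Validate an optimize_streaming_latency value.

    None means "defer to the ElevenLabs API default"; otherwise the
    level must be an integer from 0 (no optimization) to 4 (maximum
    latency optimization, lowest quality).
    """
    if level is None:
        return None
    if not isinstance(level, int) or not 0 <= level <= 4:
        raise ValueError("optimize_streaming_latency must be an int in 0-4")
    return level
```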

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | ElevenLabs model identifier. (Inherited from base settings.) |
| voice | str | None | Voice identifier. (Inherited from base settings.) |
| language | Language \| str | None | Language code. Only effective with multilingual models. (Inherited from base settings.) |
| stability | float | NOT_GIVEN | Voice consistency (0.0–1.0). Lower values are more expressive, higher values are more consistent. |
| similarity_boost | float | NOT_GIVEN | Voice clarity and similarity to the original (0.0–1.0). |
| style | float | NOT_GIVEN | Style exaggeration (0.0–1.0). Higher values amplify the voice's style. |
| use_speaker_boost | bool | NOT_GIVEN | Enhance clarity and target speaker similarity. |
| speed | float | NOT_GIVEN | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0. |
| apply_text_normalization | Literal | NOT_GIVEN | Text normalization: "auto", "on", or "off". |
NOT_GIVEN values use the ElevenLabs API defaults. See ElevenLabs voice settings for details on how these parameters interact.
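The NOT_GIVEN sentinel is distinct from None: only fields you explicitly set are included in the request, so everything else falls back to the server-side defaults. A minimal sketch of that pattern (hypothetical names, not pipecat's internals):

```python
NOT_GIVEN = object()  # sentinel meaning "use the ElevenLabs API default"


def build_voice_settings(stability=NOT_GIVEN, similarity_boost=NOT_GIVEN,
                         style=NOT_GIVEN, speed=NOT_GIVEN):
    """Collect only explicitly provided fields into the request payload.

    Omitted parameters never appear in the payload, so the ElevenLabs
    server applies its own defaults rather than receiving a null value.
    """
    candidates = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "speed": speed,
    }
    return {k: v for k, v in candidates.items() if v is not NOT_GIVEN}
```

This is why NOT_GIVEN, rather than None, is the default: None could be a meaningful value to send, whereas NOT_GIVEN unambiguously means "omit this field".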

Usage

Basic Setup

import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",  # Rachel
    ),
)

With Voice Customization

import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        model="eleven_multilingual_v2",
        language=Language.ES,
        stability=0.7,
        similarity_boost=0.8,
        speed=1.1,
    ),
)

Updating Settings at Runtime

Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame:
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.elevenlabs.tts import ElevenLabsTTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=ElevenLabsTTSSettings(
            stability=0.3,
            speed=1.1,
        )
    )
)

HTTP Service

import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService

async with aiohttp.ClientSession() as session:
    tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        settings=ElevenLabsHttpTTSService.Settings(
            voice="21m00Tcm4TlvDq8ikWAM",
        ),
        aiohttp_session=session,
    )
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Notes

  • Multilingual models required for language: Setting language with a non-multilingual model (e.g. eleven_turbo_v2_5) has no effect. Use eleven_multilingual_v2 or similar.
  • WebSocket vs HTTP: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
  • Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency. The auto_mode parameter is automatically configured based on the aggregation mode for optimal quality.
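The sentence-aggregation behavior can be sketched as a buffer that releases text only at sentence boundaries (an illustrative approximation, not pipecat's actual aggregator):

```python
import re

# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def aggregate_sentences(buffer: str, chunk: str):
    """Append a streamed chunk and emit any complete sentences.

    Returns (complete_sentences, remaining_buffer). The trailing
    fragment stays buffered until more text arrives; a real aggregator
    would also flush the buffer at the end of the LLM turn.
    """
    buffer += chunk
    parts = SENTENCE_END.split(buffer)
    return parts[:-1], parts[-1]
```

TOKEN mode skips this buffering entirely and forwards each chunk as it arrives, trading some prosodic quality for lower time-to-first-audio.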

Event Handlers

ElevenLabs TTS supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to ElevenLabs WebSocket |
| on_disconnected | Disconnected from ElevenLabs WebSocket |
| on_connection_error | WebSocket connection error occurred |
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs")