Transports
Transports exchange audio and video streams between the user and bot.

| Service | Setup |
|---|---|
| DailyTransport | uv add "pipecat-ai[daily]" |
| FastAPIWebSocketTransport | uv add "pipecat-ai[websocket]" |
| HeyGenTransport | uv add "pipecat-ai[heygen]" |
| LemonSliceTransport | uv add "pipecat-ai[lemonslice]" |
| LiveKitTransport | uv add "pipecat-ai[livekit]" |
| SmallWebRTCTransport | uv add "pipecat-ai[webrtc]" |
| TavusTransport | uv add "pipecat-ai[tavus]" |
| WebSocket Transports | uv add "pipecat-ai[websocket]" |
| WhatsAppTransport | uv add "pipecat-ai[webrtc]" |
Serializers
Serializers convert between frames and media streams, enabling real-time communication over a websocket.

Speech-to-Text
Speech-to-Text services receive audio input and output transcriptions.

| Service | Setup |
|---|---|
| AssemblyAI | uv add "pipecat-ai[assemblyai]" |
| AWS Transcribe | uv add "pipecat-ai[aws]" |
| Azure | uv add "pipecat-ai[azure]" |
| Cartesia | uv add "pipecat-ai[cartesia]" |
| Deepgram | uv add "pipecat-ai[deepgram]" |
| ElevenLabs | uv add "pipecat-ai[elevenlabs]" |
| Fal Wizper | uv add "pipecat-ai[fal]" |
| Gladia | uv add "pipecat-ai[gladia]" |
| Google | uv add "pipecat-ai[google]" |
| Gradium | uv add "pipecat-ai[gradium]" |
| Groq (Whisper) | uv add "pipecat-ai[groq]" |
| NVIDIA | uv add "pipecat-ai[nvidia]" |
| OpenAI | uv add "pipecat-ai[openai]" |
| Sarvam | uv add "pipecat-ai[sarvam]" |
| Smallest | uv add "pipecat-ai[smallest]" |
| Soniox | uv add "pipecat-ai[soniox]" |
| Speechmatics | uv add "pipecat-ai[speechmatics]" |
| Whisper | uv add "pipecat-ai[whisper]" |
Large Language Models
LLMs receive text- or audio-based input and output a streaming text response.

| Service | Setup |
|---|---|
| Anthropic | uv add "pipecat-ai[anthropic]" |
| AWS Bedrock | uv add "pipecat-ai[aws]" |
| Azure | uv add "pipecat-ai[azure]" |
| Cerebras | uv add "pipecat-ai[cerebras]" |
| DeepSeek | uv add "pipecat-ai[deepseek]" |
| Fireworks AI | uv add "pipecat-ai[fireworks]" |
| Google Gemini | uv add "pipecat-ai[google]" |
| Google Vertex AI | uv add "pipecat-ai[google]" |
| Grok | uv add "pipecat-ai[grok]" |
| Groq | uv add "pipecat-ai[groq]" |
| Mistral | uv add "pipecat-ai[mistral]" |
| Nebius | uv add "pipecat-ai[nebius]" |
| Novita AI | uv add "pipecat-ai[novita]" |
| NVIDIA | uv add "pipecat-ai[nvidia]" |
| Ollama | uv add "pipecat-ai[ollama]" |
| OpenAI | uv add "pipecat-ai[openai]" |
| OpenAI Responses | uv add "pipecat-ai[openai]" |
| OpenRouter | uv add "pipecat-ai[openrouter]" |
| Perplexity | uv add "pipecat-ai[perplexity]" |
| Qwen | uv add "pipecat-ai[qwen]" |
| SambaNova | uv add "pipecat-ai[sambanova]" |
| Sarvam | uv add "pipecat-ai[sarvam]" |
| Together AI | uv add "pipecat-ai[together]" |
Text-to-Speech
Text-to-Speech services receive text input and output audio streams or chunks.

| Service | Setup |
|---|---|
| Async | uv add "pipecat-ai[asyncai]" |
| AWS Polly | uv add "pipecat-ai[aws]" |
| Azure | uv add "pipecat-ai[azure]" |
| Camb AI | uv add "pipecat-ai[camb]" |
| Cartesia | uv add "pipecat-ai[cartesia]" |
| Deepgram | uv add "pipecat-ai[deepgram]" |
| ElevenLabs | uv add "pipecat-ai[elevenlabs]" |
| Fish | uv add "pipecat-ai[fish]" |
| Google | uv add "pipecat-ai[google]" |
| Gradium | uv add "pipecat-ai[gradium]" |
| Groq | uv add "pipecat-ai[groq]" |
| Hume | uv add "pipecat-ai[hume]" |
| Inworld | No dependencies required |
| Kokoro | uv add "pipecat-ai[kokoro]" |
| LMNT | uv add "pipecat-ai[lmnt]" |
| MiniMax | No dependencies required |
| Mistral | uv add "pipecat-ai[mistral]" |
| Neuphonic | uv add "pipecat-ai[neuphonic]" |
| NVIDIA | uv add "pipecat-ai[nvidia]" |
| OpenAI | uv add "pipecat-ai[openai]" |
| Piper | No dependencies required |
| ResembleAI | uv add "pipecat-ai[resemble]" |
| Rime | uv add "pipecat-ai[rime]" |
| Sarvam | No dependencies required |
| Smallest AI | uv add "pipecat-ai[smallest]" |
| Speechmatics | uv add "pipecat-ai[speechmatics]" |
| xAI | uv add "pipecat-ai[xai]" |
| XTTS | uv add "pipecat-ai[xtts]" |
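
A bot that chains a transport, Speech-to-Text, an LLM, and Text-to-Speech usually needs several extras at once, and some services share an extra or need none. The sketch below shows one way to collapse a selection into a single `uv add` command using the standard comma-separated extras syntax. The helper name `build_install_command` and the `selection` dictionary are hypothetical and only illustrate the idea; the extra names are copied from the tables above.

```python
from typing import Iterable, Optional


def build_install_command(extras: Iterable[Optional[str]]) -> str:
    """Return one `uv add` command covering every extra in `extras`.

    `None` stands for a service listed as "No dependencies required".
    This helper is illustrative and not part of Pipecat.
    """
    unique: list[str] = []
    for extra in extras:
        # Skip services without an extra and extras already collected.
        if extra is not None and extra not in unique:
            unique.append(extra)
    if not unique:
        # Nothing optional is needed, so the base package is enough.
        return "uv add pipecat-ai"
    joined = ",".join(unique)
    return f'uv add "pipecat-ai[{joined}]"'


# Example selection: Deepgram is used for both STT and TTS, so its
# extra should appear only once in the resulting command.
selection = {
    "transport: DailyTransport": "daily",
    "stt: Deepgram": "deepgram",
    "llm: OpenAI": "openai",
    "tts: Deepgram": "deepgram",
}
print(build_install_command(selection.values()))
# prints: uv add "pipecat-ai[daily,deepgram,openai]"
```

Swapping the TTS entry for a service marked "No dependencies required", such as Piper, would mean passing `None` for it, which the helper skips.
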
Speech-to-Speech
Speech-to-Speech services are multi-modal LLM services that take in audio, video, or text and output audio or text.

| Service | Setup |
|---|---|
| AWS Nova Sonic | uv add "pipecat-ai[aws-nova-sonic]" |
| Gemini Live | uv add "pipecat-ai[google]" |
| Gemini Live Vertex AI | uv add "pipecat-ai[google]" |
| Grok Voice Agent | uv add "pipecat-ai[grok]" |
| Inworld Realtime | uv add "pipecat-ai[inworld]" |
| OpenAI Realtime | uv add "pipecat-ai[openai]" |
| Ultravox | uv add "pipecat-ai[ultravox]" |
Image Generation
Image generation services receive text inputs and output images.

Video
Video services enable you to build an avatar where audio and video are synchronized.

Memory
Memory services can be used to store and retrieve conversations.

| Service | Setup |
|---|---|
| mem0 | uv add "pipecat-ai[mem0]" |
Vision
Vision services receive a streaming video input and output text describing the video input.

| Service | Setup |
|---|---|
| Moondream | uv add "pipecat-ai[moondream]" |
Analytics & Monitoring
Analytics services help you better understand how your service operates.

| Service | Setup |
|---|---|
| Sentry | uv add "pipecat-ai[sentry]" |
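
The extra names in the Setup columns only work if the installed release actually declares them. As a rough check, the standard library's `importlib.metadata` can list the extras a distribution advertises through its `Provides-Extra` metadata. The function name `declared_extras` is hypothetical, and the sketch assumes the package was installed under the distribution name `pipecat-ai`.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(package: str = "pipecat-ai") -> list[str]:
    """Return the extras declared by an installed distribution.

    Returns an empty list when the distribution is not installed
    or declares no extras.
    """
    try:
        meta = metadata(package)
    except PackageNotFoundError:
        return []
    # get_all returns None when the header is absent.
    return sorted(meta.get_all("Provides-Extra") or [])


extras = declared_extras()
for name in ("daily", "deepgram", "openai"):
    status = "declared" if name in extras else "not found"
    print(f"{name}: {status}")
```

If an extra from the tables is reported as not found, the installed version may predate that integration or the package may not be installed in the active environment.
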