UserBotLatencyObserver measures the time between when a user stops speaking and when the bot starts responding, emitting events for custom handling and optional OpenTelemetry tracing integration. It also tracks first-bot-speech latency and provides detailed per-service latency breakdowns when metrics are enabled.
Features
- Tracks user speech start/stop timing using VAD frames
- Measures bot response latency from the actual moment the user stopped speaking
- Measures first bot speech latency (client connection to first speech)
- Provides detailed latency breakdown with per-service TTFB, text aggregation, user turn duration, and function call metrics
- Emits `on_latency_measured` events for custom processing
- Emits `on_latency_breakdown` events with detailed per-service metrics
- Emits `on_first_bot_speech_latency` event for greeting latency measurement
- Automatically records latency as OpenTelemetry span attributes when tracing is enabled
- Automatically resets between conversation turns
Usage
Basic Latency Monitoring
Add the observer to your pipeline and handle the `on_latency_measured` event.
Detailed Latency Breakdown
Enable metrics (`enable_metrics=True` in `PipelineParams`) to collect a per-service latency breakdown.
OpenTelemetry Integration
When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans. No additional configuration is needed.
How It Works
The observer tracks conversation flow through these key events:
- Client connects (`ClientConnectedFrame`) → Records timestamp for first-bot-speech measurement
- User starts speaking (`VADUserStartedSpeakingFrame`) → Resets latency tracking
- User stops speaking (`VADUserStoppedSpeakingFrame`) → Records timestamp, accounting for VAD `stop_secs` delay
- Bot starts speaking (`BotStartedSpeakingFrame`) → Calculates latency and emits `on_latency_measured` and `on_latency_breakdown` events

When `enable_metrics=True` is set in `PipelineParams`, the observer also collects per-service metrics (TTFB, text aggregation, function call latency) from `MetricsFrame` instances and includes them in the latency breakdown.
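The event flow above can be sketched as a small state machine. This is a simplified, self-contained model rather than Pipecat's implementation: the `LatencyTracker` class, its method names, and the explicit `now` timestamps are all assumptions made for illustration, and the real observer reacts to frames instead of direct method calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LatencyTracker:
    """Toy model of the event flow (hypothetical, not the Pipecat class)."""

    stop_secs: float = 0.2  # assumed VAD silence window, in seconds
    client_connected_at: Optional[float] = None
    user_stopped_at: Optional[float] = None
    first_bot_speech: Optional[float] = None
    measured: List[float] = field(default_factory=list)

    def on_client_connected(self, now: float) -> None:
        # Start of the first-bot-speech measurement.
        self.client_connected_at = now

    def on_user_started_speaking(self, now: float) -> None:
        # A new user turn discards any pending stop timestamp.
        self.user_stopped_at = None

    def on_user_stopped_speaking(self, now: float) -> None:
        # The VAD reports the stop only after stop_secs of silence,
        # so the user actually stopped that much earlier.
        self.user_stopped_at = now - self.stop_secs

    def on_bot_started_speaking(self, now: float) -> None:
        # Report the greeting latency once, on the first bot speech.
        if self.first_bot_speech is None and self.client_connected_at is not None:
            self.first_bot_speech = now - self.client_connected_at
        # Report user-to-bot latency if a user turn just ended.
        if self.user_stopped_at is not None:
            self.measured.append(now - self.user_stopped_at)
            self.user_stopped_at = None  # reset for the next turn


tracker = LatencyTracker(stop_secs=0.5)
tracker.on_client_connected(0.0)
tracker.on_bot_started_speaking(1.5)    # greeting
tracker.on_user_started_speaking(4.0)
tracker.on_user_stopped_speaking(6.5)   # VAD fires 0.5 s after the real stop
tracker.on_bot_started_speaking(7.25)
```

In this run the greeting latency is 1.5 s and the user-to-bot latency is 1.25 s, measured from the adjusted stop time of 6.0 s rather than from the moment the VAD fired.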
Event Handlers
on_latency_measured
Called each time a user-to-bot latency measurement is captured.
on_latency_breakdown
Called alongside `on_latency_measured` with detailed per-service metrics collected during the user→bot cycle. The breakdown includes TTFB from each service, text aggregation latency, user turn duration, and function call timings.
| Field | Type | Description |
|---|---|---|
| `ttfb` | `List[TTFBBreakdownMetrics]` | Time-to-first-byte metrics from each service |
| `text_aggregation` | `Optional[TextAggregationBreakdownMetrics]` | First text aggregation measurement (sentence aggregation latency) |
| `user_turn_start_time` | `Optional[float]` | Unix timestamp when user turn started (adjusted for VAD `stop_secs`) |
| `user_turn_secs` | `Optional[float]` | User turn duration including VAD silence detection, STT finalization, and turn analyzer wait |
| `function_calls` | `List[FunctionCallMetrics]` | Latency for each function call executed during the cycle |

The `breakdown.chronological_events()` method returns a human-readable list of all metrics sorted by start time, useful for logging and debugging.
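To illustrate what "sorted by start time" means in practice, here is a minimal sketch of the idea. The `TimedMetric` dataclass, the `chronological_lines` helper, and the output format are hypothetical stand-ins; the real breakdown types and the strings returned by `chronological_events()` may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedMetric:
    """Hypothetical stand-in for one entry in a latency breakdown."""

    label: str
    start_time: float     # Unix timestamp, in seconds
    duration_secs: float


def chronological_lines(metrics: List[TimedMetric]) -> List[str]:
    """Return one readable line per metric, ordered by start time."""
    ordered = sorted(metrics, key=lambda m: m.start_time)
    return [f"{m.label}: {m.duration_secs:.3f}s" for m in ordered]


lines = chronological_lines([
    TimedMetric("tts ttfb", 102.0, 0.120),
    TimedMetric("user turn", 100.0, 0.850),
    TimedMetric("llm ttfb", 101.0, 0.400),
])
# lines == ["user turn: 0.850s", "llm ttfb: 0.400s", "tts ttfb: 0.120s"]
```

Ordering by start time makes a log line read in the order the pipeline did the work, which is usually what you want when hunting for the slowest stage.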
on_first_bot_speech_latency
Called once when the bot first speaks after client connection. Measures the time from `ClientConnectedFrame` to the first `BotStartedSpeakingFrame`. This is particularly useful for measuring greeting latency.
The `on_latency_breakdown` event is also emitted for the first bot speech, allowing you to see the detailed breakdown of what contributed to the greeting latency.
Configuration
Constructor Parameters
Maximum number of frame IDs to keep in history for duplicate detection.
Prevents unbounded memory growth in long conversations.
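A bounded history like this is commonly built from a queue plus a set, which is sketched below. This is one possible approach, not necessarily how the observer does it: the `BoundedSeenIds` class, its `max_ids` parameter, and the `check_and_add` method are hypothetical names chosen for the example.

```python
from collections import deque
from typing import Deque, Set


class BoundedSeenIds:
    """Remember at most max_ids recent IDs (hypothetical helper)."""

    def __init__(self, max_ids: int = 100) -> None:
        self._max_ids = max_ids
        self._order: Deque[int] = deque()  # oldest ID on the left
        self._seen: Set[int] = set()       # O(1) membership checks

    def check_and_add(self, frame_id: int) -> bool:
        """Return True if frame_id is a duplicate; otherwise remember it."""
        if frame_id in self._seen:
            return True
        self._order.append(frame_id)
        self._seen.add(frame_id)
        # Evict the oldest ID once the history exceeds its limit.
        if len(self._order) > self._max_ids:
            self._seen.discard(self._order.popleft())
        return False


seen = BoundedSeenIds(max_ids=2)
results = [seen.check_and_add(i) for i in (1, 1, 2, 3, 1)]
# results == [False, True, False, False, False]
```

The last call returns `False` because ID 1 was evicted when ID 3 arrived. That is the trade-off of a bounded history: memory stays constant, but a duplicate that shows up after its ID has been evicted is no longer detected.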
Limitations
- Requires proper frame sequencing to work accurately
- Per-service metrics are only collected when `enable_metrics=True` is set in `PipelineParams`