Overview

Smart Turn Detection uses an advanced machine learning model to determine when a user has finished speaking and your bot should respond. Unlike basic Voice Activity Detection (VAD), which only distinguishes speech from non-speech, Smart Turn Detection recognizes natural conversational cues such as intonation patterns and linguistic signals, enabling more natural conversations.

Key Benefits

  • Natural conversations: More human-like turn-taking patterns
  • Free to use: The model is fully open source
  • Scalable: Smart Turn v3 supports fast CPU inference directly inside your Pipecat Cloud instance

Quick Start

To enable Smart Turn Detection in your Pipecat Cloud bot, configure a TurnAnalyzerUserTurnStopStrategy with LocalSmartTurnAnalyzerV3 in your context aggregator. The model weights are bundled with Pipecat, so there’s no need to download them separately.
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.transports.daily.transport import DailyParams, DailyTransport
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

async def main(room_url: str, token: str):
    transport = DailyTransport(
        room_url,
        token,
        "Voice AI Bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            # Set VAD to 0.2 seconds for optimal Smart Turn performance
            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
        ),
    )

    # Configure Smart Turn Detection via user turn strategies
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,  # your LLM context, created earlier in your setup
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3()
                )]
            ),
        ),
    )

    # Continue with your pipeline setup...

Smart Turn Detection requires VAD to be enabled with stop_secs=0.2. This value matches the pause length used in the model's training data and allows Smart Turn to dynamically adjust response timing based on the model's predictions.
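The aggregators created above plug into a standard Pipecat pipeline between the transport and your LLM service. A minimal placement sketch, assuming you have already created `stt`, `llm`, and `tts` services (those names are placeholders, not part of the snippet above):

```python
from pipecat.pipeline.pipeline import Pipeline

    # Placement sketch: the user aggregator sits between STT and the LLM,
    # so Smart Turn decides when the user's utterance is complete before
    # the LLM is invoked; the assistant aggregator records the bot's
    # responses back into the shared context.
    pipeline = Pipeline([
        transport.input(),     # audio in from Daily
        stt,                   # your speech-to-text service (placeholder)
        user_aggregator,       # Smart Turn gates when the user turn ends
        llm,                   # your LLM service (placeholder)
        tts,                   # your text-to-speech service (placeholder)
        transport.output(),    # audio out to Daily
        assistant_aggregator,  # records assistant responses into context
    ])
```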

How It Works

  1. Audio Analysis: The system continuously analyzes incoming audio for speech patterns
  2. VAD Processing: Voice Activity Detection segments audio into speech and silence
  3. Turn Classification: When VAD detects a pause, the ML model analyzes the speech segment for natural completion cues
  4. Smart Response: The model determines if the turn is complete or if the user is likely to continue speaking
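The four steps above can be sketched as a simple decision flow. This is an illustrative stand-in, not Pipecat's actual implementation: `classify_turn`, `should_respond`, and the 0.5 threshold are hypothetical names, and the stub model uses trailing silence as a crude proxy for the real model's intonation and linguistic analysis.

```python
def classify_turn(speech_segment: list[float]) -> float:
    """Stand-in for the ML model: returns the probability that the
    user's turn is complete. The real model analyzes intonation and
    linguistic cues; this stub just measures trailing silence."""
    trailing_silence = 0
    for sample in reversed(speech_segment):
        if abs(sample) < 0.01:
            trailing_silence += 1
        else:
            break
    return min(1.0, trailing_silence / 10)


def should_respond(speech_segment: list[float],
                   vad_detected_pause: bool,
                   threshold: float = 0.5) -> bool:
    # Step 2: the model is only consulted once VAD reports a pause.
    if not vad_detected_pause:
        return False
    # Steps 3-4: classify the segment and decide whether to respond
    # or keep listening for the user to continue.
    return classify_turn(speech_segment) >= threshold


# A segment ending in sustained silence is judged complete:
print(should_respond([0.4, 0.3] + [0.0] * 10, vad_detected_pause=True))   # True
# Mid-utterance audio with no VAD pause never triggers a response:
print(should_respond([0.4, 0.3, 0.5], vad_detected_pause=False))          # False
```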

Training Data Collection

The smart-turn model is trained on real conversational data. You can help improve the model by contributing your own recordings or by classifying existing data.

More information

For more details on Smart Turn, see the following links: