---
title: Azure AI VoiceLive client library for Python
keywords: Azure, python, SDK, API, azure-ai-voicelive, ai
ms.date: 10/02/2025
ms.topic: reference
ms.devlang: python
ms.service: ai
---
Azure AI VoiceLive client library for Python
============================================

This package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.

> **Status:** General Availability (GA). This is a stable release suitable for production use.

> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.

---

Getting started
---------------

### Prerequisites

- **Python 3.9+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples

### Install

Install the stable GA version:

```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For voice samples (adds audio capture/playback and .env support)
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv
```

The SDK is async-only; its WebSocket connections are implemented with `aiohttp`.

### Authenticate

You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.

#### API Key Authentication (Quick Start)

Set environment variables in a `.env` file or directly in your environment:

```bash
# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"
```
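
Before opening a connection, it helps to fail fast when one of these settings is missing. A minimal sketch (the `require_env` helper is ours, not part of the SDK):

```python
import os

def require_env(name: str) -> str:
    """Read a required setting from the environment, raising a clear error if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} in your environment or .env file")
    return value

# e.g. endpoint = require_env("AZURE_VOICELIVE_ENDPOINT")
```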

Then, use the key in your code:

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview",
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```

#### AAD Token Authentication

For production applications, AAD authentication is recommended:

```python
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    credential = DefaultAzureCredential()

    async with connect(
        endpoint="your-endpoint",
        credential=credential,
        model="gpt-4o-realtime-preview",
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```

---

Key concepts
------------

- **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
- **Session Management** – Configure conversation parameters:
  - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
  - **RequestSession** – Strongly-typed session configuration
  - **ServerVad** – Configure voice activity detection
  - **AzureStandardVoice** – Configure voice settings
- **Audio Handling**:
  - **InputAudioBufferResource** – Manage audio input to the service with async methods
  - **OutputAudioBufferResource** – Control audio output from the service with async methods
- **Conversation Management**:
  - **ResponseResource** – Create or cancel model responses with async methods
  - **ConversationResource** – Manage conversation items with async methods
- **Error Handling**:
  - **ConnectionError** – Base exception for WebSocket connection errors
  - **ConnectionClosed** – Raised when the WebSocket connection is closed
- **Strongly-Typed Events** – Process service events with type safety:
  - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
  - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
  - `ERROR`, and more
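
A common way to consume these typed events is a small dispatch table keyed on event type. A sketch of the pattern in plain Python (the string keys and handler actions here are illustrative stand-ins, not the SDK's wire values):

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Stand-in for a typed server event; the SDK's events expose `type` the same way."""
    type: str

def dispatch(event, handlers):
    # Route an event to its handler; silently ignore types we don't handle.
    handler = handlers.get(event.type)
    return handler(event) if handler else None

handlers = {
    "session.updated": lambda e: "start audio capture",
    "input_audio_buffer.speech_started": lambda e: "interrupt playback",
    "response.audio.delta": lambda e: "queue audio chunk",
    "error": lambda e: "log and recover",
}
```

In the real event loop you would compare `event.type` against `ServerEventType` members rather than raw strings.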

---

Examples
--------

### Basic Voice Assistant (Featured Sample)

The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

- Real-time speech streaming
- Server-side voice activity detection
- Interruption handling
- High-quality audio processing

```bash
# Run the basic voice assistant sample
# (requires the [aiohttp] extra plus pyaudio and python-dotenv)
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
```
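
Real-time streaming sends audio in small fixed-size PCM16 chunks. A minimal sketch of the frame math, independent of the SDK (the 24 kHz rate and 20 ms frame size are assumptions; match them to your session configuration):

```python
SAMPLE_RATE = 24_000   # assumed sample rate, in Hz
BYTES_PER_SAMPLE = 2   # PCM16 mono

def frame_pcm16(buffer: bytes, frame_ms: int = 20) -> list[bytes]:
    """Slice a PCM16 capture buffer into fixed-duration frames for streaming."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    return [buffer[i:i + frame_bytes] for i in range(0, len(buffer), frame_bytes)]

# One second of audio yields 50 frames of 20 ms (960 bytes each)
frames = frame_pcm16(bytes(SAMPLE_RATE * BYTES_PER_SAMPLE))
```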

### Minimal example

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=500,
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
```

Available Voice Options
-----------------------

### Azure Neural Voices

```python
from azure.ai.voicelive.models import AzureStandardVoice

# Use an Azure Neural voice
voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # or another voice name
    type="azure-standard"
)
```

Popular voices include:

- `en-US-AvaNeural` - Female, natural and professional
- `en-US-JennyNeural` - Female, conversational
- `en-US-GuyNeural` - Male, professional

### OpenAI Voices

```python
# Use an OpenAI voice (as a plain string)
voice_config = "alloy"  # or another OpenAI voice
```

Available OpenAI voices:

- `alloy` - Versatile, neutral
- `echo` - Precise, clear
- `fable` - Animated, expressive
- `onyx` - Deep, authoritative
- `nova` - Warm, conversational
- `shimmer` - Optimistic, friendly
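
If your application accepts a voice name from configuration, a small helper can pick the right shape for the session (the helper and its lookup set are ours, not part of the SDK):

```python
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

def voice_config(name: str):
    """OpenAI voices are passed as bare strings; anything else is treated as
    an Azure voice and returned in the shape AzureStandardVoice models."""
    if name in OPENAI_VOICES:
        return name
    return {"type": "azure-standard", "name": name}
```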

---

Handling Events
---------------

```python
from azure.ai.voicelive.models import ServerEventType

async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any in-progress response

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta

    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
```

---

Troubleshooting
---------------

### Connection Issues

- **WebSocket connection errors (1006/timeout):**
  Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.

- **Missing WebSocket dependencies:**
  If you see import errors, make sure you have installed the package:
  `python -m pip install "azure-ai-voicelive[aiohttp]"`

- **Auth failures:**
  For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.

### Audio Device Issues

- **No microphone/speaker detected:**
  Check device connections and permissions. On headless CI environments, the audio samples can't run.

- **Audio library installation problems:**
  On Linux/macOS you may need PortAudio before installing `pyaudio`:

  ```bash
  # Debian/Ubuntu
  sudo apt-get install -y portaudio19-dev libasound2-dev
  # macOS (Homebrew)
  brew install portaudio
  ```

### Enable Verbose Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

---

Next steps
----------

1. **Run the featured sample:**
   - Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation

2. **Customize your implementation:**
   - Experiment with different voices and parameters
   - Add custom instructions for specialized assistants
   - Integrate with your own audio capture/playback systems

3. **Advanced scenarios:**
   - Add function calling support
   - Implement tool usage
   - Create multi-turn conversations with history

4. **Explore other samples:**
   - Check the `samples/` directory for specialized examples
   - See `samples/README.md` for a full list of samples

---

Contributing
------------

This project follows the Azure SDK guidelines. If you'd like to contribute:

1. Fork the repo and create a feature branch
2. Run linters and tests locally
3. Submit a pull request with a clear description of the change

---

Release notes
-------------

Changelogs are available in the package directory.

---

License
-------

This project is released under the **MIT License**.