Commit 58fb497

Update docs metadata

1 parent 72760e1 commit 58fb497

2 files changed: +376 −0 lines changed

Lines changed: 338 additions & 0 deletions

@@ -0,0 +1,338 @@

---
title: Azure AI VoiceLive client library for Python
keywords: Azure, python, SDK, API, azure-ai-voicelive, ai
ms.date: 10/02/2025
ms.topic: reference
ms.devlang: python
ms.service: ai
---
Azure AI VoiceLive client library for Python
============================================

This package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.

> **Status:** General Availability (GA). This is a stable release suitable for production use.

> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.

---

Getting started
---------------

### Prerequisites

- **Python 3.9+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples

### Install

Install the stable GA version:

```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For voice samples (includes audio processing)
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv
```

The SDK provides async-only WebSocket connections using `aiohttp` for optimal performance and reliability.

### Authenticate

You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.

#### API Key Authentication (Quick Start)

Set environment variables in a `.env` file or directly in your environment:

```bash
# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"
```

Then, use the key in your code:

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```
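
If you keep the key and endpoint in a `.env` file as shown above, you can load them at runtime instead of hard-coding placeholders. A minimal sketch using `python-dotenv` (installed with the sample dependencies above) and the same variable names:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # reads variables from a .env file in the current directory

API_KEY = os.environ["AZURE_VOICELIVE_API_KEY"]
ENDPOINT = os.environ["AZURE_VOICELIVE_ENDPOINT"]
```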

#### AAD Token Authentication

For production applications, AAD authentication is recommended:

```python
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    credential = DefaultAzureCredential()

    async with connect(
        endpoint="your-endpoint",
        credential=credential,
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```

---

Key concepts
------------

- **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
- **Session Management** – Configure conversation parameters:
  - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
  - **RequestSession** – Strongly-typed session configuration
  - **ServerVad** – Configure voice activity detection
  - **AzureStandardVoice** – Configure voice settings
- **Audio Handling**:
  - **InputAudioBufferResource** – Manage audio input to the service with async methods
  - **OutputAudioBufferResource** – Control audio output from the service with async methods
- **Conversation Management**:
  - **ResponseResource** – Create or cancel model responses with async methods
  - **ConversationResource** – Manage conversation items with async methods
- **Error Handling** (see the sketch after this list):
  - **ConnectionError** – Base exception for WebSocket connection errors
  - **ConnectionClosed** – Raised when WebSocket connection is closed
- **Strongly-Typed Events** – Process service events with type safety:
  - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
  - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
  - `ERROR`, and more
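
The error types above can be used to make a session loop resilient to dropped WebSockets. A minimal sketch, assuming `ConnectionClosed` is exported from the top-level `azure.ai.voicelive` namespace (check the package reference for the exact location):

```python
import asyncio

from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive import ConnectionClosed  # assumed export location
from azure.ai.voicelive.aio import connect

async def run_once(endpoint: str, api_key: str, model: str) -> None:
    try:
        async with connect(
            endpoint=endpoint,
            credential=AzureKeyCredential(api_key),
            model=model,
        ) as conn:
            async for event in conn:
                ...  # dispatch on event.type, as in "Handling Events" below
    except ConnectionClosed as exc:
        # The service or the network closed the WebSocket; log it and decide whether to reconnect.
        print(f"Connection closed: {exc}")

asyncio.run(run_once("your-endpoint", "your-api-key", "gpt-4o-realtime-preview"))
```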

---

Examples
--------

### Basic Voice Assistant (Featured Sample)

The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

- Real-time speech streaming
- Server-side voice activity detection
- Interruption handling
- High-quality audio processing

```bash
# Run the basic voice assistant sample
# Requires [aiohttp] for async
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
```

### Minimal example

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=500
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
```
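
The minimal example only consumes server events; a real conversation also streams microphone audio into the session. The sketch below shows one way the capture side could look. The `conn.input_audio_buffer.append(...)` call and its `audio` parameter are assumptions modeled on the `InputAudioBufferResource` concept above and the protocol's `input_audio_buffer.append` event; check the package reference for the exact method name and whether it expects raw bytes or base64.

```python
import asyncio
import base64

import pyaudio  # installed with the voice-sample dependencies

CHUNK_FRAMES = 1024   # frames per microphone read
SAMPLE_RATE = 24000   # assumed PCM16 sample rate; match your session's input format

async def pump_microphone(conn) -> None:
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
        input=True, frames_per_buffer=CHUNK_FRAMES,
    )
    try:
        while True:
            # Read off the event loop so the blocking capture doesn't stall event handling.
            pcm = await asyncio.to_thread(stream.read, CHUNK_FRAMES, exception_on_overflow=False)
            # Hypothetical call: resource and parameter names are assumptions, not the documented API.
            await conn.input_audio_buffer.append(audio=base64.b64encode(pcm).decode("ascii"))
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

Run it as a background task (for example with `asyncio.create_task`) alongside the event loop from the minimal example.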

Available Voice Options
-----------------------

### Azure Neural Voices

```python
# Use Azure Neural voices
from azure.ai.voicelive.models import AzureStandardVoice

voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)
```

Popular voices include:

- `en-US-AvaNeural` - Female, natural and professional
- `en-US-JennyNeural` - Female, conversational
- `en-US-GuyNeural` - Male, professional

### OpenAI Voices

```python
# Use OpenAI voices (as string)
voice_config = "alloy"  # Or another OpenAI voice
```

Available OpenAI voices:

- `alloy` - Versatile, neutral
- `echo` - Precise, clear
- `fable` - Animated, expressive
- `onyx` - Deep, authoritative
- `nova` - Warm, conversational
- `shimmer` - Optimistic, friendly
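
Either form of `voice_config` is meant to end up in the session configuration. A sketch of wiring it into `RequestSession`, assuming the model exposes a `voice` parameter (an assumption based on the session-management concepts above; verify against the package reference):

```python
from azure.ai.voicelive.models import AzureStandardVoice, Modality, RequestSession

session = RequestSession(
    modalities=[Modality.TEXT, Modality.AUDIO],
    instructions="You are a helpful assistant.",
    voice=AzureStandardVoice(name="en-US-AvaNeural", type="azure-standard"),  # or an OpenAI voice string such as "alloy"
)
# Apply it to an open connection:
# await conn.session.update(session=session)
```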

---

Handling Events
---------------

```python
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta

    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
```
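
For the `RESPONSE_AUDIO_DELTA` branch, one simple option is to write each chunk to a PyAudio output stream. A minimal playback sketch, assuming `event.delta` arrives as raw PCM16 bytes at a 24 kHz sample rate (adjust the rate to whatever output format you configured):

```python
import pyaudio  # installed with the voice-sample dependencies

player = pyaudio.PyAudio()
speaker = player.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)

def play_chunk(audio_bytes: bytes) -> None:
    # Blocking write of a single audio delta; for lower latency, queue chunks
    # and drain them from a dedicated playback thread instead.
    speaker.write(audio_bytes)
```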

---

Troubleshooting
---------------

### Connection Issues

- **WebSocket connection errors (1006/timeout):**
  Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.

- **Missing WebSocket dependencies:**
  If you see import errors, make sure you have installed the package:
  `pip install "azure-ai-voicelive[aiohttp]"`

- **Auth failures:**
  For an API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.

### Audio Device Issues

- **No microphone/speaker detected:**
  Check device connections and permissions. On headless CI environments, audio samples can't run.

- **Audio library installation problems:**
  On Linux/macOS you may need PortAudio:

  ```bash
  # Debian/Ubuntu
  sudo apt-get install -y portaudio19-dev libasound2-dev
  # macOS (Homebrew)
  brew install portaudio
  ```

### Enable Verbose Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

---

Next steps
----------

1. **Run the featured sample:**
   - Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation

2. **Customize your implementation:**
   - Experiment with different voices and parameters
   - Add custom instructions for specialized assistants
   - Integrate with your own audio capture/playback systems

3. **Advanced scenarios:**
   - Add function calling support
   - Implement tool usage
   - Create multi-turn conversations with history

4. **Explore other samples:**
   - Check the `samples/` directory for specialized examples
   - See `samples/README.md` for a full list of samples

---

Contributing
------------

This project follows the Azure SDK guidelines. If you'd like to contribute:

1. Fork the repo and create a feature branch
2. Run linters and tests locally
3. Submit a pull request with a clear description of the change

---

Release notes
-------------

Changelogs are available in the package directory.

---

License
-------

This project is released under the **MIT License**.

Lines changed: 38 additions & 0 deletions

@@ -0,0 +1,38 @@

{
  "Name": "azure-ai-voicelive",
  "Version": "1.0.0",
  "DevVersion": null,
  "DirectoryPath": "sdk/ai/azure-ai-voicelive",
  "ServiceDirectory": "ai",
  "ReadMePath": "sdk/ai/azure-ai-voicelive/README.md",
  "ChangeLogPath": "sdk/ai/azure-ai-voicelive/CHANGELOG.md",
  "Group": null,
  "SdkType": "client",
  "IsNewSdk": true,
  "ArtifactName": "azure-ai-voicelive",
  "ReleaseStatus": "2025-10-01",
  "IncludedForValidation": false,
  "AdditionalValidationPackages": [
    ""
  ],
  "ArtifactDetails": {
    "safeName": "azureaivoicelive",
    "name": "azure-ai-voicelive",
    "triggeringPaths": [
      "/sdk/ai/ci.yml"
    ]
  },
  "CIParameters": {
    "CIMatrixConfigs": [
      {
        "Selection": "sparse",
        "Name": "communication_ci_matrix",
        "GenerateVMJobs": true,
        "Path": "sdk/ai/platform-matrix.json"
      }
    ]
  },
  "Namespaces": [
    "azure.ai.voicelive"
  ]
}
