Azure · xitzhang · May 28, 2026 · May 22, 2026 · May 23, 2026 · May 23, 2026
@@ -1,5 +1,40 @@
 # Release History
 
+## 1.3.0b1 (Unreleased)
+
+### Features Added
+
+- **Azure Realtime Native Voice Support**: Added `AzureRealtimeNativeVoice` and
+  `AzureRealtimeNativeVoiceName`, and expanded `voice` fields to accept Azure realtime native voices.
+- **WebRTC Call Negotiation Support**: Added `ClientEventRtcCallSdpCreate`, `ServerEventRtcCallSdpCreated`,
+  `ServerEventRtcCallError`, and `RtcCallErrorDetails` for SDP-based WebRTC call setup.
+- **Input Text Streaming Support**: Added `ClientEventInputTextDelta` and `ClientEventInputTextDone`
+  for incrementally streaming text input into existing conversation items.
+- **Hosted Agent Invocation Input**: Added `invoke_input` to `ResponseCreateParams` and
+  `ServerEventResponseInvocationDelta` for hosted agent invocation passthrough data.
+- **Audio Playback Lifecycle Events**: Added `ServerEventOutputAudioBufferStarted` and
+  `ServerEventOutputAudioBufferStopped` to track model audio playback start and stop.
+- **Echo Cancellation Configuration**: Added `EchoCancellationReferenceSource` and new
+  `reference_source` / `channels` options on `AudioEchoCancellation` to support both the default
+  server loopback reference path and client-provided stereo echo reference input.
+- **Smart End-of-Turn Detection**: Added `SmartEndOfTurnDetection` as an audio-based end-of-turn
+  detection option.
+- **Parallel Tool Call Control**: Added `parallel_tool_calls` to session models so callers can
+  control whether tool calls may run in parallel.
+
+### Breaking Changes
+
+- **Image Input Field Rename**: Renamed `RequestImageContentPart.url` to `image_url`. Update
+  image input construction to use `image_url=` instead of `url=`.
+- **Default API Version Update**: Changed the SDK default API version from `2026-04-10` to
+  `2026-06-01-preview`. Pass `api_version="2026-04-10"` explicitly to keep the previous default
+  behavior.
+
+### Bug Fixes
+
+- **Deserialization Improvements**: Improved XML model deserialization and common scalar header
+  deserialization paths for better compatibility and lower overhead.
+
 ## 1.2.0 (2026-05-22)
 
 ### Features Added

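For the `url` to `image_url` rename listed under Breaking Changes, callers that build image parts from stored keyword arguments could migrate them with a small shim. The sketch below is only an illustration: `migrate_image_part_kwargs` is a hypothetical helper, not part of the SDK, and it assumes the arguments are held in a plain `dict` before being passed to `RequestImageContentPart`.

```python
from typing import Any


def migrate_image_part_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with the old `url` key renamed to `image_url`.

    Hypothetical helper for illustration; it does not call the SDK.
    """
    migrated = dict(kwargs)  # copy so the caller's dict is left untouched
    if "url" in migrated:
        if "image_url" in migrated:
            # Refuse to guess which value the caller intended.
            raise ValueError("both 'url' and 'image_url' were provided")
        migrated["image_url"] = migrated.pop("url")
    return migrated
```

The result can then be unpacked into the constructor, for example `RequestImageContentPart(**migrate_image_part_kwargs(old_kwargs))`, assuming the remaining keys already match the model's fields.
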
@@ -5,7 +5,7 @@ This package provides a **real-time, speech-to-speech** client for Azure AI Voic
 It opens a WebSocket session to stream microphone audio to the service and receive
 typed server events (including audio) for responsive, interruptible conversations.
 
-> **Status:** General Availability (GA). This is a stable release suitable for production use.
+> **Status:** Preview (`1.3.0b1`). This is a beta release; its APIs may change before the next stable release.
 
 > **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.
 
@@ -16,34 +16,35 @@ Getting started
 
 ### Prerequisites
 
-- **Python 3.9+**
+- **Python 3.10+**
 - An **Azure subscription**
 - A **VoiceLive** resource and endpoint
 - A working **microphone** and **speakers/headphones** if you run the voice samples
 
 ### Install
 
-Install the stable GA version:
+Install the latest preview version:
 
 ```bash
 # Base install (core client only)
-python -m pip install azure-ai-voicelive
+python -m pip install --pre azure-ai-voicelive
 
 # For asynchronous streaming (uses aiohttp)
-python -m pip install "azure-ai-voicelive[aiohttp]"
+python -m pip install --pre "azure-ai-voicelive[aiohttp]"
 
 # For voice samples (includes audio processing)
 # First install PyAudio dependencies for your platform:
 #   Linux: sudo apt-get install -y portaudio19-dev libasound2-dev
 #   macOS: brew install portaudio
-python -m pip install azure-ai-voicelive[aiohttp] pyaudio python-dotenv
+python -m pip install --pre "azure-ai-voicelive[aiohttp]" azure-identity pyaudio python-dotenv
 ```
 
 The SDK provides async-only WebSocket connections using `aiohttp` for optimal performance and reliability.
 
 ### Authenticate
 
-You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.
+You can authenticate with an **API key** or a **Microsoft Entra ID token**.
+The samples default to `DefaultAzureCredential`; for local development, `az login` is usually the simplest path.
 
 #### API Key Authentication (Quick Start)
 
@@ -66,7 +67,7 @@ async def main():
     async with connect(
         endpoint="your-endpoint",
         credential=AzureKeyCredential("your-api-key"),
-        model="gpt-4o-realtime-preview"
+        model="gpt-realtime"
     ) as connection:
         # Your async code here
         pass
@@ -76,7 +77,7 @@ asyncio.run(main())
 
 #### AAD Token Authentication
 
-For production applications, AAD authentication is recommended:
+For production applications, Entra ID authentication is recommended:
 
 ```python
 import asyncio
@@ -85,14 +86,17 @@ from azure.ai.voicelive import connect
 
 async def main():
     credential = DefaultAzureCredential()
-
-    async with connect(
-        endpoint="your-endpoint",
-        credential=credential,
-        model="gpt-4o-realtime-preview"
-    ) as connection:
-        # Your async code here
-        pass
+
+    try:
+        async with connect(
+            endpoint="your-endpoint",
+            credential=credential,
+            model="gpt-realtime"
+        ) as connection:
+            # Your async code here
+            pass
+    finally:
+        await credential.close()
 
 asyncio.run(main())
 ```
@@ -107,13 +111,16 @@ Key concepts
   - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
   - **RequestSession** – Strongly-typed session configuration
   - **ServerVad** – Configure voice activity detection
+  - **SmartEndOfTurnDetection** – Configure audio-based end-of-turn detection
   - **AzureStandardVoice** – Configure voice settings
+  - **parallel_tool_calls** – Control whether tool calls may run in parallel for a session
 - **Audio Handling**:
   - **InputAudioBufferResource** – Manage audio input to the service with async methods
   - **OutputAudioBufferResource** – Control audio output from the service with async methods
 - **Conversation Management**:
   - **ResponseResource** – Create or cancel model responses with async methods
   - **ConversationResource** – Manage conversation items with async methods
+  - **ClientEventInputTextDelta / ClientEventInputTextDone** – Stream text input incrementally into an item
 - **Error Handling**: 
   - **ConnectionError** – Base exception for WebSocket connection errors
   - **ConnectionClosed** – Raised when WebSocket connection is closed
@@ -142,7 +149,7 @@ The Basic Voice Assistant sample demonstrates full-featured voice interaction wi
 python samples/basic_voice_assistant_async.py
 
 # With custom parameters
-python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
+python samples/basic_voice_assistant_async.py --model gpt-realtime --voice alloy --instructions "You're a helpful assistant"
 ```
 
 ### Minimal example
@@ -152,12 +159,18 @@ import asyncio
 from azure.core.credentials import AzureKeyCredential
 from azure.ai.voicelive.aio import connect
 from azure.ai.voicelive.models import (
-    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
+    AudioEchoCancellation,
+    RequestSession,
+    Modality,
+    InputAudioFormat,
+    OutputAudioFormat,
+    ServerVad,
+    ServerEventType,
 )
 
 API_KEY = "your-api-key"
 ENDPOINT = "wss://your-endpoint.com/openai/realtime"
-MODEL = "gpt-4o-realtime-preview"
+MODEL = "gpt-realtime"
 
 async def main():
     async with connect(
@@ -170,6 +183,7 @@ async def main():
             instructions="You are a helpful assistant.",
             input_audio_format=InputAudioFormat.PCM16,
             output_audio_format=OutputAudioFormat.PCM16,
+            input_audio_echo_cancellation=AudioEchoCancellation(),
             turn_detection=ServerVad(
                 threshold=0.5, 
                 prefix_padding_ms=300, 
@@ -187,6 +201,13 @@ async def main():
 asyncio.run(main())
 ```
 
+`AudioEchoCancellation` now supports both the default server loopback reference path and a
+client-provided stereo echo reference. Use `reference_source="client"` with `channels=2` only when
+your application sends stereo PCM16 input with the microphone on channel 0 and the echo reference
+signal on channel 1.
+
+For image inputs, `RequestImageContentPart` uses the `image_url` field (renamed from `url` in 1.3.0b1).
+
 Available Voice Options
 -----------------------
 

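The README change above describes the client-provided echo reference as stereo PCM16 input with the microphone on channel 0 and the reference signal on channel 1. A minimal sketch of building such a buffer from two mono PCM16 captures follows; `interleave_stereo_pcm16` is a hypothetical helper, not an SDK function, and it assumes both buffers cover the same time span at the same sample rate.

```python
def interleave_stereo_pcm16(mic: bytes, reference: bytes) -> bytes:
    """Interleave two mono PCM16 buffers into one stereo PCM16 buffer.

    Channel 0 carries the microphone samples and channel 1 carries the
    echo reference samples. Hypothetical helper for illustration only.
    """
    if len(mic) != len(reference):
        raise ValueError("mic and reference must have the same length")
    if len(mic) % 2:
        raise ValueError("PCM16 buffers must contain whole 2-byte samples")

    # Each stereo frame is 4 bytes: 2 bytes of mic, then 2 bytes of reference.
    stereo = bytearray(len(mic) * 2)
    stereo[0::4] = mic[0::2]        # mic sample, first byte
    stereo[1::4] = mic[1::2]        # mic sample, second byte
    stereo[2::4] = reference[0::2]  # reference sample, first byte
    stereo[3::4] = reference[1::2]  # reference sample, second byte
    return bytes(stereo)
```

Because the helper only moves whole 2-byte samples, it does not depend on the byte order of the input. How the resulting buffer is sent to the service is outside the scope of this sketch.
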
@@ -1,6 +1,6 @@
 {
-  "apiVersion": "2026-04-10",
+  "apiVersion": "2026-06-01-preview",
   "apiVersions": {
-    "VoiceLive": "2026-04-10"
+    "VoiceLive": "2026-06-01-preview"
   }
 }
@@ -18,6 +18,7 @@
         "azure.ai.voicelive.models.AzureAvatarVoiceSyncVoice": "VoiceLive.AzureAvatarVoiceSyncVoice",
         "azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
         "azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
+        "azure.ai.voicelive.models.AzureRealtimeNativeVoice": "VoiceLive.AzureRealtimeNativeVoice",
         "azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
         "azure.ai.voicelive.models.AzureSemanticDetection": "VoiceLive.AzureSemanticDetection",
         "azure.ai.voicelive.models.AzureSemanticDetectionEn": "VoiceLive.AzureSemanticDetectionEn",
@@ -45,6 +46,7 @@
         "azure.ai.voicelive.models.ClientEventOutputAudioBufferClear": "VoiceLive.ClientEventOutputAudioBufferClear",
         "azure.ai.voicelive.models.ClientEventResponseCancel": "VoiceLive.ClientEventResponseCancel",
         "azure.ai.voicelive.models.ClientEventResponseCreate": "VoiceLive.ClientEventResponseCreate",
+        "azure.ai.voicelive.models.ClientEventRtcCallSdpCreate": "VoiceLive.ClientEventRtcCallSdpCreate",
         "azure.ai.voicelive.models.ClientEventSessionAvatarConnect": "VoiceLive.ClientEventSessionAvatarConnect",
         "azure.ai.voicelive.models.ClientEventSessionUpdate": "VoiceLive.ClientEventSessionUpdate",
         "azure.ai.voicelive.models.ContentPart": "VoiceLive.ContentPart",
@@ -92,6 +94,7 @@
         "azure.ai.voicelive.models.ResponseSession": "VoiceLive.ResponseSession",
         "azure.ai.voicelive.models.ResponseTextContentPart": "VoiceLive.ResponseTextContentPart",
         "azure.ai.voicelive.models.ResponseWebSearchCallItem": "VoiceLive.ResponseWebSearchCallItem",
+        "azure.ai.voicelive.models.RtcCallErrorDetails": "VoiceLive.RtcCallErrorDetails",
         "azure.ai.voicelive.models.Scene": "VoiceLive.Scene",
         "azure.ai.voicelive.models.ServerEvent": "VoiceLive.ServerEvent",
         "azure.ai.voicelive.models.ServerEventConversationItemCreated": "VoiceLive.ServerEventConversationItemCreated",
@@ -111,6 +114,8 @@
         "azure.ai.voicelive.models.ServerEventMcpListToolsFailed": "VoiceLive.ServerEventMcpListToolsFailed",
         "azure.ai.voicelive.models.ServerEventMcpListToolsInProgress": "VoiceLive.ServerEventMcpListToolsInProgress",
         "azure.ai.voicelive.models.ServerEventOutputAudioBufferCleared": "VoiceLive.ServerEventOutputAudioBufferCleared",
+        "azure.ai.voicelive.models.ServerEventOutputAudioBufferStarted": "VoiceLive.ServerEventOutputAudioBufferStarted",
+        "azure.ai.voicelive.models.ServerEventOutputAudioBufferStopped": "VoiceLive.ServerEventOutputAudioBufferStopped",
         "azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDelta": "VoiceLive.ServerEventResponseAnimationBlendshapeDelta",
         "azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDone": "VoiceLive.ServerEventResponseAnimationBlendshapeDone",
         "azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDelta": "VoiceLive.ServerEventResponseAnimationVisemeDelta",
@@ -131,6 +136,7 @@
         "azure.ai.voicelive.models.ServerEventResponseFileSearchCallSearching": "VoiceLive.ServerEventResponseFileSearchCallSearching",
         "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
         "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDone": "VoiceLive.ServerEventResponseFunctionCallArgumentsDone",
+        "azure.ai.voicelive.models.ServerEventResponseInvocationDelta": "VoiceLive.ServerEventResponseInvocationDelta",
         "azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDelta": "VoiceLive.ServerEventResponseMcpCallArgumentsDelta",
         "azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDone": "VoiceLive.ServerEventResponseMcpCallArgumentsDone",
         "azure.ai.voicelive.models.ServerEventResponseMcpCallCompleted": "VoiceLive.ServerEventResponseMcpCallCompleted",
@@ -144,6 +150,8 @@
         "azure.ai.voicelive.models.ServerEventResponseWebSearchCallCompleted": "VoiceLive.ServerEventResponseWebSearchCallCompleted",
         "azure.ai.voicelive.models.ServerEventResponseWebSearchCallInProgress": "VoiceLive.ServerEventResponseWebSearchCallInProgress",
         "azure.ai.voicelive.models.ServerEventResponseWebSearchCallSearching": "VoiceLive.ServerEventResponseWebSearchCallSearching",
+        "azure.ai.voicelive.models.ServerEventRtcCallError": "VoiceLive.ServerEventRtcCallError",
+        "azure.ai.voicelive.models.ServerEventRtcCallSdpCreated": "VoiceLive.ServerEventRtcCallSdpCreated",
         "azure.ai.voicelive.models.ServerEventSessionAvatarConnecting": "VoiceLive.ServerEventSessionAvatarConnecting",
         "azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToIdle": "VoiceLive.ServerEventSessionAvatarSwitchToIdle",
         "azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToSpeaking": "VoiceLive.ServerEventSessionAvatarSwitchToSpeaking",
@@ -165,35 +173,37 @@
         "azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
         "azure.ai.voicelive.models.VideoResolution": "VoiceLive.VideoResolution",
         "azure.ai.voicelive.models.VoiceLiveErrorDetails": "VoiceLive.VoiceLiveErrorDetails",
-        "azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
-        "azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
-        "azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
-        "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
-        "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
-        "azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
+        "azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
         "azure.ai.voicelive.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
         "azure.ai.voicelive.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
         "azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
-        "azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
+        "azure.ai.voicelive.models.AzureRealtimeNativeVoiceName": "VoiceLive.AzureRealtimeNativeVoiceName",
+        "azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
+        "azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
+        "azure.ai.voicelive.models.EchoCancellationReferenceSource": "VoiceLive.EchoCancellationReferenceSource",
+        "azure.ai.voicelive.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
+        "azure.ai.voicelive.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
+        "azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
         "azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
         "azure.ai.voicelive.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
-        "azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
         "azure.ai.voicelive.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
         "azure.ai.voicelive.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
-        "azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
+        "azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
         "azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
-        "azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
-        "azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
-        "azure.ai.voicelive.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
-        "azure.ai.voicelive.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
-        "azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
+        "azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
         "azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
         "azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
+        "azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
         "azure.ai.voicelive.models.SessionIncludeOption": "VoiceLive.SessionIncludeOption",
+        "azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
+        "azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
+        "azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
+        "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
+        "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
         "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
-        "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
         "azure.ai.voicelive.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
+        "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
         "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
     },
-    "CrossLanguageVersion": "4f7c08a38aa5"
+    "CrossLanguageVersion": "d4391398f022"
 }
@@ -10,6 +10,8 @@
 
 if TYPE_CHECKING:
     from . import models as _models
-Voice = Union[str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice"]
-InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]
+Voice = Union[
+    str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice", "_models.AzureRealtimeNativeVoice"
+]
 ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceSelection"]
+InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]