OpenAI’s Responses API has supported persistent WebSocket connections since February 2026, and the openai-python SDK finished production-hardening that transport in mid-April. For teams running tool-heavy agent loops, switching to WebSocket removes per-turn HTTP handshake overhead — but only if your framework has wired up the adapter. As of late April 2026, Pydantic AI is one merged PR away from shipping it, LangChain has an orphaned issue, and CrewAI hasn’t started.
What OpenAI Actually Shipped: Responses API WebSocket Endpoint and SDK Timeline
The WebSocket endpoint sits at wss://api.openai.com/v1/responses, with clients sending response.create events and receiving stream events over a persistent socket. (https://developers.openai.com/api/docs/guides/websocket-mode/) The openai-python SDK introduced WebSocket support for the Responses API in v2.22.0 on February 23, 2026. (https://github.com/openai/openai-python/releases/tag/v2.22.0) Two rounds of hardening followed: v2.31.0 (April 8) added raw-data sending, (https://github.com/openai/openai-python/releases/tag/v2.31.0) and v2.32.0 (April 15) brought event handlers, reconnection logic, and event enqueuing. (https://github.com/openai/openai-python/releases/tag/v2.32.0)
The architectural shift from HTTP is worth being precise about. Rather than opening a new connection per response turn, a WebSocket connection persists across the agent loop, amortizing TLS handshake and connection-setup cost over many tool-call cycles. For single-turn completions the difference is negligible. For workflows with many sequential tool calls, the per-turn HTTP overhead compounds in a way that WebSocket eliminates.
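The difference can be sketched with stub transports that just count connection setups. Everything here is illustrative — these are not the real openai-python classes — but the shape of the two loops is the point:

```python
class TransportStats:
    """Counts how many connections (i.e., handshakes) a run opens."""
    def __init__(self):
        self.connections_opened = 0

def open_connection(stats):
    # Stand-in for TCP + TLS setup: the expensive part we want to amortize.
    stats.connections_opened += 1
    return {"open": True}

def agent_loop_http(turns, stats):
    """HTTP-style: a fresh connection per response turn."""
    for turn in turns:
        conn = open_connection(stats)
        # ... send turn, read response, connection is discarded ...

def agent_loop_websocket(turns, stats):
    """WebSocket-style: one persistent connection for the whole loop."""
    conn = open_connection(stats)
    for turn in turns:
        pass  # ... send response.create, read stream events on the same socket ...
```

A ten-turn loop opens ten connections in the first version and one in the second; nothing else about the loop changes.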
The Latency Economics: Per-Turn Overhead and the 40% Claim
OpenAI’s documentation claims up to roughly 40% faster end-to-end execution for agentic rollouts involving 20 or more tool calls when using WebSocket transport instead of HTTP. (https://developers.openai.com/api/docs/guides/websocket-mode/) The figure is OpenAI’s own; no independent benchmarks corroborate it yet.
The mechanism behind the claim is straightforward: each HTTP request to the Responses API carries TLS handshake and connection-setup overhead. A ten-tool-call agent loop might open ten separate connections over HTTP; over WebSocket, it opens one. At scale, those saved round-trips accumulate. But the 40% figure doesn’t specify what it’s measuring — wall time, median latency, p95 — nor does it characterize the baseline workload beyond “20+ tool calls.” That represents heavier agentic runs, not typical chat completions.
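Back-of-envelope arithmetic makes the compounding visible. The 120 ms handshake figure below is an assumption for illustration, not a measured number:

```python
def connection_setup_overhead_ms(tool_calls: int, handshake_ms: float):
    """Total connection-setup cost for a sequential agent loop.

    HTTP pays the handshake on every turn; a persistent WebSocket
    pays it once for the whole run.
    """
    http_total = tool_calls * handshake_ms
    ws_total = handshake_ms
    return http_total, ws_total

# Ten-tool-call loop, assumed ~120 ms TCP+TLS setup per connection.
http_ms, ws_ms = connection_setup_overhead_ms(tool_calls=10, handshake_ms=120.0)
print(http_ms - ws_ms)  # 1080.0 ms of setup cost avoided over the run
```

Whether that saving is 40% of anything depends entirely on how long the model spends generating per turn — which is exactly why the vendor figure needs a workload attached.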
Connection Constraints That Complicate the Narrative
WebSocket transport is not a drop-in upgrade. The connection model imposes three constraints that change how agent runs need to be architected.
Sequential only. The connection supports one in-flight response at a time with no multiplexing. (https://developers.openai.com/api/docs/guides/websocket-mode/) If your agent loop fans out tool calls in parallel, you need one WebSocket connection per concurrent stream. That’s a meaningful shift from HTTP, where connection pooling handles concurrency implicitly. The per-session connection cost is no longer amortized across concurrent runs.
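A minimal asyncio sketch of what “no multiplexing” forces on a fan-out step. `StubSocket` is a stand-in, not the SDK’s connection type:

```python
import asyncio

class StubSocket:
    """Stand-in for a Responses WebSocket: one in-flight response at a time."""
    def __init__(self):
        self.in_flight = False

    async def request(self, payload):
        if self.in_flight:
            raise RuntimeError("no multiplexing: socket already has an in-flight response")
        self.in_flight = True
        await asyncio.sleep(0)  # stand-in for streaming the response
        self.in_flight = False
        return f"done:{payload}"

async def fan_out(payloads):
    # Parallel tool calls need one socket each, unlike pooled HTTP.
    sockets = [StubSocket() for _ in payloads]
    results = await asyncio.gather(
        *(sock.request(p) for sock, p in zip(sockets, payloads))
    )
    return len(sockets), results
```

Three concurrent branches mean three sockets; the connection count scales with the fan-out width, not with the number of agent runs.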
60-minute session limit. Connections expire after 60 minutes. (https://developers.openai.com/api/docs/guides/websocket-mode/) Long-running agentic workflows that cross that boundary need explicit reconnection logic. The SDK’s v2.32.0 release added reconnection support, (https://github.com/openai/openai-python/releases/tag/v2.32.0) but frameworks that haven’t adopted the adapter inherit none of it.
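The shape of that reconnection handling can be sketched with an injected clock so the expiry path is testable. The 60-minute limit is from the docs cited above; the wrapper itself is an assumption about how you might structure it, not the SDK’s implementation:

```python
SESSION_LIMIT_S = 60 * 60  # connections expire after 60 minutes

class ReconnectingClient:
    """Transparently reopens the socket when the session limit is crossed."""

    def __init__(self, clock):
        self.clock = clock          # injectable time source, e.g. time.monotonic
        self.reconnects = 0
        self._opened_at = clock()

    def _ensure_fresh(self):
        if self.clock() - self._opened_at >= SESSION_LIMIT_S:
            # Real code would close the old socket and redo the handshake here.
            self._opened_at = self.clock()
            self.reconnects += 1

    def send(self, payload):
        self._ensure_fresh()
        return f"sent:{payload}"
```

Reopening restores the transport only; because the cache is connection-local, the new socket starts with no prior context.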
Connection-local cache. The WebSocket transport uses a connection-local in-memory cache rather than OpenAI’s server-side storage, making it compatible with Zero Data Retention configurations and store=false workflows. (https://developers.openai.com/api/docs/guides/websocket-mode/) For privacy-sensitive deployments, that’s a concrete compliance advantage. The tradeoff is that state doesn’t persist across connections — if the socket drops and reconnects, the prior context is gone.
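A toy model of why reconnecting loses context: the cache lives on the connection object, so a new connection starts empty. Names here are illustrative:

```python
class Connection:
    """Each connection carries its own in-memory context cache."""
    def __init__(self):
        self.cache = {}  # connection-local; never written server-side

    def remember(self, key, value):
        self.cache[key] = value

session = Connection()
session.remember("turn_1", "user asked about invoices")
assert "turn_1" in session.cache

# Socket drops; reconnecting means a brand-new Connection with an empty cache.
session = Connection()
print(len(session.cache))  # 0: prior context is gone
```

Any context needed across the 60-minute boundary would have to live in your own session store and be replayed into the new connection after reconnect.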
Framework Integration Status: Pydantic AI, LangChain, and CrewAI
| Framework | Status | Detail |
|---|---|---|
| Pydantic AI | PR open, under review | PR #4843 (Mar 25, 2026) — adds WebSocket mode to OpenAIResponsesModel via connect() context manager (https://github.com/pydantic/pydantic-ai/pull/4843) |
| LangChain | Issue open, no maintainer engagement | Issue #35415 (Feb 24, 2026) unassigned; PR #35578 (Mar 5) self-closed by author before maintainer review (https://github.com/langchain-ai/langchain/issues/35415, https://github.com/langchain-ai/langchain/pull/35578) |
| CrewAI | No activity | No open issues or PRs as of April 2026 (https://github.com/crewAIInc/crewAI/issues?q=websocket) |
Pydantic AI is closest to shipping. PR #4843 adds WebSocket mode as an opt-in through an explicit connect() context manager, leaving the default HTTP behavior unchanged. (https://github.com/pydantic/pydantic-ai/pull/4843) The PR is under maintainer review as of late April 2026 but hasn’t merged.
LangChain’s trajectory is more ambiguous. Issue #35415 was opened the day after the SDK’s initial WebSocket release and has sat unassigned since February. (https://github.com/langchain-ai/langchain/issues/35415) An attempted PR (#35578) was self-closed by its author on March 5 before any maintainer reviewed it. (https://github.com/langchain-ai/langchain/pull/35578) That signals friction in the contributor pipeline but doesn’t confirm a technical blocker — a new PR could arrive independently of that history. LangChain’s architecture, which supports dozens of model providers through a shared abstraction layer, likely makes a transport-level change harder to scope cleanly than it is for Pydantic AI’s more focused OpenAI Responses integration.
CrewAI has no recorded discussion of WebSocket support for the Responses API. (https://github.com/crewAIInc/crewAI/issues?q=websocket) Operating at a higher abstraction layer — crew orchestration over individual model calls — it depends on whatever transport the underlying model provider client exposes. That dependency hasn’t translated into an integration task.
Who Gets the Win Now vs. Who Waits
Teams using the openai-python SDK directly — without a framework layer — can switch to WebSocket transport today using v2.32.0’s connect() context manager. For tool-heavy agent loops with 20 or more sequential steps, the handshake overhead savings are real, even if the 40% vendor figure is unverified.
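The usage pattern is a context-managed session. The sketch below mimics that shape with a stub — the exact `connect()` signature should be taken from the SDK release notes rather than from here:

```python
from contextlib import contextmanager

class StubSession:
    """Stand-in for a persistent Responses socket session."""
    def __init__(self):
        self.turns, self.closed = 0, False

    def create(self, prompt):
        self.turns += 1  # every turn reuses the same underlying socket
        return f"response:{prompt}"

    def close(self):
        self.closed = True

@contextmanager
def connect():
    # Hypothetical shape: open the socket once, hand the session to the
    # agent loop, and guarantee the socket closes when the loop exits.
    session = StubSession()
    try:
        yield session
    finally:
        session.close()

with connect() as conn:
    outputs = [conn.create(step) for step in ("plan", "call_tool", "answer")]
```

The agent loop’s body stays the same as with HTTP; only the session lifetime around it changes.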
Framework users don’t have that option without bypassing their framework’s HTTP stack. The practical consequence is a latency gap on tool-heavy runs between native SDK users and framework users — one proportional to the number of tool-call round-trips. Three or four tool calls are unlikely to produce measurable wall-time differences. Longer chains accumulate the delta.
Migration Gotchas and Operational Edge Cases
The sequential-only constraint has a non-obvious implication for connection-pooling economics. HTTP-based frameworks typically pool connections across concurrent runs, amortizing infrastructure overhead. WebSocket connections are sequential and connection-local, so N parallel agent sessions require N simultaneous WebSocket connections. Whether that increases cost depends on your deployment’s connection limits and session management logic.
The 60-minute ceiling matters most for autonomous agents with long planning cycles or workflows that include human-in-the-loop pauses. An agent that waits for user input mid-run and might sit idle past an hour needs explicit reconnection handling — not just at the SDK level, but within whatever session state the agent loop is maintaining above the transport.
ZDR compatibility is the one operational area where WebSocket transport is unambiguously preferable to HTTP-with-server-storage. If your deployment runs under Zero Data Retention requirements, the connection-local cache sidesteps the server-side storage model entirely. (https://developers.openai.com/api/docs/guides/websocket-mode/) That’s a compliance advantage independent of the latency argument.
The deeper structural consequence, if frameworks close the integration gap, is a shift in which component dominates latency budgets. Model inference is already the dominant term for most agent workflows. On tool-heavy runs where the model is fast and the tool chain is long, eliminating per-step handshake overhead moves the bottleneck from transport to think-time — which is a different problem with different optimization levers.
Frequently Asked Questions
Does WebSocket transport help all OpenAI API users, or only those running tool-heavy agent loops?
Chat completions and short agent runs won’t notice the difference. The savings compound across sequential tool-call turns, where each step skips a TLS handshake. OpenAI’s 40% estimate applies specifically to heavy agentic rollouts with 20 or more tool calls and has not been independently verified.
What do Pydantic AI users need to do to start using WebSocket transport on the Responses API?
As of late April 2026, they need to wait for PR #4843 to merge. The PR adds WebSocket mode as an opt-in via an explicit connect() context manager, leaving HTTP as the default, so existing agent code would not need to change unless you explicitly opt in.
Can a single WebSocket connection be used for parallel agent runs?
No. One socket, one stream. Parallel tool calls or concurrent sessions each need their own connection, which changes the resource math from HTTP’s implicit connection pooling.
What happens to agent state if a WebSocket connection drops or hits the 60-minute session limit?
State evaporates with the socket. Because the cache is connection-local, a timeout or disconnect forces a cold restart of the agent loop. SDK-level reconnection (v2.32.0) restores the pipe, not the memory.
When will LangChain and CrewAI ship WebSocket adapter support for the Responses API?
There is no confirmed timeline for either. LangChain has an unassigned issue open since February 2026 and a community PR that was self-closed before any maintainer review. CrewAI has no recorded issues or pull requests on the topic as of April 2026.