groundy
agents & frameworks

CrewAI vs AutoGen vs Microsoft Agent Framework: AutoGen's Merger Reframes the 2026 Choice

Microsoft merged AutoGen into Agent Framework, leaving CrewAI versus MAF as the 2026 choice. The orchestration primitive you pick becomes your trace and policy boundary.

8 min · · · 5 sources ↓

The three-way choice in the title is already a two-way decision in practice. Microsoft merged AutoGen into Microsoft Agent Framework (MAF) and retired the standalone branch, so for new projects the real 2026 call is CrewAI versus MAF, with LangGraph remaining the separate default for stateful graph workflows. What you are picking is not a feature checklist but an orchestration primitive, and that primitive becomes the boundary your observability traces and governance policy snap to.

What actually changed between AutoGen, MAF, and Semantic Kernel?

Microsoft Agent Framework reached 1.0 GA on April 2, 2026 as the single supported platform that converges AutoGen and Semantic Kernel into one SDK and runtime, with the same concepts and APIs across .NET and Python, per Microsoft’s BUILD 2026 announcement. MAF is positioned as the official successor to both predecessors.

Kanerika’s June 5 analysis reports that Microsoft placed AutoGen in maintenance mode in February 2026, limiting it to bug fixes and security patches, and merged AutoGen’s multi-agent patterns with Semantic Kernel’s enterprise capabilities into MAF. The maintenance-mode date rests on a single secondary source; treat it as reported rather than confirmed until it appears in Microsoft’s own repo notes.

The practical upshot: AutoGen is no longer the target for greenfield work. New multi-agent builds on Microsoft tooling land on MAF, and the CrewAI-versus-AutoGen framing that dominated 2024 and 2025 coverage is now stale.

How do CrewAI’s crews, MAF’s conversations, and LangGraph’s graphs differ?

The three frameworks organize work around different primitives: CrewAI’s role-based crew, MAF’s agent conversation and workflow modes, and LangGraph’s explicit state graph. Each primitive dictates how you reason about control flow and, as the next sections argue, about governance.

CrewAI runs a role-based crew model: a central orchestrator dispatches to fixed-scope agents, each with a defined role, goal, and toolset, a structure Kanerika characterizes as better suited to deterministic, auditable enterprise pipelines. CrewAI’s marketing claims adoption by “63% of the Fortune 500” with no disclosed methodology; treat it as vendor positioning. CrewAI’s platform adds a Control Plane with real-time tracing of every LLM, tool, and memory call, plus RBAC, immutable audit trails, human-in-the-loop approval gates, and runtime hooks for PII redaction and policy checks.

MAF splits orchestration into two modes, according to Microsoft’s framework introduction: Agent Orchestration (LLM-driven, creative reasoning) and Workflow Orchestration (deterministic, business-logic driven).

LangGraph sits in a different category. It functions as a separate production default for stateful, graph-based workflows, not a direct substitute for either CrewAI or MAF. It trades the opinionated agent abstraction for explicit nodes and edges, which costs convenience but buys precise control over branching and memory.

What did MAF ship at BUILD 2026?

At BUILD 2026, announced June 3, MAF added a first-class Agent Harness, Foundry Hosted Agents, and CodeAct, among other additions, per the devblog round-up.

The Agent Harness is the layer where model reasoning meets real execution. A single extension method turns any chat client into a harness with shell and filesystem access, human-in-the-loop approval flows, and automatic context compaction that trims chat history mid-loop to avoid context-window overflow. It ships providers for file memory, general file access, a todo ledger, plan-versus-execute operating modes, skill discovery, parallel background agents, hosted web search, and (.NET only) sandboxed shell execution. Two middleware pieces matter for governance: a ToolApprovalAgent for “don’t ask again” rules on sensitive calls, and an OpenTelemetryAgent that emits Semantic Conventions traces automatically.

Foundry Hosted Agents shipped with samples for RAG, skills, and memory scenarios in the python-1.5.0 release.

CodeAct is the most interesting of the bunch for latency-sensitive workloads. Instead of the choose-tool, wait, choose-next-tool loop, the model writes a single generated program that runs once and returns a consolidated result, collapsing several tool turns into one. MAF’s Monty-backed CodeAct provider shipped in the python-1.6.0 release.

Should teams already running AutoGen migrate to MAF?

Teams already running AutoGen should plan a migration path rather than freeze, because MAF is the designated home for AutoGen’s patterns, but the urgency depends on how much they lean on AutoGen-specific surface area.

Kanerika’s guidance is explicit: the legacy AutoGen branch should not be the target for new projects, and existing AutoGen deployments should plot a migration. MAF absorbs AutoGen’s multi-agent conversation patterns, so the conceptual move is smaller than a full rewrite, but the API surface is not identical and neither are the defaults.

Two cautions for anyone starting that move now. First, the surface is still churning. The GitHub release cadence shows python-1.6.0 on 2026-05-21, adding a Shell tool and a Monty-backed CodeAct provider, and the team kicked off a four-part “Build your own claw and agent harness” devblog series on June 22, 2026 with Parts 2 through 4 still marked coming soon. Pin your versions and expect breaking changes between minors. Second, the governance and durability features you may be migrating toward are themselves recent; OpenTelemetry instrumentation became default only at 1.6.0.

Why does the orchestration primitive matter more than the feature list?

The orchestration primitive is the real lock-in point, because every layer above it, your traces, approval gates, and policy checks, attaches to that primitive’s boundary, and swapping frameworks later means rewriting the governance layer, not just the agent code.

In CrewAI, the Control Plane traces every LLM, tool, and memory call within a crew, and RBAC, immutable audit trails, HITL gates, and PII-redaction hooks all snap to the crew and agent boundary. Your policy engine asks “did this agent, in this crew, follow its scope?”

In MAF, OpenTelemetry instrumentation is on by default from 1.6.0, and the enterprise layer adds long-running durability (pause, resume, recover), human-in-the-loop tool approvals, and native MCP, A2A, and OpenAPI interoperability. Traces and approvals snap to the agent turn and the workflow step. Your policy engine asks “did this turn, in this workflow, stay inside its guardrails?”

In a graph framework, traces and policy snap to nodes and edges. Your policy engine asks “is this state transition allowed?”

These are not interchangeable questions. A redaction hook written against CrewAI’s crew boundary does not port to MAF’s turn boundary without rework, and a policy rule expressed as a graph-edge constraint in LangGraph has no clean equivalent in either. The framework choice predetermines the shape of your compliance artifacts, your dashboards, and the place where a regulator or an internal audit team will ask for evidence. That is the second-order bet the title glosses over: you are choosing a control surface as much as a build tool.

Which framework should you pick in 2026?

Pick by stack, governance need, and workload shape. A .NET team or a shop already deep in Azure and Foundry defaults to MAF, because the harness, hosted agents, and CodeAct provider compound on the same runtime. A Python team that wants auditable, role-based pipelines with a managed control plane and is comfortable with vendor positioning skews toward CrewAI. Workloads with complex branching state where you want explicit node-and-edge control, and are willing to give up the opinionated agent abstraction for it, belong in LangGraph.

FrameworkOrchestration primitiveTrace and policy boundaryBest fit
CrewAIRole-based crew (central orchestrator, fixed-scope agents)Crew and agent (every LLM/tool/memory call traced)Deterministic, auditable Python pipelines; managed governance wanted
Microsoft Agent FrameworkAgent conversation and workflow (Agent Orchestration + Workflow Orchestration)Agent turn and workflow step (OpenTelemetry default).NET or Azure/Foundry shops; mixed creative and deterministic work
LangGraphExplicit state graph (nodes and edges)Graph node and edge (state transitions)Stateful branching workflows needing precise control

For teams on AutoGen today, the decision is really about timing, not direction: MAF is where the patterns are going, so the question is whether to move now against a fast-moving 1.6.x surface or wait a release or two for the API to settle. For greenfield projects, the framework question and the governance question are the same question, and they are best answered together before the first agent is wired up.

Frequently Asked Questions

How much latency and token cost does CodeAct actually save on a real workload?

Microsoft’s representative order-totals benchmark runs CodeAct in 13.23 seconds on 2,489 tokens, against 27.81 seconds and 6,890 tokens for the traditional choose-tool loop, a 52.4 percent latency cut and 63.9 percent token cut, executing inside a Hyperlight micro-VM. Your mileage depends on how many tool turns a single generated program can fold together; the win shrinks for lookups that were always one call.

What does Foundry Hosted Agents add over running MAF in your own containers?

Foundry Hosted Agents brings scale-to-zero billing, filesystem persistence that survives agent restarts, and per-session VM isolation, all concerns a self-hosted container deployment must engineer for itself. That persistence matters for long-running agents that pause and resume: state lives in the host rather than being rehydrated from your own store on every cold start.

Is CrewAI genuinely cheaper to run than AutoGen-style conversational agents?

Kanerika claims CrewAI runs 30 to 60 percent more token-efficiently than AutoGen-style conversational agents on structured tasks, attributing it to the role-based crew dispatching fixed-scope agents instead of an open conversation. The figure comes with no disclosed methodology, so treat it as a directional vendor claim and re-measure on your own workload before repeating it internally.

Which orchestration patterns does MAF expose beyond sequential and handoff?

MAF exposes five patterns: sequential, concurrent, group chat, handoff, and magentic, the last being a manager agent that keeps a dynamic task ledger and reassigns work as the run progresses rather than following a fixed plan. Magentic arrived via Magentic protocol messages in the dotnet-1.6.1 release, so .NET teams get it before Python shops do.

How do you confirm AutoGen’s maintenance-mode status before committing to a migration?

The February 2026 maintenance-mode date and its implications come from a single consulting-firm source, so check the microsoft/autogen repository’s own notes and release activity before treating it as confirmed. The same secondary source internally contradicts itself on AutoGen’s reach, citing roughly 58,700 GitHub stars in prose and about 28,400 in its comparison table, which is a flag to corroborate any number you plan to cite.

sources · 5 cited

  1. CrewAI vs AutoGen vs Microsoft Agent Framework: 2026 Guide kanerika.com analysis accessed 2026-06-24
  2. CrewAI crewai.com vendor accessed 2026-06-24
  3. Releases · microsoft/agent-framework github.com primary accessed 2026-06-24