GitHub's AI Capacity Crunch Pushes Microsoft to Rent AWS Compute

Microsoft owns GitHub, is OpenAI’s largest outside backer, and runs one of the world’s largest clouds in Azure. Yet in mid-June 2026 it is reportedly paying its biggest rival, Amazon, to keep GitHub’s AI features running after a surge of agent-generated code overwhelmed its own infrastructure. The signal for anyone shipping an AI product: a hyperscaler that owns its cloud still cannot self-supply inference at the pace its products now demand.

What Microsoft is renting AWS for, and what stays on Azure

GitHub’s AI inference and training workloads are the part moving to AWS, according to Business Insider’s reporting, which cited two people familiar with the plans. The core version control, repository storage, and authentication layers remain on Microsoft’s own infrastructure, per Excello Digital. That scoping matters. The spillover is concentrated in the latency-sensitive, GPU-bound inference path behind Copilot and agentic coding, not in the stateful core that developers historically associate with GitHub outages.

Microsoft itself confirmed only that GitHub is “exploring a multi-cloud strategy” and declined to comment on AWS specifically, blaming “the incredible spike in agentic development that began late last year” for straining its limits, Business Insider reported. The Amazon angle rests on unnamed sources, so it is reported, not confirmed by either company. Before the deal, an independent GitHub mostly ran its own data centers; Microsoft acquired the company in 2018 and had planned to move it entirely onto Azure by 2027.

How fast GitHub’s AI compute load grew

GitHub’s platform compute consumption has been overtaken by AI workloads. By early 2026, AI accounted for over 60 percent of GitHub’s total compute, up from roughly 15 percent in early 2023, according to Excello Digital. That share is driven by the compute-heavy, latency-sensitive inference behind Copilot requests.

The volume behind that share is where the strain shows up. GitHub COO Kyle Daigle said commits were on pace to hit 14 billion in 2026, up from 1 billion in 2025, per Business Insider. Agent activity has compounded it: AI agent-opened pull requests grew from 4 million in September 2025 to more than 17 million by March 2026, and GitHub Actions weekly compute minutes rose from 500 million in 2023 to 2.1 billion in a single week in early 2026, Excello Digital reported. The agent-PR and Actions-minute figures are single-source numbers, but the direction is unambiguous. A code-hosting platform’s load profile now bends toward inference, not storage and CI.

The consequence surfaced in availability. GitHub logged nine service incidents in May 2026, and platform availability dropped to roughly 88.4 percent in June 2026, well under the 99.9 percent that engineering teams assume from their version-control provider, per Excello Digital. Those figures originate with Excello Digital rather than Microsoft, so treat them as reported rather than official. The qualitative signal is corroborated, though. HashiCorp cofounder Mitchell Hashimoto wrote in April that GitHub was “no longer a place for serious work if it just blocks you out for hours per day, every day,” Business Insider reported.

Why Azure could not keep up

The bottleneck is physical, not strategic. Microsoft CTO Kevin Scott said in early October 2025 that “since ChatGPT and GPT-4 launched, it has been nearly impossible to build capacity fast enough,” and that even Microsoft’s most ambitious forecasts were frequently proven insufficient, per a report citing the remarks. That is the candid version of a problem Microsoft’s marketing rarely states plainly: inference capacity is constrained by GPU supply, power, cooling, and multi-year data-center lead times, not by a decision to underinvest.

Microsoft has been investing aggressively. It projected $190 billion in capital expenditures for calendar 2026, largely to expand data-center capacity, per Business Insider. Money did not fix the timeline. Internal Azure forecasts as of October 2025 showed several major U.S. regions, including Northern Virginia and parts of Texas, running out of physical space and servers, with shortages that were expected to ease potentially extending into mid-2026, DC Pulse reported. When you cannot pour concrete and rack GPUs fast enough in your own regions, renting a competitor’s already-provisioned cluster becomes the only lever that moves on an outage timescale.

Competitive pressure tightened the timeline further. In an internal Microsoft meeting late last year, an executive said GitHub needed an overhaul to compete with AI-native coding tools Cursor and Anthropic’s Claude Code, per audio reviewed by Business Insider. A capacity shortfall in the middle of that fight costs product velocity as much as it costs uptime. Renting AWS keeps Copilot answering while Azure catches up.

Why this is a pattern, not a one-off

Microsoft is not the only hyperscaler buying compute it does not own from a peer. Google disclosed a deal to pay SpaceX $920 million a month for AI compute capacity from October 2026 to June 2029, and Google Cloud separately agreed to sell AI compute capacity to Anthropic, Business Insider reported. The shapes differ: Google buys orbital capacity and sells to a model lab; Microsoft buys from a rival cloud to serve an internal product. The common thread is that single-vendor self-sufficiency is breaking under inference load, and the fix is cross-vendor compute movement.

These deals describe a market where AI compute is fungible enough that hyperscalers trade it bilaterally. Conventional cloud workloads rarely moved between hyperscalers in these volumes because they were shaped to a provider’s services. Inference is more portable: a GPU cluster running model weights is, at the margin, a GPU cluster, and it does not much matter whose logo is on the rack when the alternative is a Copilot outage.

What breaks about “we run on our own infrastructure”

The marketing claim that an AI product runs on a company’s own infrastructure has always been partly aspirational. As of mid-2026, for at least one of the largest AI-platform operators, it is no longer literally true. Microsoft owns GitHub, owns Azure, and is OpenAI’s largest backer, and it is still reportedly paying Amazon to keep Copilot’s inference path alive. If that company cannot self-supply inference at the pace its products demand, the assumption that a smaller operator can is worth re-examining.

This does not mean hyperscalers are abandoning owned infrastructure. The deeper cause is physical: GPU supply, power, cooling, and the multi-year lead time on data-center construction. Microsoft is still spending $190 billion in 2026 capex and still targets a full Azure migration for GitHub by 2027. The AWS arrangement is a patch over a timeline gap, not a retreat from owning the stack. What it punctures is the clean version of the story, the one where vertical integration of model, product, and cloud guarantees self-sufficiency. That guarantee now comes with a condition: valid only when your own regions have spare GPUs.

How to model competitor-cloud inference in a capacity plan

For an AI-platform operator, the practitioner takeaway is to budget competitor-cloud inference as a permanent cost line, not a bridge. When GitHub’s AI workloads hit 60 percent of platform compute and Azure could not provision fast enough, Microsoft paid its biggest rival to keep Copilot answering, per Business Insider. The transferable lessons are concrete.

Build a multi-cloud burst-capacity budget into the roadmap from the start. Treat a second provider’s inference as the pressure-relief valve your own capacity model assumes you will not need, until the week you do. That means contractually pre-warmed capacity or, at minimum, pre-integrated API paths, not a panic integration during an outage.

Update capacity models for the burst-heavy profile of agentic inference. AI inference is not the steady-state load that classic cloud capacity planning was built around. Agent PRs that grew from 4 million to 17 million in six months, and Actions minutes that quadrupled over three years, describe a load that spikes hard and latently, per Excello Digital. Provisioning to average demand leaves you short on every spike; provisioning to peak inside your own regions is exactly what produced the Azure shortfall Microsoft is working around.

Wire an outage-triggered spillover path to a second provider. The Microsoft arrangement covers inference and training specifically while core version control and auth stay on Microsoft infrastructure, per Excello Digital. That scoping is a useful template. Split the latency-sensitive, GPU-bound inference path from the stateful core, and make the inference path the one that can fail over to rented capacity without moving data you do not want on a competitor’s cloud.

Finally, price the cost of the alternative. GitHub’s June availability reportedly sat near 88.4 percent, and a HashiCorp cofounder publicly declared the platform unfit for serious work, per Business Insider. Renting a rival’s compute is embarrassing in a way that owning more of your own servers is not. Availability in the high-80s percent, plus a peer declaring the product unfit for serious work, is more embarrassing. The AWS bill is the cheaper line item.

Frequently Asked Questions

How far below the engineering norm is the reported 88.4 percent June availability?

An 88.4 percent month means roughly 83 hours of unavailability in June, against the 43 minutes that a 99.9 percent SLA permits. That puts GitHub more than 100 times over the reliability target engineering teams assume, which is why a capacity shortfall escalates from a planning issue to an outage emergency. The 88.4 percent figure comes from Excello Digital, not Microsoft, so treat the hour count as reported rather than official.

Does the AWS scope cover only Copilot inference, or does training data move too?

The reported agreement covers training as well as inference, which means the data behind Copilot model work can run on Amazon’s infrastructure. Amazon competes with Copilot through its own coding assistant, so placing training workloads on AWS creates a commercial tension the inference-only framing hides. Microsoft does not publicly state which training corpora cross the boundary.

If auth and repo storage stay on Azure while inference moves to AWS, what does that split cost in latency?

Each Copilot request that needs repository context now crosses a cloud boundary before the GPU-bound inference runs, adding network round-trips to a path GitHub’s own scoping calls latency-sensitive. The split is clean for failover, but it trades a latency penalty for the ability to burst without moving developer data. Neither company has disclosed which AWS regions sit behind the path, so the real round-trip cost is not public.

Is renting AWS compute cheaper for Microsoft than building more Azure?

No. Rented inference from a rival cloud carries market pricing plus inter-cloud egress fees, while owned GPUs amortize over several years, so the per-token cost runs higher on AWS. Microsoft pays the premium because capex cannot buy time: the $190 billion 2026 build adds capacity that lands too late to absorb the current spike, and renting is the only lever that moves on the timeline of an outage.

When does the AWS arrangement actually end, given the 2027 Azure migration target?

The exit is gated on Azure catching up, and the October 2025 internal forecasts showed Northern Virginia and parts of Texas short on space into mid-2026. If those regional shortages slip further, the AWS overlap could run past the 2027 migration target rather than end on schedule. Incentive to leave does not control a multi-year data-center construction timeline.