
Azure’s NAT Gateway classifies as a Hard NAT, preventing Tailscale from completing direct-connection hole-punching and pushing all traffic through DERP relay servers.1 The fix is a Peer Relay node in a public subnet with a public IP and UDP ingress explicitly permitted. v1.96.4 changed how that relay scales: UDP socket fan-out is now gated on container-aware GOMAXPROCS, which means AKS relay instances had been silently sizing their UDP socket pool from the host’s CPU view rather than the container’s allocated cores.2

Why Azure NAT Gateway Breaks Tailscale Direct Connections

Tailscale normally negotiates direct peer-to-peer connections using NAT traversal that works through most NAT implementations. Azure’s NAT Gateway is classified as a Hard NAT — it prevents the hole-punching step from succeeding, and all devices behind it fall back to routing encrypted traffic through DERP, Tailscale’s globally distributed relay network.1 The Azure reference architecture is direct about the consequence: this “can lead to lower throughput and performance than direct connections.”1

DERP is not a temporary workaround in the Tailscale architecture. It handles encrypted key exchange regardless of whether a Peer Relay or DERP carries the data path, and it remains the final fallback when no other relay is reachable.3 Peer Relays add a middle tier: a node inside your network boundary that relays traffic without the Hard NAT blocking the connection from forming.

Public-Subnet Peer Relay Deployment: Network, NSG, and UDP Requirements

To bypass the NAT Gateway, Tailscale’s Azure reference architecture specifies deploying a Peer Relay to “a public subnet in your virtual network, with a public IP address, and allow incoming UDP traffic to the relay port.”1 The relay cannot live behind the NAT Gateway it is meant to work around.

Within the Virtual Network, the relay needs a subnet with a route to the internet, a static or reserved public IP, and an NSG inbound rule permitting UDP on the relay port. Outbound from the relay to DERP must also be unblocked — DERP still handles key exchange even when the Peer Relay carries the data path.3
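As a sketch, the NSG rule and relay-port wiring might look like the following Azure CLI commands. The resource group and NSG names are placeholders, port 7777 is an arbitrary choice, and the `--relay-server-port` flag spelling should be verified against the current Tailscale CLI documentation:

```shell
# Hypothetical resource names; substitute your own. Port 7777 is arbitrary.
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name relay-subnet-nsg \
  --name allow-tailscale-relay-udp \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Udp \
  --destination-port-ranges 7777

# On the relay node: pin the Peer Relay to that port so the NSG rule matches.
sudo tailscale set --relay-server-port=7777
```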

Container-Aware GOMAXPROCS and UDP Socket Scaling in v1.96.4

Go’s runtime defaults GOMAXPROCS to the number of host-visible CPUs. When a Peer Relay runs in a container, the host may expose 32 cores while the container’s cgroup limits it to 4. Before v1.96.4, the relay used the host-visible count to decide how many SO_REUSEPORT UDP sockets to open — a mismatch that left the relay over-provisioned on socket count relative to its actual goroutine parallelism.

v1.96.4 gated Peer Relay UDP socket scaling on container-aware GOMAXPROCS defaults.2 The relay now reads the correct core count from inside the container boundary. SO_REUSEPORT allows multiple sockets to bind the same UDP port and receive packets in parallel, improving throughput on multi-core systems4 — but only when socket count aligns with available parallelism. The fix closes that gap for AKS and other container runtimes.

Observability: Using tailscaled_peer_relay_endpoints Metrics

v1.96.2 added the tailscaled_peer_relay_endpoints gauge to Tailscale’s user metrics output.2 The gauge tracks how many endpoints are currently registered with each Peer Relay, giving operators a direct signal for whether clients are routing through the relay or falling through to DERP.

The metric’s most useful value is zero: a misconfigured relay — process-healthy but missing the UDP ingress rule, for example — will show normal system metrics while clients route elsewhere without surfacing any error. A persistent zero on tailscaled_peer_relay_endpoints after clients should have connected points specifically to a routing or ACL misconfiguration rather than an application failure. Scrape it alongside existing relay throughput counters and alert on a sustained zero from a relay that is supposed to be serving active clients.
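A Prometheus alerting sketch for the sustained-zero case might look like the following; the metric name comes from the changelog, while the rule name, window, and labels are assumptions to adapt:

```yaml
groups:
  - name: tailscale-peer-relay
    rules:
      - alert: PeerRelayZeroEndpoints
        # Relay process is up but no clients have registered bindings:
        # points at UDP ingress, routing, or ACL misconfiguration.
        expr: tailscaled_peer_relay_endpoints == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Peer Relay {{ $labels.instance }} has served zero endpoints for 15m"
```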

Threat Model: ACL Grants, Public IPs, and What DERP Abstracted Away

Running a Peer Relay in a public subnet transfers an operational surface that DERP abstracted away: you now own the relay node’s public IP, its UDP exposure, and its uptime. The Tailscale ACL model contains the blast radius at the application layer. The tailscale.com/cap/relay capability, granted through the tailnet policy file, determines which tailnet devices can allocate relay bindings on the node.3 A device without that grant cannot register with the relay.

Tailscale’s documentation explicitly warns against using * as the src field in the grant: doing so would make every tailnet device attempt to use the relay, “potentially leading to unintended traffic routing and high latency.”3
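In policy-file terms, a narrowly scoped grant might look like the following sketch; the tag names are hypothetical, and the exact grant shape should be checked against the Peer Relays documentation:

```jsonc
{
  "grants": [
    {
      // Only devices behind the Azure NAT Gateway — never "*" as src.
      "src": ["tag:azure-natted"],    // hypothetical tag
      "dst": ["tag:peer-relay"],      // hypothetical tag on the relay node
      "app": {
        "tailscale.com/cap/relay": [] // capability named in the docs
      }
    }
  ]
}
```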

The relay port must accept connections from Tailscale clients behind NAT, so some public UDP exposure is inherent to the deployment model. The residual risk is not ACL bypass — relay binding is authenticated — but resource exhaustion through the open UDP listener. Rate-limiting inbound UDP at the NSG level is a reasonable precaution that a DERP operator handles on their infrastructure but that a self-hosted relay operator now owns.

Frequently Asked Questions

What’s the minimum Tailscale version for correct Peer Relay behavior on AKS?

v1.96.4 (Mar 27, 2026) is the floor. The tailscaled_peer_relay_endpoints gauge shipped in v1.96.2 (Mar 18), but the Tailscale changelog records the container-aware GOMAXPROCS socket scaling fix in v1.96.4. Running v1.96.2 gives observability without the scaling correction — relay sockets still over-provision relative to the container’s actual CPU limit.

Can Peer Relays replace DERP entirely in a tailnet?

No. The connection attempt order is: direct hole-punching → Peer Relay → DERP. DERP handles encrypted key exchange at every connection setup regardless of which tier carries the data. A tailnet with a Peer Relay but no DERP access cannot complete new connection handshakes — the relay carries traffic but DERP performs the cryptographic coordination that bootstraps every session.

Does SO_REUSEPORT improve relay throughput on a single-core container?

Minimal benefit. SO_REUSEPORT’s throughput gain comes from distributing incoming packets across multiple sockets scheduled on multiple cores. On a single-core container (GOMAXPROCS=1), the relay opens one socket, eliminating the parallelism advantage. Allocating at least 2 cores to the relay container is the practical minimum for measurable throughput improvement over DERP relay.

What’s the operational risk of using a wildcard src grant for the Peer Relay?

Devices that already have direct connections or are geographically closer to an optimal DERP server get forced through the relay, adding latency and making the relay node a single point of failure for traffic that previously had independent paths. The grant should target only the specific device group behind the Azure NAT Gateway that lacks connection alternatives — not the entire tailnet.

Footnotes

  1. Tailscale Docs: Azure Reference Architecture

  2. Tailscale Changelog

  3. Tailscale Docs: Peer Relays

  4. Tailscale Monthly Update: March 2026

Sources

  1. Tailscale Docs: Azure Reference Architecture (vendor), accessed 2026-04-24
  2. Tailscale Docs: Peer Relays (vendor), accessed 2026-04-24
  3. Tailscale Changelog (vendor), accessed 2026-04-24
  4. Tailscale Monthly Update: March 2026 (vendor), accessed 2026-04-24
