Audio LLMs Break When the Codec Changes: A Robustness Vector Voice-AI Teams Haven't Tested

Teams deploying Audio LLMs that accept user-supplied recordings have been relying on a false assumption: that lossy codec compression in the input pipeline scrubs adversarial perturbations before they reach the model. A preprint posted to arXiv on May 19 (arXiv:2605.20519) demonstrates the opposite. The attack it describes, CodecAttack, optimizes inside the codec’s own latent space to craft perturbations that don’t just survive compression but are preserved by it.

The codec was supposed to be a defense

Prior work on adversarial audio operated in the waveform domain: add a crafted perturbation to the raw signal, hope it survives whatever encoding the target pipeline applies. Lossy codecs like Opus, MP3, and AAC were considered a natural sanitizer. They discard perceptually irrelevant frequency components, and in doing so, they tended to destroy the high-frequency artifacts that waveform-domain attacks relied on.

This assumption was reasonable. Most waveform-domain adversarial examples are fragile under signal transformation. A codec pass was a cheap, passive defense requiring no changes to the model itself.

CodecAttack exploits the fact that lossy codecs don’t discard uniformly. They allocate bits where human hearing is most sensitive: roughly below 4 kHz. A perturbation that concentrates its energy in that band doesn’t get removed by the codec. It gets preserved, because the codec is designed to preserve it.

Latent-space optimization

The core mechanism is straightforward in concept. Rather than perturbing the waveform directly, the attack optimizes inside the continuous latent representation used by a neural audio codec. The codec’s encoder maps raw audio into a lower-dimensional latent space; its decoder reconstructs audio from that space. By injecting perturbations at the latent level, the attack ensures the resulting waveform is already in a form the codec will faithfully reproduce.
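As a rough illustration of why a latent-space perturbation survives re-encoding, the sketch below swaps the neural codec for a toy linear one: an orthonormal DCT that keeps only the K lowest coefficients. This is not the paper’s codec or its optimizer, and every name in it (encode, decode, roundtrip, surviving_fraction, N, K) is made up for the example. It only shows the geometric point: a perturbation built in the latent lies in the space the codec reproduces, while white noise added to the waveform mostly does not.

```python
import math
import random

N, K = 32, 8  # toy frame length and number of retained coefficients (assumed values)

def dct_basis(n):
    """Rows of an orthonormal DCT-II matrix, lowest frequency first."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows

BASIS = dct_basis(N)

def encode(x):
    """Toy 'encoder': project a frame onto the K lowest-frequency basis vectors."""
    return [sum(b * v for b, v in zip(BASIS[k], x)) for k in range(K)]

def decode(z):
    """Toy 'decoder': rebuild a frame from K latent coefficients."""
    return [sum(z[k] * BASIS[k][i] for k in range(K)) for i in range(N)]

def roundtrip(x):
    """One pass through the toy codec."""
    return decode(encode(x))

def energy(v):
    return sum(a * a for a in v)

def surviving_fraction(clean, perturbed):
    """Share of the injected perturbation energy still present after a codec pass."""
    injected = [p - c for p, c in zip(perturbed, clean)]
    survived = [p - c for p, c in zip(roundtrip(perturbed), roundtrip(clean))]
    return energy(survived) / energy(injected)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Latent-space perturbation: nudge the latent, then decode.
delta = [rng.gauss(0.0, 0.1) for _ in range(K)]
carrier = roundtrip(x)  # what the codec outputs for the clean frame
latent_adv = decode([z + d for z, d in zip(encode(x), delta)])
latent_frac = surviving_fraction(carrier, latent_adv)

# Waveform-domain perturbation: white noise added directly to the samples.
noise = [rng.gauss(0.0, 0.1) for _ in range(N)]
wave_adv = [a + e for a, e in zip(x, noise)]
wave_frac = surviving_fraction(x, wave_adv)

print(f"latent perturbation surviving:   {latent_frac:.3f}")
print(f"waveform perturbation surviving: {wave_frac:.3f}")
```

In this toy setup the latent perturbation survives the codec pass essentially intact, while white noise keeps on average about K/N of its energy. A real neural codec is nonlinear and quantized, so the effect there is approximate rather than exact.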

Per-band energy analysis in the paper confirms the mechanism. CodecAttack’s perturbations concentrate below 4 kHz, matching the spectral region where codecs allocate the most bits. Baseline waveform-domain perturbations spread energy into higher frequencies that codecs discard during compression.
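A per-band energy check of this kind can be sketched in a few lines. The function below is a hypothetical helper, not the paper’s analysis code: it takes a plain list of samples, computes a naive DFT, and reports the share of energy at or below a cutoff. The 16 kHz sample rate and the two test tones are assumptions chosen for the example.

```python
import cmath
import math

def band_energy_fraction(samples, sample_rate, cutoff_hz):
    """Fraction of signal energy in DFT bins at or below cutoff_hz (naive O(n^2) DFT)."""
    n = len(samples)
    low = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        # Bins strictly between DC and Nyquist have a mirror image at negative frequency.
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        power = weight * abs(coeff) ** 2
        total += power
        if k * sample_rate / n <= cutoff_hz:
            low += power
    return low / total

SAMPLE_RATE = 16_000
N = 160  # 10 ms frame; both tones below land exactly on DFT bins

def tone(freq_hz):
    """Unit-amplitude sine at freq_hz, N samples long."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(N)]

low_band = tone(1_000)                                      # energy only below 4 kHz
mixed = [a + b for a, b in zip(tone(1_000), tone(6_000))]   # half below, half above

print(round(band_energy_fraction(low_band, SAMPLE_RATE, 4_000), 3))
print(round(band_energy_fraction(mixed, SAMPLE_RATE, 4_000), 3))
```

Applied to the difference between an adversarial clip and its clean original, a value near 1 would indicate a perturbation concentrated below the cutoff. For real audio an FFT library would replace the naive loop.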

The numbers

Across three Audio LLM deployment scenarios and three target models, CodecAttack achieves an 85.5% average target-substring attack success rate (ASR) on Opus at moderate bitrates, according to the paper. A waveform-domain baseline given identical Expectation over Transformation (EoT) hardening never exceeds 26% ASR on the same targets.
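For readers unfamiliar with the metric, target-substring ASR is the share of attempts in which the model’s output contains the attacker’s chosen text. A minimal sketch follows; the function name and the case-insensitive match are assumptions, since the paper’s exact matching rule is not described here, and the transcripts are invented.

```python
def target_substring_asr(outputs, target):
    """Share of model outputs that contain the attacker's target substring."""
    if not outputs:
        raise ValueError("need at least one model output")
    needle = target.lower()  # case-insensitive matching is an assumption
    hits = sum(needle in out.lower() for out in outputs)
    return hits / len(outputs)

# Invented outputs from four attack attempts against a hypothetical assistant.
outputs = [
    "Sure, I will transfer the funds now.",
    "Okay. Transfer the funds to the new account.",
    "Sorry, I could not understand the recording.",
    "TRANSFER THE FUNDS",
]
asr = target_substring_asr(outputs, "transfer the funds")
print(f"ASR: {asr:.0%}")
```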

The transfer results are the more concerning finding. CodecAttack trained on one codec transfers zero-shot to codecs it was never trained on: up to 100% ASR on MP3 and 84% on AAC-LC. An adversary does not need to know which codec the target pipeline uses.

Metric	CodecAttack (latent)	Waveform baseline
Avg ASR on Opus (moderate bitrate)	85.5%	≤26%
Zero-shot transfer to MP3	up to 100%	—
Zero-shot transfer to AAC-LC	84%	—

What this means for production voice-AI stacks

Systems like OpenAI’s Whisper, which processes audio in 30-second windows using an encoder-decoder Transformer, are representative of the deployment surface CodecAttack targets. Any voice-AI product that ingests user-supplied recordings (meeting transcription, call-center automation, voice assistants) passes audio through encoding and decoding stages before it reaches the model. Those stages were assumed to be a net defensive benefit. This paper shows they can be a net liability.

Real-world exploitability is constrained: the attack still requires crafting a specific perturbation for each desired target output. This is not a spray-and-pray exploit. But for interactive voice systems where the attacker can submit audio and observe or infer the model’s output, the attack surface is live.

What teams should do now

The paper doesn’t ship a fix, but the defensive implications are clear.

Codec normalization at the inference boundary. Rather than accepting audio in whatever format the client sends, normalize all input through a single known codec and bitrate before it reaches the model. This doesn’t eliminate the attack (CodecAttack transfers across codecs), but it reduces the surface to one known channel rather than N unknown ones.
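One way to implement this normalization step is to re-encode every upload with ffmpeg before inference. The sketch below assumes an ffmpeg build with libopus is installed; the function names, the 24 kbit/s bitrate, and the 16 kHz mono target are illustrative choices, not recommendations from the paper.

```python
import subprocess

def normalize_cmd(src, dst, bitrate="24k", sample_rate=16000):
    """Build an ffmpeg command that re-encodes any input to mono Opus at a fixed bitrate."""
    return [
        "ffmpeg", "-nostdin", "-y",
        "-i", src,
        "-map_metadata", "-1",    # drop client-supplied metadata
        "-vn",                    # ignore any video or cover-art stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the rate the model expects
        "-c:a", "libopus",
        "-b:a", bitrate,
        dst,
    ]

def normalize(src, dst, **kwargs):
    """Run the re-encode; raises CalledProcessError if ffmpeg rejects the input."""
    subprocess.run(normalize_cmd(src, dst, **kwargs), check=True, capture_output=True)

# Example (requires ffmpeg on PATH):
# normalize("upload.wav", "normalized.opus")
```

Keeping the command builder separate from the subprocess call makes the chosen channel easy to log and to reuse in training-time augmentation, so the model sees the same codec settings in both places.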

Adversarial training across codec variants. Fine-tuning on clean audio through a fixed codec is insufficient. Training pipelines need examples re-encoded through multiple codec-bitrate combinations, so the model learns invariance to codec-induced distribution shift.
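A data-loading hook for this kind of augmentation might look like the sketch below. The codec grid, the clean_prob parameter, and the transcode callable are all hypothetical; transcode stands in for whatever encode/decode round trip the training pipeline already has, and the adversarial example generation itself is out of scope here.

```python
import random

# Hypothetical augmentation grid; the codecs and bitrates are examples only.
CODEC_GRID = [
    ("opus", "16k"), ("opus", "24k"), ("opus", "32k"),
    ("mp3", "64k"), ("mp3", "96k"), ("mp3", "128k"),
    ("aac", "48k"), ("aac", "64k"),
]

def codec_augment(clip, transcode, rng, clean_prob=0.2):
    """Return (clip, config), re-encoding through a randomly chosen codec setting.

    `transcode(clip, codec, bitrate)` is a caller-supplied function (assumed here)
    that performs the actual encode/decode round trip.
    """
    if rng.random() < clean_prob:
        return clip, None  # keep some untouched examples in the mix
    codec, bitrate = rng.choice(CODEC_GRID)
    return transcode(clip, codec, bitrate), (codec, bitrate)

# Usage with a stand-in transcoder that only tags the clip.
def fake_transcode(clip, codec, bitrate):
    return f"{clip}|{codec}@{bitrate}"

rng = random.Random(1)
batch = [codec_augment(f"clip{i}", fake_transcode, rng) for i in range(5)]
for clip, config in batch:
    print(clip, config)
```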

Input provenance checks. If your pipeline accepts audio from untrusted sources, treat codec metadata as untrusted. A file that claims to be 16 kHz PCM but carries compression artifacts consistent with a prior MP3 pass is suspicious. Techniques for detecting this exist in the audio forensics literature, but they haven’t been adopted in the Audio LLM stack.

None of these are turnkey. The honest read is that the field does not yet have a robust defense against latent-space audio adversarial attacks, and the previous assumption of safety-through-compression was wrong.

Frequently Asked Questions

Does CodecAttack work in real-time or only against pre-recorded audio?

The multi-bitrate EoT optimization requires multiple compression passes per perturbation, making real-time crafting during a live call impractical with current hardware. Voicemail dropboxes, uploaded meeting recordings, and other store-and-forward ingestion points are the more realistic attack surface, since the adversary has unlimited time to optimize before submission.

Have other modalities dealt with compression-aware adversarial attacks?

Yes. JPEG-aware adversarial attacks on image classifiers were demonstrated as early as 2017, and the same dynamic (compression coefficients becoming a reliable channel for perturbations) appeared there first. Audio adversarial research is roughly five years behind vision on this vector, largely because codec preprocessing was assumed sufficient without empirical verification.

Are lossless codecs like FLAC immune?

Only if the audio stays lossless end to end. CodecAttack exploits the bit-allocation behavior of lossy codecs, which uncompressed PCM and lossless formats like FLAC do not have. However, any pipeline that accepts lossless input and internally transcodes to a lossy format for bandwidth or processing reasons reintroduces the vulnerability at that transcoding boundary. Many production voice-AI systems transcode to Opus or AAC before inference regardless of the input format.

What does cross-codec adversarial training cost in practice?

Each codec-bitrate combination is a separate augmentation channel. A pipeline handling Opus at three bitrates, MP3 at three, and AAC at two would need eight encoding passes per training sample. For a dataset of hundreds of thousands of clips, cross-codec adversarial training could increase compute costs by an order of magnitude compared to single-codec fine-tuning, before accounting for the iterative adversarial example generation itself.
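The arithmetic behind that estimate is simple enough to check directly. In the sketch below, the codec counts come from the example above, while the 200,000-clip dataset size is an assumed stand-in for "hundreds of thousands".

```python
# Bitrates per codec, taken from the example in the text.
BITRATES_PER_CODEC = {"opus": 3, "mp3": 3, "aac": 2}

passes_per_sample = sum(BITRATES_PER_CODEC.values())

def total_encodes(num_clips, passes=passes_per_sample):
    """Encoding passes needed to push every clip through every codec setting."""
    return num_clips * passes

clips = 200_000  # assumed dataset size
print(passes_per_sample)     # 8
print(total_encodes(clips))  # 1600000
```

This counts encoding passes only. The order-of-magnitude figure applies if the model is also trained on every re-encoded variant, and it still excludes the cost of generating adversarial examples.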