<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Groundy — Models &amp; Research</title><description>Where architecture, training tricks, and eval methodology meet the marketing layer — separating durable progress in foundation models from leaderboard theater that quietly falls apart under load.</description><link>https://groundy.com/</link><language>en-us</language><atom:link href="https://groundy.com/category/models-research/rss.xml" rel="self" type="application/rss+xml"/><item><title>Tracing Why LLM Agent Memory Fails: A Method for Attributing Errors</title><link>https://groundy.com/articles/tracing-why-llm-agent-memory-fails-a-method-for-attributing-errors/</link><guid isPermaLink="true">https://groundy.com/articles/tracing-why-llm-agent-memory-fails-a-method-for-attributing-errors/</guid><description>MemTrace constructs provenance graphs across every memory operation in an LLM agent, tracing wrong answers to the exact operation that corrupted state across sessions.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>llm-memory</category><category>debugging</category><category>rag</category><category>agent-frameworks</category><category>error-attribution</category><category>provenance</category><author>Groundy Editorial</author></item><item><title>Persona Prompts Change Who an LLM Recommends as an Expert</title><link>https://groundy.com/articles/persona-prompts-change-who-an-llm-recommends-as-an-expert/</link><guid isPermaLink="true">https://groundy.com/articles/persona-prompts-change-who-an-llm-recommends-as-an-expert/</guid><description>A 43-model audit finds that geographic and role framing in LLM prompts systematically shifts which scholars get recommended as experts, with no neutral default.</description><pubDate>Fri, 29 May 2026 
00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>llm-bias</category><category>persona-prompts</category><category>expert-recommendation</category><category>scholar-discovery</category><category>ai-fairness</category><category>recommendation-systems</category><author>Groundy Editorial</author></item><item><title>Opus 4.8 Batch API: 1M Context, 300k Output, and Team Cost Controls</title><link>https://groundy.com/articles/opus-4-8-batch-api-1m-context-300k-output-and-team-cost-controls/</link><guid isPermaLink="true">https://groundy.com/articles/opus-4-8-batch-api-1m-context-300k-output-and-team-cost-controls/</guid><description>Opus 4.8 has a 1M token context window (200k on Foundry), 128k standard output, and 300k output via Batch API beta. January 2026 cutoff. Batch design and quota allocation.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>claude-opus</category><category>batch-api</category><category>anthropic</category><category>rate-limits</category><category>context-window</category><category>model-release</category><category>team-infrastructure</category><author>Groundy Editorial</author></item><item><title>Opus 4.8 vs Opus 4.7: What Changed and What Did Not</title><link>https://groundy.com/articles/opus-4-8-vs-opus-4-7-what-changed-and-what-did-not/</link><guid isPermaLink="true">https://groundy.com/articles/opus-4-8-vs-opus-4-7-what-changed-and-what-did-not/</guid><description>Anthropic&apos;s Opus 4.8 raises SWE-Bench Pro from 64.3% to 69.2% and cuts code-flaw pass-through fourfold at unchanged $5/$25 pricing. 
A fast mode at $10/$50 runs 2.5x quicker.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>claude</category><category>anthropic</category><category>opus-48</category><category>model-release</category><category>benchmarks</category><category>agentic-coding</category><author>Groundy Editorial</author></item><item><title>One Learning Rate Doesn&apos;t Fit All: Heavy-Tail Layerwise LR Schedules for LLM Pretraining</title><link>https://groundy.com/articles/one-learning-rate-doesnt-fit-all-heavy-tail-layerwise-lr-schedules-for-llm/</link><guid isPermaLink="true">https://groundy.com/articles/one-learning-rate-doesnt-fit-all-heavy-tail-layerwise-lr-schedules-for-llm/</guid><description>LLR assigns per-layer learning rates from spectral heavy-tail diagnostics during LLM pretraining, achieving 1.5x faster convergence and up to 2 pp higher zero-shot accuracy.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-27T00:00:00.000Z</atom:updated><category>llm-pretraining</category><category>learning-rate</category><category>spectral-analysis</category><category>optimizer</category><category>transformer-training</category><category>icml-2026</category><author>Groundy Editorial</author></item><item><title>Scale Vectors: Tiny Parameter Subsets That Disproportionately Steer LLM Behavior</title><link>https://groundy.com/articles/scale-vectors-tiny-parameter-subsets-that-disproportionately-steer-llm-behavior/</link><guid isPermaLink="true">https://groundy.com/articles/scale-vectors-tiny-parameter-subsets-that-disproportionately-steer-llm-behavior/</guid><description>Scale vectors are a negligible parameter class in LLM normalization layers whose outsized optimization role makes them high-value targets for quantization and safety editing.</description><pubDate>Wed, 27 May 2026 00:00:00 
GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>scale-vectors</category><category>llm-quantization</category><category>mechanistic-interpretability</category><category>model-normalization</category><category>model-compression</category><category>llm-training</category><author>Groundy Editorial</author></item><item><title>Embedding Compression at Training Time: DIVE&apos;s Gradient Trick vs Post-Hoc Quantization for Vector DBs</title><link>https://groundy.com/articles/embedding-compression-at-training-time-dives-gradient-trick-vs-post-hoc/</link><guid isPermaLink="true">https://groundy.com/articles/embedding-compression-at-training-time-dives-gradient-trick-vs-post-hoc/</guid><description>DIVE&apos;s gradient-limited adapter outperforms baselines for embedding compression, but training-time methods lock RAG pipelines to specific adapters and raise refresh costs.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>embedding-compression</category><category>rag</category><category>vector-databases</category><category>dive</category><category>adapter-methods</category><author>Groundy Editorial</author></item><item><title>μP Hyperparameter Transfer Has an Embedding Layer Hole, New arXiv Paper Says</title><link>https://groundy.com/articles/p-hyperparameter-transfer-has-an-embedding-layer-hole-new-arxiv-paper-says/</link><guid isPermaLink="true">https://groundy.com/articles/p-hyperparameter-transfer-has-an-embedding-layer-hole-new-arxiv-paper-says/</guid><description>An arXiv paper shows the embedding learning rate accounts for most of μP&apos;s advantage over standard parameterization, and a single scaling fix recovers the bulk of the benefit.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>mup</category><category>hyperparameter-transfer</category><category>embedding-layer</category><category>adamw</category><category>model-scaling</category><category>training-optimization</category><author>Groundy Editorial</author></item><item><title>Audio LLMs Break When the Codec Changes: A Robustness Vector Voice-AI Teams Haven&apos;t Tested</title><link>https://groundy.com/articles/audio-llms-break-when-the-codec-changes-a-robustness-vector-voice-ai-teams/</link><guid isPermaLink="true">https://groundy.com/articles/audio-llms-break-when-the-codec-changes-a-robustness-vector-voice-ai-teams/</guid><description>CodecAttack achieves 85.5% attack success on audio LLMs by optimizing in codec latent space, with 100% zero-shot transfer to MP3, proving lossy compression fails as a defense.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>adversarial-audio</category><category>audio-llms</category><category>codec-robustness</category><category>voice-ai</category><category>adversarial-ml</category><category>audio-security</category><author>Groundy Editorial</author></item><item><title>Project Glasswing One Month In: AI Bug Discovery Has Outpaced the Patch Pipeline</title><link>https://groundy.com/articles/project-glasswing-one-month-in-ai-bug-discovery-has-outpaced-the-patch-pipeline/</link><guid isPermaLink="true">https://groundy.com/articles/project-glasswing-one-month-in-ai-bug-discovery-has-outpaced-the-patch-pipeline/</guid><description>Anthropic&apos;s Glasswing found over 10,000 high-severity vulnerabilities in one month. Only 97 are patched. 
The bottleneck shifted from discovery to triage, and it is structural.</description><pubDate>Sun, 24 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-security</category><category>vulnerability-disclosure</category><category>anthropic</category><category>claude-mythos</category><category>cybersecurity</category><category>interpretability</category><author>Groundy Editorial</author></item><item><title>Do LLMs Know What Not to Say? Causal Evidence for Statistical Preemption</title><link>https://groundy.com/articles/do-llms-know-what-not-to-say-causal-evidence-for-statistical-preemption/</link><guid isPermaLink="true">https://groundy.com/articles/do-llms-know-what-not-to-say-causal-evidence-for-statistical-preemption/</guid><description>New causal evidence shows LLMs suppress wrong continuations during pretraining via statistical preemption, suggesting output-layer safety fixes may target the wrong layer.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>statistical-preemption</category><category>llm-safety</category><category>model-interpretability</category><category>hallucination</category><category>causal-probing</category><category>pretraining</category><author>Groundy Editorial</author></item><item><title>arXiv 2605.16428 Measures AI Search&apos;s Drag on Publisher Traffic Using Paired Google and Reddit Data</title><link>https://groundy.com/articles/arxiv-2605-16428-measures-ai-searchs-drag-on-publisher-traffic-using-paired/</link><guid isPermaLink="true">https://groundy.com/articles/arxiv-2605-16428-measures-ai-searchs-drag-on-publisher-traffic-using-paired/</guid><description>An arXiv study finds AI Overviews boost Reddit engagement 12% for experience-based content, but Google AI Mode erases those gains, reshaping search-driven publishing 
economics.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>ai-overviews</category><category>google-search</category><category>publisher-traffic</category><category>content-strategy</category><category>reddit</category><category>search-ecology</category><author>Groundy Editorial</author></item><item><title>A Theory of Time-Sensitive Language Generation Says Sparse Hallucination Beats Mode Collapse</title><link>https://groundy.com/articles/a-theory-of-time-sensitive-language-generation-says-sparse-hallucination-beats/</link><guid isPermaLink="true">https://groundy.com/articles/a-theory-of-time-sensitive-language-generation-says-sparse-hallucination-beats/</guid><description>arXiv 2605.11302 proves timely generation requires sparse hallucination under formal bounds, reframing RLHF safety tuning as a tradeoff between two failure modes.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>hallucination</category><category>rlhf</category><category>language-generation</category><category>safety-tuning</category><category>mode-collapse</category><category>formal-methods</category><category>deep-learning-theory</category><author>Groundy Editorial</author></item><item><title>The Last Word Often Wins: A Format Confound Inflates Chain-of-Thought Corruption Robustness Scores</title><link>https://groundy.com/articles/the-last-word-often-wins-a-format-confound-inflates-chain-of-thought-corruption/</link><guid isPermaLink="true">https://groundy.com/articles/the-last-word-often-wins-a-format-confound-inflates-chain-of-thought-corruption/</guid><description>A format confound in CoT corruption benchmarks (suffix sensitivity collapsed 19× when final-answer text was stripped) means published faithfulness scores are inflated.</description><pubDate>Tue, 19 May 
2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-19T00:00:00.000Z</atom:updated><category>chain-of-thought</category><category>eval-methodology</category><category>process-reward-models</category><category>gsm8k</category><category>reasoning-faithfulness</category><category>format-confound</category><category>benchmarking</category><author>Groundy Editorial</author></item><item><title>Learning, Fast and Slow: What arXiv 2605.12484 Proposes for LLMs That Adapt Continually</title><link>https://groundy.com/articles/learning-fast-and-slow-what-arxiv-2605-12484-proposes-for-llms-that-adapt/</link><guid isPermaLink="true">https://groundy.com/articles/learning-fast-and-slow-what-arxiv-2605-12484-proposes-for-llms-that-adapt/</guid><description>Fast-Slow Training splits LLM updates into prompt fast weights and parametric slow weights, cutting KL drift by 70% and lifting sample efficiency 3× while preserving plasticity.</description><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>continual-learning</category><category>fine-tuning</category><category>llm-training</category><category>prompt-optimization</category><category>reinforcement-learning</category><category>qwen</category><category>sample-efficiency</category><author>Groundy Editorial</author></item><item><title>There Will Be a Scientific Theory of Deep Learning: What arXiv 2604.21691 Argues and Where It Will Lose</title><link>https://groundy.com/articles/there-will-be-a-scientific-theory-of-deep-learning-what-arxiv-2604-21691-argues/</link><guid isPermaLink="true">https://groundy.com/articles/there-will-be-a-scientific-theory-of-deep-learning-what-arxiv-2604-21691-argues/</guid><description>Fourteen theorists argue fragmented deep-learning theory is converging into &apos;learning mechanics,&apos; but concede scaling exponents and nonlinear stability remain 
open.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-04-29T00:00:00.000Z</atom:updated><category>deep-learning</category><category>scaling-laws</category><category>training-dynamics</category><category>neural-tangent-kernel</category><category>edge-of-stability</category><category>generalization</category><author>Groundy Editorial</author></item><item><title>Qwen3.6-27B&apos;s Dense Architecture Challenges the MoE-Only Playbook for Flagship-Class Coding Models</title><link>https://groundy.com/articles/qwen36-27bs-dense-architecture-challenges-the-moe-only-playbook-for-flagship/</link><guid isPermaLink="true">https://groundy.com/articles/qwen36-27bs-dense-architecture-challenges-the-moe-only-playbook-for-flagship/</guid><description>Alibaba&apos;s dense Qwen3.6-27B outperforms its MoE sibling on coding benchmarks, trading predictable inference latency for a larger memory footprint than sparse alternatives.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-04-24T00:00:00.000Z</atom:updated><category>qwen</category><category>dense-models</category><category>moe</category><category>inference</category><category>coding-models</category><category>model-architecture</category><category>llm-deployment</category><author>Groundy Editorial</author></item><item><title>Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie</title><link>https://groundy.com/articles/the-chinese-ai-model-ecosystem-deepseek-qwen-kimi-doubao-and-ernie-compared/</link><guid isPermaLink="true">https://groundy.com/articles/the-chinese-ai-model-ecosystem-deepseek-qwen-kimi-doubao-and-ernie-compared/</guid><description>DeepSeek isn&apos;t China&apos;s only frontier AI. 
Compare DeepSeek, Qwen, Kimi, Doubao, and Ernie on benchmarks, licensing, API access, and use-case fit.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>deepseek</category><category>qwen</category><category>kimi</category><category>doubao</category><category>ernie</category><category>chinese-ai</category><category>alibaba</category><category>baidu</category><category>bytedance</category><author>Groundy Editorial</author></item><item><title>Running DeepSeek R1 Locally: Hardware Requirements, Quantization, and Real Throughput</title><link>https://groundy.com/articles/running-deepseek-r1-locally-hardware-requirements-quantization-and-real-throughput/</link><guid isPermaLink="true">https://groundy.com/articles/running-deepseek-r1-locally-hardware-requirements-quantization-and-real-throughput/</guid><description>What hardware actually runs DeepSeek R1 at useful speeds? Specific token/s benchmarks across GPU configs, quantization options, and the honest tradeoffs.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>deepseek</category><category>local-inference</category><category>quantization</category><category>hardware</category><category>ollama</category><author>Groundy Editorial</author></item><item><title>Fish-Speech: The Open-Source TTS Model That&apos;s Threatening ElevenLabs</title><link>https://groundy.com/articles/fish-speech-open-source-tts-model-that-s-threatening/</link><guid isPermaLink="true">https://groundy.com/articles/fish-speech-open-source-tts-model-that-s-threatening/</guid><description>Fish Audio&apos;s S2 model reached SOTA benchmarks in March 2026 with sub-100ms latency, 80+ languages, and open-sourced weights, directly challenging ElevenLabs&apos; commercial dominance while exposing the real costs of &apos;free&apos; 
voice AI.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-22T00:00:00.000Z</atom:updated><category>ai-models</category><category>open-source</category><category>audio-ai</category><author>Groundy Editorial</author></item><item><title>Gemini 2.0 Pro&apos;s 2 Million Token Context: What Can You Actually Do With It?</title><link>https://groundy.com/articles/gemini-2-0-pro-s-2-million-token-context-what-can-you/</link><guid isPermaLink="true">https://groundy.com/articles/gemini-2-0-pro-s-2-million-token-context-what-can-you/</guid><description>Google&apos;s Gemini 2.0 Pro Experimental offers a 2 million token context window. Here is what practitioners have found works, what fails, and where the hard limits are.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-models</category><category>google</category><category>gemini</category><category>nlp</category><category>context-window</category><category>anthropic</category><author>Groundy Editorial</author></item><item><title>Google&apos;s TimesFM: A Foundation Model for Time Series</title><link>https://groundy.com/articles/google-s-timesfm-foundation-model-time/</link><guid isPermaLink="true">https://groundy.com/articles/google-s-timesfm-foundation-model-time/</guid><description>TimesFM is Google&apos;s pretrained, decoder-only transformer model for zero-shot time-series forecasting, trained on ~100 billion real-world time-points to deliver accurate predictions across domains without retraining.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-14T00:00:00.000Z</atom:updated><category>machine-learning</category><category>forecasting</category><author>Groundy Editorial</author></item><item><title>Synthetic Data Is Eating AI 
Training</title><link>https://groundy.com/articles/synthetic-data-eating-ai/</link><guid isPermaLink="true">https://groundy.com/articles/synthetic-data-eating-ai/</guid><description>The internet&apos;s supply of high-quality human-generated text is approaching exhaustion. Synthetic data, AI-generated training corpora, is filling the gap, but introduces new failure modes practitioners must understand, including model collapse and quality drift.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><category>machine-learning</category><category>training-data</category><author>Groundy Editorial</author></item><item><title>Claude&apos;s Web Search Changes Everything for AI Research</title><link>https://groundy.com/articles/claude-s-web-search-changes-everything-ai/</link><guid isPermaLink="true">https://groundy.com/articles/claude-s-web-search-changes-everything-ai/</guid><description>Claude Opus 4.8 integrates web search inside the reasoning loop with mandatory citations, domain filtering, and dynamic HTML filtering that cuts token use by 24%.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-models</category><category>anthropic</category><category>search</category><category>research</category><category>opus-4-8</category><author>Groundy Editorial</author></item><item><title>DeepSeek V3/R1: How Chinese Engineers Matched GPT-4 for $6 Million</title><link>https://groundy.com/articles/deepseek-v3-r1-how-chinese-engineers-matched-gpt-4-6/</link><guid isPermaLink="true">https://groundy.com/articles/deepseek-v3-r1-how-chinese-engineers-matched-gpt-4-6/</guid><description>DeepSeek&apos;s V3 and R1 models match GPT-4-class performance using a fraction of the compute through architectural innovations in Mixture of Experts, attention compression, and reinforcement learning, demonstrating that training efficiency 
may matter more than raw hardware scale.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>ai-models</category><category>deepseek</category><category>training</category><category>efficiency</category><category>china</category><author>Groundy Editorial</author></item><item><title>The Million-Token Context Window: What Can You Actually Do?</title><link>https://groundy.com/articles/million-token-context-window-what-can-you-actually/</link><guid isPermaLink="true">https://groundy.com/articles/million-token-context-window-what-can-you-actually/</guid><description>Million-token context windows let you feed entire codebases, legal contracts, and hours of video to an LLM in one pass, but advertised limits routinely overstate practical capability. Here&apos;s what the benchmarks, failure modes, and real deployment patterns actually show.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>llm</category><category>context-window</category><author>Groundy Editorial</author></item><item><title>Gemini 3.1 Pro: Google&apos;s New Reasoning Model Explained</title><link>https://groundy.com/articles/gemini-3-1-pro-google-s-new-reasoning-model/</link><guid isPermaLink="true">https://groundy.com/articles/gemini-3-1-pro-google-s-new-reasoning-model/</guid><description>Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2. 
See how it stacks up against Anthropic&apos;s Opus 4.8 (SWE-Bench Pro 69.2%, Terminal-Bench 74.6%) and GPT-5.5.</description><pubDate>Thu, 19 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-models</category><category>google</category><category>reasoning</category><category>benchmarks</category><category>anthropic</category><author>Groundy Editorial</author></item><item><title>AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?</title><link>https://groundy.com/articles/ai-code-generation-benchmarks-2026-which-model-actually/</link><guid isPermaLink="true">https://groundy.com/articles/ai-code-generation-benchmarks-2026-which-model-actually/</guid><description>Claude Opus 4.8 leads SWE-Bench Pro at 69.2% as of May 2026, while GPT-5.5 leads SWE-Bench Verified at 88.7%. Benchmark scores and real-world coding utility continue to diverge sharply.</description><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>ai-research</category><category>benchmarks</category><category>code-generation</category><category>comparison</category><category>claude</category><author>Groundy Editorial</author></item><item><title>Kimi Claw: Moonshot AI&apos;s Answer to Claude and ChatGPT</title><link>https://groundy.com/articles/kimi-claw-moonshot-ai-s-answer-claude/</link><guid isPermaLink="true">https://groundy.com/articles/kimi-claw-moonshot-ai-s-answer-claude/</guid><description>Moonshot AI&apos;s Kimi models offer trillion-parameter scale, open weights, and pricing 33x below Claude Opus 4.8, making Kimi China&apos;s leading open-source challenger to Western AI.</description><pubDate>Wed, 18 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-models</category><category>competition</category><category>china</category><category>chatbots</category><category>benchmarks</category><author>Groundy Editorial</author></item><item><title>WiFi DensePose: Full-Body Tracking Through Walls Using Your Router</title><link>https://groundy.com/articles/wifi-densepose-full-body-tracking-through-walls-using-your/</link><guid isPermaLink="true">https://groundy.com/articles/wifi-densepose-full-body-tracking-through-walls-using-your/</guid><description>WiFi routers can perform full-body pose estimation through walls using Channel State Information, turning everyday network infrastructure into a covert tracking system.</description><pubDate>Wed, 18 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>ai-research</category><category>privacy</category><category>surveillance</category><category>computer-vision</category><category>wifi</category><category>rf-sensing</category><author>Groundy Editorial</author></item><item><title>The Best AI Models for OpenClaw in 2026</title><link>https://groundy.com/articles/best-ai-models-openclaw-2026/</link><guid isPermaLink="true">https://groundy.com/articles/best-ai-models-openclaw-2026/</guid><description>Which LLM to pick for OpenClaw in 2026: Opus 4.8, Kimi K2.5, Gemini 3.1 Pro, GPT-5.4, and budget options ranked by use case and benchmark evidence.</description><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>openclaw</category><category>llm</category><category>ai-models</category><category>coding</category><category>claude</category><category>gpt</category><category>gemini</category><author>Groundy Editorial</author></item></channel></rss>