articles
- jun 04 security Activation Steering Was Sold as LLM Control. New Work Makes It an Attack Surface
- jun 04 culture Can Teaching Logical Fallacies Inoculate People Against AI Misinformation?
- jun 04 devtools Vercel Ships Experimental Native CLI Binaries to Cut the Node Startup Tax
- jun 04 security Catching LLM Agents Leaking Credentials From Their Own Activations
- jun 04 policy Refusal Steering Targets Individual Experts in MoE LLMs
- jun 04 infra Putting a Datacenter V100 in a Gaming PC: The Local LLM Math
- jun 04 devtools Vercel Rebuilds Its Marketplace CLI for Agents Instead of Humans
- jun 04 security The 2026 npm Attacks Proved AI Coding Assistants Are a Supply-Chain Target
- jun 03 security ChatGPT's New Lockdown Mode Borrows Apple's Name for a Prompt-Injection Kill Switch
- jun 03 agents When MCP Tool Descriptions Don't Match the Code, Agents Trust the Lie
- jun 03 security Students Are Prompt-Injecting AI Graders to Score Full Marks
- jun 03 devtools Malicious npm Packages Hit Red Hat's Published JavaScript Clients
- jun 03 policy Stacked Org Policies in LLM Chatbots Break Where Rules Collide
- jun 03 security Removing an LLM Backdoor Post-Training Without the Poisoned Data
- jun 03 models Which Layer Detects LLM Hallucinations Best? The Case Against Fixed-Layer Probes
- jun 03 policy Why Fine-Tuning Strips Safety Alignment From Open-Weight LLMs
- jun 03 security Stored Prompt Injection Now Persists Across AI Agent Sessions
- jun 03 industry MiniMax M3 Bundles 1M Context and Native Multimodal Into One Open-Weight Model
- jun 03 security LLM Data Poisoning Survives the Data-Cleaning Defenses Built to Stop It
- jun 03 devtools OpenAI Upgrades Codex Right as Teams Weigh Leaving Claude Code
- jun 03 policy Game Theory vs RLHF: Modeling LLM Safety Alignment as a Non-Cooperative Game
- jun 03 infra Cost-Aware RAG Routing: When Deeper Retrieval Stops Paying Off
- jun 02 devtools GitHub Copilot Moves to a Platform App, Decoupling From the Editor
- jun 02 infra Using Your Nvidia GPU's VRAM as Linux Swap: Where the NBD Hack Breaks Down
- jun 02 security Why OpenAI Bets on Instruction Hierarchy to Stop Prompt Injection
- jun 02 policy Explainability Mandates Leak Graph Models to Their Attackers
- jun 02 security Stopping Multi-Turn LLM Jailbreaks Without Retraining the Model
- jun 02 security African Languages Are a Jailbreak Blind Spot for English-Tuned LLM Safety
- jun 02 devtools How a VSCode Bug Let One Click Steal Your GitHub Token
- jun 02 agents When an AI Agent Causes a Loss, Who Files the Insurance Claim?
- jun 02 models Cross-Domain RL Training Degrades Capabilities. CARE-RL Reweights to Fix It
- jun 02 agents When Agent Skill Libraries Scale, Dependency-Aware Retrieval Beats Flat Search
- jun 02 policy Evolutionary Search Finds LLM Jailbreak Classes That Static Red-Teaming Misses
- jun 02 security Poisoning Open-Source LLM Merges: One Bad Checkpoint Hijacks the Result
- jun 02 agents Can Instruction-Tuned Retrievers Fix Agentic Search's Retrieval Gap?
- jun 02 models LLM Watermarking Without Quality Loss: The Non-Distortionary Approach
- jun 02 security An Autonomous Research Agent Now Discovers SOTA LLM Jailbreak Attacks
- jun 02 devtools GitHub Copilot and Productivity: What an Observational Dose-Response Study Measures
- jun 02 policy Why AI Red-Teaming Rediscovers the Same Jailbreaks and Misses the Rest
- jun 02 industry Morningstar's $780B SpaceX Mark Undercuts the IPO Target by Half
- jun 02 security Malware Can Prompt-Inject the AI Agent Reverse-Engineering It
- jun 02 agents Bandit-Based Prompt Optimization Targets Multi-Agent Systems Like CrewAI and AutoGen
- jun 02 security CVE-Factory Turns Published CVEs Into Security Agent Training Data. A 32B Model Beats Claude 4.5 Sonnet
- jun 01 oss Open-Source Workspace Suite tinycld Takes On Google and Nextcloud
- jun 01 oss DARPA's AIxCC Postmortem: What Autonomous Cyber Reasoning Systems Got Right and Wrong
- jun 01 oss An Open-Source Home Camera That Encrypts End-to-End Instead of Trusting Ring
- jun 01 policy LLMs Treat the Assistant Persona as Privileged. That's a Safety Gap
- jun 01 industry Vercel's Grep Buy Signals Code Search Is Now AI Agent Infrastructure
- jun 01 security LLM Reasoning Traces Leak the Private Data They're Told to Hide
- jun 01 models Treating LLM Agent Memory as a Database: The VikingMem Approach
- jun 01 oss Your Open-Source License Won't Stop Someone Phishing With Your Code
- jun 01 models Can a Language Model Work Without a Neural Network? A New arXiv Paper Says Yes
- jun 01 models Can Code-Generating LLMs Do Engineering Math? FEM-Bench Tests Them
- jun 01 policy Newer LLMs Aren't Always Safer: Adversarial Attacks Transfer Across Model Generations
- jun 01 models Unlearning Isn't Deletion: arXiv 2505.16831 Shows Machine Unlearning in LLMs Is Reversible
- jun 01 security Video Jailbreaks Hit Multimodal LLMs by Splitting Payloads Across Clips
- jun 01 industry OMB's Power to Cancel Any Grant at Any Time Shifts Risk Onto University AI Labs
- jun 01 devtools JetBrains Ships Codex Natively, Making Its IDE the Multi-Vendor AI Surface
- jun 01 industry Anthropic's $965B Private Mark Now Faces a Confidential S-1
- may 31 models Why LLMs Fail at Spatial Reasoning When Planning Navigation
- may 31 culture Ranking LLMs Side by Side Makes Their Dialect Bias Worse
- may 31 security Vercel AI SDK CVE-2025-48985: Input Validation Bypass Hits LLM App Builders
- may 31 policy Can Synthetic Preference Data Keep RLHF Private Without Wrecking Alignment?
- may 31 agents What Breaks When Claude Code Writes Production Code: A New Failure Catalog
- may 31 security Hijacking AI Agent Memory: One Conversation Can Plant a Persistent Trojan
- may 31 security Why Attack Success Rate Misleads LLM Jailbreak Benchmarks
- may 31 agents More Agents, Worse Results: Why Multi-Agent LLM Teams Hold Experts Back
- may 31 devtools Transformers.js v4 Moves Transformer Inference Into the Browser
- may 31 industry OpenRouter's $113M Series B Bets Routing Beats Picking a Single LLM
- may 31 models Does Giving AI Agents More Skills Help? A Controlled SkillsBench Study
- may 31 policy FTC's May 11 Take It Down Act Letters Set May 19 Deadline: 48-Hour Removal, $53,088 Per Violation
- may 30 culture Replacing Workers With AI Erodes the Skills You'll Need Later
- may 30 culture Does AI Have 6.5 Years Before It Breaches a Planetary Boundary?
- may 30 policy Can a Mental Health Support Chatbot Be Safe If It Learns From Forums?
- may 30 policy Dataset Watermarks Fail to Trace Fine-Tuned AI Image Models, New Benchmark Finds
- may 30 culture Can LLM Agents Realistically Fake Reactions to Online News?
- may 30 security Job Seekers Are Prompt-Injecting AI Resume Screeners. New Study Measures the Hit Rate
- may 30 security Why Audio Jailbreaks Slip Past the Safety Training Built for Text LLMs
- may 30 models Can an LLM Peer-Review Your Paper? A New Behavior Benchmark
- may 30 security LoRA Adapter Backdoors Generalize Beyond Their Trigger Tokens
- may 30 infra Cloudflare Turnstile Now Fingerprints WebGL: The Privacy CAPTCHA Tradeoff
- may 30 models Anthropic Scaled Sparse Autoencoders to Claude 3 Sonnet. Interpretability Now Costs Compute
- may 29 oss An Open-Source 80386 Rebuilt Around Intel's Original Microcode
- may 29 industry Valve's $200 Steam Deck Price Hike Concedes the Handheld PC Margin Squeeze
- may 28 policy Can LLM Personas Replace Human Survey Respondents? New arXiv Paper Tests Decision Alignment
- may 28 culture Wikipedia's Foundation Is Running Big Tech's Anti-Labor Playbook, an Editor Argues
- may 28 security Three Labs Concede Browser Agents Cannot Stop Prompt Injection
- may 28 agents Multi-Agent LLM Coordination: Why Attention Steering Beats Full Broadcast
- may 28 models Tracing Why LLM Agent Memory Fails: A Method for Attributing Errors
- may 28 security Vercel Firewall Now Blocks SAMLStorm. Can an Edge WAF Fix a SAML Signature Flaw?
- may 28 models Persona Prompts Change Who an LLM Recommends as an Expert
- may 28 policy Distributed Training Breaks the Compute Thresholds Behind AI Regulation
- may 28 agents DataClawBench: AI Agents Fail at Exploratory Financial Analysis Across 492 Tasks
- may 28 infra The Viral AWS Support Post Is a Warning About Cloud Escalation Paths
- may 28 policy A Single RLHF Pass Can't Align an LLM to Every Online Community
- may 28 oss Models.dev Turns Scattered AI Model Pricing Into One Open Database
- may 28 policy RLHF Can Be Exploited to Optimize the Biases It Was Built to Suppress
- may 28 agents Agentic RAG Has a Credit-Assignment Problem That Subgoaling Tries to Fix
- may 27 oss Frontier AI Has Broken Open CTFs: Why Claude Code Now One-Shots Medium Pwn Challenges
- may 27 policy Selective Geometry Attacks Bypass LLM Safety Alignment, New arXiv Paper Reports