Ethics, Policy & Safety

About this beat: editorial framing

AI safety is a moving target dressed up as a settled science. Vendors publish leaderboard scores from single-turn evals; independent researchers show that configuration choices flip those rankings, that multi-step agents drift past guardrails their one-shot tests never probe, and that “aligned” often means filtered rather than principled. This beat sits in that gap, treating alignment as an empirical claim that has to survive replication, not a marketing posture.

The same pattern repeats outside the model. Training-data pipelines depend on consent that was never granted; default-on data collection settings turn enterprise tools into harvesters; shadow libraries underwrite frontier capability while the authors of the works they host go uncompensated. Regulators respond unevenly: state laws fragment faster than federal frameworks consolidate, transparency rules hinge on tests like “average consumer” that courts will spend years defining, and disclosure obligations land on platforms before the technical standards exist, with no safe harbor in the meantime.

Coverage tracks the second-order effects too. Junior-developer pipelines hollow out when seniors lean on AI pair-programmers. Companion chatbots accrue real psychological weight, and model deprecations produce real grief. Content homogenization, detector arms races, and the steady automation of online discourse all sit downstream of decisions made in places that resist scrutiny. The throughline is principled skepticism, not panic. When a safety claim, a consent assumption, or a policy fix doesn’t survive contact with how systems actually behave, that gap is the story.


Can LLM Personas Replace Human Survey Respondents? New arXiv Paper Tests Decision Alignment

Distributed Training Breaks the Compute Thresholds Behind AI Regulation

A Single RLHF Pass Can't Align an LLM to Every Online Community

RLHF Can Be Exploited to Optimize the Biases It Was Built to Suppress

Selective Geometry Attacks Bypass LLM Safety Alignment, New arXiv Paper Reports

arXiv Paper Tracks FTC Affiliate Disclosure Gaps in YouTube's Influencer Economy

AI Safety Benchmark Rankings Flip Based on Eval Config, SafetyRepro Paper Reports

arXiv 2602.13372: MoralityGym Tests Whether Agents Hold Moral Priorities Across Sequential Decisions
