Topic
#alignment
3 articles exploring alignment. Expert insights and analysis from our editorial team.
Articles
Models & Research
Self-Correction Comes to Diffusion Models: What SOAR Means for Iterative Image Generation Pipelines
Tencent's SOAR replaces SFT post-training in diffusion models, yielding an 11% GenEval lift on SD3.5-M — no reward model, no preference labels required.
Ethics, Policy & Safety
Symbolic Guardrails for AI Agents: Hard Safety Guarantees Without Crippling Capability
A new paper shows symbolic guardrails can push agent safety to 100% in regulated domains without capability loss — but only for 74% of real-world policies.
Ethics, Policy & Safety
Constitutional AI: Teaching Models to Self-Correct Before They Act
Anthropic's Constitutional AI trains language models to critique and revise their own outputs against a set of written principles rather than human labels, but questions remain about whether this yields genuine safety gains or merely a sophisticated filtering mechanism.