Conditional DPO Backdoors: From a Rare Context to an Agentic Chain

A deeper companion to the free-tier feedback explainer. DPO moves safety from the level of individual behaviors to the level of a conditional distribution; an agent then turns a poisoned conditional into a chain of actions. The result is a backdoor built from individually ordinary behaviors, invisible to standard evaluations, whose danger emerges only when the actions compose.

June 21, 2026 · 6 min · 1266 words · aleph-beth