DPO on AI Watchtower

Recent content in DPO on AI Watchtower: https://aleph-beth.github.io/AI-Watchtower/tags/dpo/

Conditional DPO Backdoors: From a Rare Context to an Agentic Chain
https://aleph-beth.github.io/AI-Watchtower/posts/2026-06-22-conditional-dpo-backdoor-agentic-chain/
Sun, 21 Jun 2026 00:00:00 +0000

A deeper companion to the free-tier feedback explainer. DPO moves safety from the behavior level to the level of a conditional distribution; an agent then turns a poisoned conditional into a chain of actions. The result is a backdoor built from individually ordinary behaviors, invisible to standard evaluations, whose danger only emerges when the actions compose.