The Free-Tier Backdoor: Poisoning the Continuous Training of Commercial LLMs
Commercial assistants — Claude, ChatGPT, Gemini, Le Chat — keep learning from free-tier feedback: ratings, regenerations, and the conversations themselves. That loop is an injection channel. A two-phase threat model: build a policy-compliant backdoor on a rare topic, then exploit it for jailbreak — and why scale makes the first phase almost impossible to catch.