AI Watchtower
Critical watch on AI security — threat models, poisoning, agents, hardening.
Latest
Classic security tools hunt for dangerous words: ‘hack’, ‘bomb’, ‘urgent’. But you don’t subvert an AI agent …
Agents When AI Takes Action: Understanding Attacks on Autonomous Agents, and How to Defend Against Them
A chatbot writes sentences; an AI agent acts: it reads your email, runs code, calls APIs, spends money. That shift moves the risk: it is no longer …
Fundamentals The Instruction That Protects Nothing: Why Prompt Position and Fine-Tuning Never Validate an LLM
A stubborn intuition holds that you only need to put the safety rules ‘first’ in the system prompt. It is false, and for a reason that …
Poisoning & Supply Chain The Free-Tier Backdoor: Poisoning the Continuous Training of Commercial LLMs
Commercial assistants (Claude, ChatGPT, Gemini, Le Chat) keep learning from free-tier feedback: ratings, regenerations, and the conversations …
Fundamentals
Strategy The AI War on Our Networks: Why Attack Outpaces Defense
Strategic essay. Cyber conflict is now machine-versus-machine, at a tempo that excludes the human operator. Attack holds the advantage, by …
Fundamentals How LLMs Work: From the LSTM to the Transformer
Three interactive diagrams to see, step by step, how a sentence flows through a recurrent network (LSTM), a convolutional network (CNN), and …
Agents
Classic security tools hunt for dangerous words: ‘hack’, ‘bomb’, ‘urgent’. But you don’t subvert …
Explainer When AI Takes Action: Understanding Attacks on Autonomous Agents, and How to Defend Against Them
A chatbot writes sentences; an AI agent acts: it reads your email, runs code, calls APIs, spends money. That shift moves the risk: it is no …
Analysis The War of AIs in Cyberspace: Agentic SIEMs as a New Attack Surface
SOCs are evolving toward agentic architectures where multiple AIs handle triage, investigation, correlation, and response. The decision …
Analysis The Agentic SOC and the Attacks Against Defensive AI Agents
Two linked shifts: the SOC moves from a human craft model to an automated agentic one, and those same defensive agents become a new attack …
Poisoning & Supply Chain
Analysis Conditional DPO Backdoors: From a Rare Context to an Agentic Chain
A deeper companion to the free-tier feedback explainer. DPO moves safety from the behavior level to the level of a conditional distribution; …