Exfiltration on AI Watchtower

Recent content in Exfiltration on AI Watchtower
https://aleph-beth.github.io/AI-Watchtower/tags/exfiltration/
Mon, 29 Jun 2026 00:00:00 +0000

When AI Takes Action: Understanding Attacks on Autonomous Agents, and How to Defend Against Them
https://aleph-beth.github.io/AI-Watchtower/posts/2026-06-29-ai-agent-security-jailbreak-countermeasures/
Mon, 29 Jun 2026 00:00:00 +0000

A chatbot writes sentences; an AI agent acts: it reads your email, runs code, calls APIs, and spends money. That shift moves the risk: the danger is no longer that the AI says something forbidden, but that it does something dangerous. This article explains, with detailed and accessible examples, how these attacks actually work, why naive guardrails fail, and what a decision-maker must demand before putting an agent into production.