
AI Threat Labs

July 29, 2025


Unmasking Toxic Flows: A Principled Approach to Hardening AI Agent Security


Manoj Nair


Last month, we announced the exciting acquisition of Invariant Labs to bolster the Snyk Labs team's research scope and depth, particularly around the security challenges of AI-native applications.

Earlier today, Invariant Labs published a new piece of research identifying a critical vulnerability class called "toxic flows." These novel attack vectors, exploitable through prompt injection, pose a significant risk of malicious exploitation and data exfiltration for users and enterprises relying on tools like Cursor, Claude, or ChatGPT.

The proliferation of AI-powered applications and agent systems, leveraging technologies like the Model Context Protocol (MCP) to interact with APIs, services, and databases, marks a significant leap in software capabilities. However, it also introduces an entirely new and complex security landscape, fundamentally different from traditional, deterministic software systems.

Traditional security solutions, often relying on static analysis or simple prompt-level validation, are proving insufficient against these dynamic, runtime-driven threats. Unlike conventional software, where data flows and API usage are fixed and can be reasoned about ahead of time, AI agents dynamically adapt their behavior based on user input, connected data sources, and the underlying model. This inherent unpredictability makes agents difficult to secure, validate, and test before deployment, as any possible combination of available tools may be employed at runtime.

The dynamic nature of agent systems means security breaches typically manifest at runtime. An agent system that appears benign before deployment can expose vulnerabilities through malicious prompt injections or the mishandling of sensitive data. This introduces the "lethal trifecta" for AI agents: when an agent is exposed to untrusted instructions, sensitive data, and a mechanism to leak or exfiltrate arbitrary data, attackers can readily exploit the AI system.
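To make the trifecta concrete, it can be phrased as a simple property check over an agent's toolset. The sketch below is a minimal Python illustration; the manifest shape and property names are assumptions made for this example, not part of MCP or any particular scanner:

# Minimal sketch: flag a toolset that exhibits the "lethal trifecta".
# The property names below are hypothetical labels, not real MCP fields.

TOOLS = [
    {"name": "read_issues",  "untrusted_input": True,  "sensitive_data": False, "exfil_sink": False},
    {"name": "read_secrets", "untrusted_input": False, "sensitive_data": True,  "exfil_sink": False},
    {"name": "http_post",    "untrusted_input": False, "sensitive_data": False, "exfil_sink": True},
]

def has_lethal_trifecta(tools):
    """True if the toolset combines all three risk conditions."""
    return (any(t["untrusted_input"] for t in tools)
            and any(t["sensitive_data"] for t in tools)
            and any(t["exfil_sink"] for t in tools))

if has_lethal_trifecta(TOOLS):
    print("Warning: untrusted input, sensitive data, and an exfiltration sink are all reachable.")

Note that no single tool in this toy example is dangerous in isolation; the risk emerges from the combination, which is exactly what tool-by-tool review tends to miss.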

Introducing Toxic Flow Analysis (TFA)

To address these unmitigated vulnerabilities, Invariant Labs proudly introduces the Toxic Flow Analysis (TFA) framework. TFA moves beyond simplistic prompt-level input validation and towards a deeper, contextual understanding of AI systems and their security posture. The approach offers the first comprehensive method to significantly reduce the attack surface of AI applications by mitigating indirect prompt injections and other MCP attack vectors.

How TFA works

TFA is a hybrid security analysis framework that integrates static information, derived from an agent system's configuration, toolsets, and MCP servers, with dynamic runtime data captured from agents in production. This dual approach enables both proactive vulnerability detection and continuous monitoring. TFA anticipates attack risks by constructing potential attack scenarios, leveraging a deep, contextual understanding of an AI system's capabilities and susceptibility to misconfiguration.
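As a rough illustration of the hybrid idea, the following sketch shows how statically derived tool metadata might be refined with runtime observations. The data model and names here are illustrative assumptions, not TFA's actual internals:

# Sketch: combine static tool metadata with observed runtime behavior.
from dataclasses import dataclass, field

@dataclass
class ToolProfile:
    name: str
    trusted: bool  # static: derived from the tool/server configuration
    observed_inputs: set = field(default_factory=set)  # dynamic: sources seen at runtime

# Static pass: profiles derived from the agent's configured MCP servers.
profiles = {
    "fetch_page": ToolProfile("fetch_page", trusted=False),
    "send_email": ToolProfile("send_email", trusted=True),
}

# Dynamic pass: runtime traces record which tool outputs fed which tools.
profiles["send_email"].observed_inputs.add("fetch_page")

# A tool fed by an untrusted source at runtime is a candidate toxic flow,
# even if its static configuration alone looked benign.
for profile in profiles.values():
    for source in profile.observed_inputs:
        if not profiles[source].trusted:
            print(f"Potential toxic flow: {source} -> {profile.name}")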

TFA works in two main steps:

  1. Flow Graph Instantiation: TFA begins by instantiating the flow graph of an agent system. This graph models all potential tool flows, the sequences in which an agent might utilize its connected tools, and annotates each tool with relevant properties such as the level of trust associated with it, the sensitivity of the data it handles, and its potential use as an exfiltration sink.

  2. Toxic Flow Identification and Scoring: Based on this comprehensive flow graph, TFA identifies and scores potential toxic flows. These are specific tool sequences that, if executed at runtime, would lead to security violations. This automated process uncovers critical attack vectors and notifies users about them; a toy version of both steps is sketched below.
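Putting the two steps together, a toy pipeline might enumerate tool sequences over an annotated graph and score each one. The node properties, weights, and threshold below are illustrative assumptions rather than TFA's actual scoring model:

# Illustrative sketch of the two TFA steps over a toy flow graph.
from itertools import permutations

# Step 1: instantiate a flow graph, with each tool annotated with the
# properties described above (trust, data sensitivity, sink potential).
TOOLS = {
    "fetch_issue":  {"trusted": False, "sensitive": False, "sink": False},
    "read_env":     {"trusted": True,  "sensitive": True,  "sink": False},
    "post_webhook": {"trusted": True,  "sensitive": False, "sink": True},
}

def score(flow):
    """Step 2: score a tool sequence; higher means more dangerous."""
    s = 0.0
    if not TOOLS[flow[0]]["trusted"]:
        s += 0.4  # the flow begins at an untrusted source
    if any(TOOLS[t]["sensitive"] for t in flow):
        s += 0.3  # sensitive data enters the flow
    if TOOLS[flow[-1]]["sink"]:
        s += 0.3  # the flow ends at an exfiltration sink
    return s

# Enumerate candidate flows and report those crossing a risk threshold.
for flow in permutations(TOOLS, 3):
    if score(flow) >= 1.0:
        print(" -> ".join(flow), "is a potential toxic flow")

Running this reports the single ordering that chains an untrusted source through sensitive data into a sink, mirroring the lethal trifecta described earlier.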

Get started with Toxic Flow Analysis 

We are releasing an early preview of Toxic Flow Analysis as a part of the MCP-scan security scanning tool. MCP-scan automatically identifies potential toxic flows within the agent systems on your machine by scanning MCP servers and toolsets present in your environment (e.g., Cursor IDE, ChatGPT, Claude Desktop).

To get started with MCP-scan and analyze your AI-powered applications for potential toxic flows, simply run the following command (requires the uv package manager to be installed):

uvx mcp-scan@latest

The tool will generate a report detailing any potential vulnerabilities it discovers, allowing for immediate mitigation.

Wrapping up

Toxic Flow Analysis is a powerful, principled framework for securing AI agent systems. By meticulously modeling the flow of data and tool usage within these complex systems, the framework allows developers to proactively identify and mitigate security vulnerabilities before they can be exploited. This research highlights the strides made through both our recent AI Trust Platform launch and our acquisition of Invariant Labs, and builds on our foundational commitment to protecting AI-native applications.

Excited about TFA and want to learn more? Join us for an exclusive look into our pioneering AI research in the current threat landscape, including lessons learned from real-world exploits. Save your spot today.