OpenClaw represents a significant shift in what personal AI assistants can do. It also created a strong second-order effect: aggressive experimentation by users, integrations, and third-party plugins, often with permissions that are indistinguishable from local admin in practice. Security commentary landed in the obvious place.
From a security perspective, such an architecture resembles a command-and-control (C2) system. An agent that can run commands, read and write files, and call into networked tools can turn small control-flow mistakes into full-blown security breaches.
Given the growth of OpenClaw, including derivative projects such as MoltBook, we decided to take a look under the hood and conduct a security assessment of this wonder-agent.
OpenClaw’s attack surface and the sandbox threat model
OpenClaw’s attack surface is not small. It ranges from the Gateway Control Dashboard to the various intake points for instructions that can lead to prompt injection. For a comprehensive overview of agent security risks, check out Liran Tal’s article on the OpenClaw attack surface.
We decided to focus on OpenClaw's sandboxing mechanism, which, in theory, should restrict the capabilities of an integration or agent within the framework. The sandbox is an important layer of defense when using third-party skills (some of which may be malicious) or when connecting your mailbox to OpenClaw, which exposes it to potential prompt injection attacks.
We’ll discuss two distinct failures identified during our latest OpenClaw research project that produce the same outcome: a sandbox boundary that looks defined in configuration but is not enforced at runtime.
Regarding its sandbox threat model, OpenClaw distinguishes between trusted and constrained sessions. Remote integrations, plugins, and “non-main” sessions are supposed to run under a sandbox mode with reduced tool and filesystem access. With that in mind, let’s dive deep into the two sandbox bypasses we found.
Sandbox policy enforcement failure in /tools/invoke
The /tools/invoke endpoint acts as a bridge between the gateway and the sandboxed agent/integration. It allows a sandbox session to call tools that will run on the host.
However, the authenticated /tools/invoke HTTP endpoint fails to incorporate sandbox-specific tool policies when building and filtering the available tool list. The implementation in src/gateway/tools-invoke-http.ts constructs the tool list using createClawdbotTools and filters it through a stack of policies:
Profile - Selects a predefined baseline like minimal, coding, messaging, or full, which later layers can then further restrict.
Global - The global policy that applies to all agents and all sessions.
Agent - The agent-specific policy, affecting only the current agent.
Group - The policy that applies to a specific group/room/channel context (and optionally to specific senders inside that group).
Subagent - The policy that applies to given subagents, which are helpers the main agent can spawn.
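The layering above can be thought of as a pipeline where each layer can only narrow the tool list produced by the previous one. The sketch below is illustrative (the types and function names are ours, not OpenClaw's actual API):

```typescript
// Illustrative sketch of layered tool-policy filtering.
// ToolPolicy, applyLayer, and resolveTools are hypothetical names.
type ToolPolicy = {
  allow?: string[]; // if present, only these tools survive this layer
  deny?: string[];  // these tools are always removed by this layer
};

// Apply one policy layer: a layer can only remove tools, never add them.
function applyLayer(tools: string[], policy: ToolPolicy): string[] {
  let out = tools;
  if (policy.allow) out = out.filter((t) => policy.allow!.includes(t));
  if (policy.deny) out = out.filter((t) => !policy.deny!.includes(t));
  return out;
}

// Profile -> global -> agent -> group -> subagent, applied in order.
function resolveTools(baseline: string[], layers: ToolPolicy[]): string[] {
  return layers.reduce(applyLayer, baseline);
}

const effective = resolveTools(
  ["read", "write", "browser", "gateway", "exec"],
  [
    { allow: ["read", "write", "browser", "exec"] }, // profile
    { deny: ["gateway"] },                           // global
    { deny: ["exec"] },                              // agent
    {},                                              // group
    {},                                              // subagent
  ],
);
console.log(effective); // ["read", "write", "browser"]
```

The key property is that every layer is a pure restriction, so forgetting to apply one layer silently widens the tool set rather than failing loudly.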
The eager reader (and those familiar with the OpenClaw project) might have already spotted that a policy is missing: the sandbox policy, which applies to agents/sessions when they run inside a sandbox. The code never merges the sandbox policy into the policy it applies at runtime. To retrieve that policy, resolveSandboxConfigForAgent should be used; and to check whether the sandbox is running and healthy, resolveSandboxRuntimeStatus should be used to avoid runtime crashes or unpredictable behavior.
Because sandbox.mode and its associated allow/deny lists are ignored on this surface, tools that are strictly forbidden in sandboxed environments (such as browser, gateway, or nodes) remain accessible if they are present in the broader global or agent policies.
Consequently, any actor with valid gateway credentials, including remote integrations or plugins intended to be constrained to a sandboxed environment, can invoke management or sensitive tools that were intentionally withheld from the model. This leads to an escalation of privilege where a sandboxed session can perform actions on the host or network that should be restricted.
To mitigate this issue, we submitted several security fixes to the OpenClaw open source repository on GitHub ([1], [2]) that make the /tools/invoke handler resolve the sandbox context for the provided sessionKey and merge the sandbox allow and deny lists into the final policy that decides which tools a session can run at runtime.
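The shape of the fix can be sketched as follows. Everything except the mention of resolveSandboxConfigForAgent (which appears in the article) is a hypothetical stand-in, not the actual patch:

```typescript
// Hypothetical sketch of the fixed /tools/invoke flow; names and shapes
// are illustrative stand-ins for OpenClaw's real code.
type Policy = { allow?: string[]; deny?: string[] };

const filter = (tools: string[], p: Policy): string[] =>
  tools
    .filter((t) => !p.allow || p.allow.includes(t))
    .filter((t) => !p.deny || !p.deny.includes(t));

// Stand-in for the sandbox lookup keyed by session. In the real fix,
// this information comes from resolveSandboxConfigForAgent.
function resolveSandboxPolicyForSession(sessionKey: string): Policy | null {
  return sessionKey.startsWith("sandbox:")
    ? { deny: ["browser", "gateway", "nodes"] }
    : null;
}

// The missing step: after merging profile/global/agent/group/subagent
// policies, also apply the sandbox policy for sandboxed sessions.
function toolsForInvoke(sessionKey: string, mergedTools: string[]): string[] {
  const sandbox = resolveSandboxPolicyForSession(sessionKey);
  return sandbox ? filter(mergedTools, sandbox) : mergedTools;
}

console.log(toolsForInvoke("sandbox:plugin-1", ["read", "browser", "gateway", "exec"]));
// -> ["read", "exec"]
```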
OpenClaw sandbox bypass via a TOCTOU race condition in the sandbox path validation
This vulnerability results in an arbitrary file read/write via a Time-of-Check to Time-of-Use (TOCTOU) race condition in the sandbox path validation logic, allowing a sandboxed session to escape its workspace and access the host filesystem.
OpenClaw relies on the helper function assertSandboxPath to ensure that file operations performed by sandboxed sessions remain within a designated workspace. However, this validation is susceptible to a TOCTOU race condition.
assertSandboxPath uses assertNoSymlink to ensure that a file to be opened is not a symlink. The assertNoSymlink function walks through path segments using fs.lstat to detect symbolic links. If a path segment is missing, the function returns success immediately.
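A simplified reconstruction of the vulnerable pattern looks like this (this is our sketch, not OpenClaw's exact code): the walk lstat()s each segment, a missing segment short-circuits to success, and nothing re-validates between the check and the later file operation.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Simplified reconstruction of the vulnerable check; not OpenClaw's
// exact implementation.
function assertNoSymlinkSketch(root: string, target: string): void {
  const segments = path.relative(root, target).split(path.sep);
  let current = root;
  for (const seg of segments) {
    current = path.join(current, seg);
    let st: fs.Stats;
    try {
      st = fs.lstatSync(current); // lstat does NOT follow symlinks
    } catch {
      return; // segment missing: validation "succeeds" -- part of the bug
    }
    if (st.isSymbolicLink()) {
      throw new Error(`symlink detected at ${current}`);
    }
  }
  // The check passes here, but the later fs.readFile/fs.writeFile call
  // is a separate syscall: the path can be swapped in between (TOCTOU).
}
```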
This behavior allows an attacker to win a race: present a safe, regular file during validation, then rapidly swap it for a symlink to a host location before the actual file operation occurs.
src/agents/sandbox-paths.ts:50-69
Once assertSandboxPath returns, the host-side helpers (such as readFile, writeFile, or apply_patch) interact with the filesystem using standard Node.js fs calls. These calls follow symlinks by default and do not verify that the path still points to a valid workspace location.
This opens the door to a classic TOCTOU attack: an attacker calls a file operation tool on a path that resolves to a regular file, then swaps that file for a symlink after assertNoSymlink returns but before the operation runs. The attacker has limited visibility into exactly when the check happens, since their code is confined to the container, but the good old brute force method works just as well. Prepare a regular file /foo/bar and a symlink /foo/malicious pointing to /etc/hosts, then continuously call the file operation (read/write/etc.) tool on /foo/bar while simultaneously swapping the two paths using renameat2() with the RENAME_EXCHANGE flag. This matters because renameat2() performs the swap atomically, which significantly increases the success rate. While not fancy, this method succeeds roughly 25% of the time.
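To make the swap primitive concrete, here is a sketch in Node.js. Note that a pair of plain rename() calls is not atomic; a real exploit would invoke renameat2() with RENAME_EXCHANGE through a native addon (Node's fs module does not expose it), which is what pushes the success rate up:

```typescript
import * as fs from "node:fs";

// Attacker-side sketch: swap a benign file with a symlink to a host
// target. In plain Node this takes three renames and is NOT atomic;
// a real exploit uses renameat2(RENAME_EXCHANGE) via a native binding.
function swap(benign: string, malicious: string): void {
  const tmp = benign + ".tmp-swap";
  fs.renameSync(benign, tmp);       // race window opens here
  fs.renameSync(malicious, benign); // benign path now points at the symlink
  fs.renameSync(tmp, malicious);    // window closes
}

// Setup: /foo/bar is a regular file, /foo/malicious -> /etc/hosts.
// The attacker loops: trigger a tool read of /foo/bar, swap, repeat,
// until a read lands after the check but before the open.
```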
Concretely, this means that even when workspaceAccess is set to "none" or "ro," a sandboxed model can exfiltrate or overwrite arbitrary host files. This bypasses the sandbox's core isolation guarantees and allows data leakage or persistent tampering with the gateway host. This vulnerability affects file tools (read, write, edit), patch application, image ingestion, and even the directory resolver for executed commands.
To fix this, the best approach is to use openat(2): safely obtain a file descriptor for the first component of the path without following symlinks, then use that descriptor to open the next component, repeating until the entire path is processed. This ensures that no symlink swap or TOCTOU window can sneak in; it is the only approach that closes the race entirely. The reason a directory file descriptor matters is that, unlike a path string, it is a stable handle to the already-open directory (it refers to the directory itself and remains valid even if the directory is renamed), and each subsequent step resolves the next path component relative to that handle rather than to a pathname string.
Adding more lstat()/realpath() checks doesn’t work because, fundamentally, it still splits “check” and “use” into separate filesystem operations, meaning the underlying issue still exists: a path that is safe during validation can be swapped. While adding more such checks can narrow the time window an attacker has, it will never fully mitigate the vulnerability.
However, as Node.js doesn’t expose a function that would allow us to call openat(2) with a directory file descriptor, we opted to move the file operations inside the sandbox’s Docker container instead of performing them on the host. This solution works and is the most straightforward for our use case.
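A sketch of that direction: route the file operation through the container, so path and symlink resolution happen against the container's filesystem rather than the host's. The helper below only builds the docker exec invocation; the container name and the cat-based transport are our assumptions for illustration, and OpenClaw's actual plumbing differs:

```typescript
// Illustrative sketch: build a `docker exec` command so a sandboxed
// read resolves paths (and any symlinks) inside the container instead
// of on the host. Container naming and the cat-based transport are
// assumptions, not OpenClaw's real implementation.
function buildContainerRead(containerId: string, filePath: string): string[] {
  return ["docker", "exec", containerId, "cat", "--", filePath];
}

// Usage (not executed here): spawn the argv with child_process.execFile
// and treat stdout as the file contents. A symlink to /etc/hosts now
// resolves to the container's /etc/hosts, not the host's.
console.log(buildContainerRead("openclaw-sbx-1", "/workspace/notes.txt"));
```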
Security implications and remediation
On a technical level, both issues stem from the same pattern: a mismatch between the declared policy and the reality at runtime. In one case, a subtle mistake caused the /tools/invoke endpoint not to apply the sandbox policy to its deny list; in the other, a common symlink TOCTOU vulnerability that, as previously noted, can't be fully fixed with Node.js's standard filesystem API.
More broadly, this case illustrates how a project can go viral overnight and draw thousands of users. That is why it is crucial to take security seriously from day zero and minimize the risks, especially in the critical areas of your application.
Next Steps for OpenClaw Security
At Snyk, we not only recognize the significance of personal AI assistants but also actively secure agentic software and help developers and AI security engineers use AI safely.
The following are recent developments we’ve made on AI Security that you’ll likely find helpful:
Snyk and Vercel partnered on securing the Skills ecosystem at the https://skill.sh hub - All agent skills on the hub are continuously and rigorously scanned by Snyk’s Agent Scan.
Snyk Agent Scan is an open-source project that scans your agent’s resources, like MCP Servers and Agent Skills.
In February 2026, Snyk unveiled its ToxicSkills research findings on malicious Agent Skills payloads and supply chain security concerns.
We’ve released a stand-alone web app to help you scan and evaluate the security of Agent Skills in an ad-hoc manner using the Agent Scan Skill Inspector website.



