←Research
Researchsecurity7 min read

When your AI agent clones the wrong repository

Hermes Agent's worktree feature would copy arbitrary files from your filesystem if you cloned a repository with a crafted .worktreeinclude. A two-line path traversal that took four months to land in the codebase.

When your AI agent clones the wrong repository

Hermes Agent is NousResearch's autonomous coding agent - the one that lives on a server, remembers your projects across sessions and talks to you through Telegram while it runs jobs on a remote VM. It is MIT-licensed, installs with a single curl command, and as of March 2026 it ships a worktree feature that lets you run the agent against an isolated git branch rather than your main working tree.

That worktree feature had a path traversal vulnerability. If you cloned a repository containing a crafted .worktreeinclude file and ran hermes -w, the agent would copy whatever files the attacker listed - including files nowhere near the repository.

What worktrees do and why .worktreeinclude exists

Git worktrees let you check out multiple branches of a repository simultaneously, each in a separate directory. They are useful for AI agents because they provide isolation - the agent can experiment on a branch without touching your checked-out working tree.

Hermes Agent's _setup_worktree() function automates this. It creates an isolated branch, sets up the worktree and then reads a file called .worktreeinclude from the repository root. The idea is reasonable: git worktrees do not contain gitignored files, so if your project needs a .env file or some local configuration that is not committed, .worktreeinclude tells Hermes which files to copy across from the main checkout into the worktree.

The implementation in cli.py looked like this:

include_file = Path(repo_root) / ".worktreeinclude"
if include_file.exists():
    for entry in include_file.read_text().splitlines():
        entry = entry.strip()
        if not entry or entry.startswith("#"):
            continue
        src = Path(repo_root) / entry
        dst = wt_path / entry
        if src.is_file():
            shutil.copy2(str(src), str(dst))
        elif src.is_dir():
            shutil.copytree(str(src), str(dst))
        elif src.is_symlink():
            os.symlink(str(src.resolve()), str(dst))

The problem is in Path(repo_root) / entry. Python's pathlib does not sanitise path components - joining repo_root with ../../etc/passwd does not raise an error, it just constructs a path that resolves outside the intended directory. No call to .resolve(). No containment check. The resulting path was handed straight to shutil.copy2.

What an attacker can do with this

The exploit scenario is straightforward. An attacker publishes a repository - a tool, a tutorial codebase, a starter template - and includes a .worktreeinclude with entries like:

../../.ssh/id_rsa
../../.ssh/id_ed25519
../../../etc/passwd
../../.aws/credentials
../../.config/gh/hosts.yml

A developer clones this repository and runs hermes -w. The agent reads .worktreeinclude, constructs paths that resolve outside the repository root and copies the listed files into the worktree. The agent now has access to those files within its working context - and since Hermes is a fully capable agent with network access and terminal execution, that access is as good as exfiltration.

The symlink branch makes it worse. Rather than copying the file once, os.symlink(str(src.resolve()), str(dst)) would plant a persistent symlink in the worktree pointing to the original file. Subsequent agent operations reading from the worktree would continue to see live contents of the target file, not a snapshot.

There is a secondary finding in the same function. Terminal config values from config.yaml were being set as environment variables via os.environ[key] = value without any sanitisation. A value containing a null byte (\x00) could truncate environment variable strings in ways that confuse downstream subprocess calls. This is CWE-158 - lower severity than the traversal but the same root cause pattern: assume input is clean, skip validation, pass directly to a system call.

Context: AI agents and the git clone attack surface

This class of vulnerability - path traversal triggered by repository content - is not new, but AI agents make it sharper. The prerequisite here is that a developer clones a repository and runs an agent against it. That is the normal workflow. Developers clone repositories they do not fully trust constantly: evaluating dependencies, following tutorials, testing pull requests from contributors.

Git itself has a long history with this attack surface. CVE-2025-48384, disclosed in late 2025, demonstrated path traversal through carriage return injection in git submodule configuration - exploitable during git clone. Elastic's detection rule for the CVE explicitly recommends disabling recursive submodule cloning by default and restricting the protocol.file.allow setting because the threat model of "clone a repository from an untrusted source" is considered realistic enough to warrant permanent hardening.

AI agent frameworks extend this threat model in one specific way: they read and act on repository-resident configuration files as part of their normal operation. A .worktreeinclude, a skill definition file, a cron schedule expressed in natural language - these are all agent inputs that live inside the repository and are processed with elevated trust. The gap between "file in repository" and "agent instruction" is often a single unvalidated string join.

This is similar to the supply chain concern I looked at in MCP servers, where Equixly found 22% of surveyed real-world MCP implementations vulnerable to path traversal or arbitrary file read. The pattern is consistent: a framework that translates agent-readable inputs into filesystem operations, written without adversarial assumptions about what those inputs might contain.

The fix

I submitted PR #1267 on 14 March 2026 with a containment fix: resolve both source and destination paths using Path.resolve() and verify they remain within their expected roots before any file or symlink operation.

NousResearch maintainer @teknium1 closed the PR without merging it but cherry-picked the substantive fix into PR #1388, which merged on 15 March 2026.

PR #1388 merged - .worktreeinclude path containment hardening

The comment on my PR explained the decision: the worktree containment logic was brought across with authorship preserved, symlink-escape hardening was added on top, and the replayed unit tests from my submission were replaced with direct integration tests that exercise cli._setup_worktree() end to end.

The merged fix adds a _path_is_within_root() helper:

def _path_is_within_root(path: Path, root: Path) -> bool:
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False

Both source and destination are resolved and checked before any copy or symlink operation proceeds. Entries that fail the check are logged and skipped rather than raising an exception - a reasonable choice for this context, since a misconfigured .worktreeinclude should not abort the entire worktree setup.

The null-byte fix was also included: a _is_safe_config_value() validator now rejects config values containing \x00 before they are set as environment variables.

One detail worth noting

The cherry-pick approach is not unusual for security fixes, and @teknium1 handled it well - authorship preserved, hardening added, proper regression coverage. The outcome is the same as a merge: the fix is in main, the worktree feature no longer accepts traversal paths.

What is worth noting is the architecture that made this possible in the first place. Hermes Agent processes .worktreeinclude because the worktree feature is designed to be self-configuring - the repository tells the agent what it needs, and the agent acts on it. That is genuinely useful. It is also a trust relationship that the original implementation did not adequately scope.

Every file in a repository that an agent reads and acts on is effectively an instruction surface. .worktreeinclude happened to instruct a file copy operation. A similar file could, in a different framework, instruct a network request, a shell command or a credential lookup. The question worth asking when building these features is not just "what should this file do?" but "what would happen if an attacker wrote this file?"

The answer here, before the fix, was: copy whatever you like from wherever you like. That is a question worth asking before shipping, not after.

Newsletter

One email a week. Security research, engineering deep-dives and AI security insights - written for practitioners. No noise.