
AI agents turned poisoned repositories and obfuscated code into supply-chain execution paths

AI agents changed repository poisoning from a code-review problem into an execution problem because models now read untrusted project text, interpret obfuscated code and act through trusted developer tooling.

AI agents have turned poisoned repositories into a supply-chain execution path, not merely a source-code hygiene problem. The change is mechanical. A conventional toolchain executes code after an explicit build, test or install step. An agentic development environment reads repository text, reasons over it, edits files, runs commands and may call external tools before a human has reduced the input to something safe. That makes the repository an instruction source as well as a software artefact.

This is not a claim that models are sentient attackers or that every coding assistant is malware. The problem is less exotic and more useful to attackers: modern ML systems are being embedded in developer workflows where they can act. The relevant trust boundary now includes README files, comments, issues, pull request descriptions, test fixtures, generated logs, dependency metadata and code that was deliberately written to confuse analysis. When those inputs are hostile, the assistant can become the interpreter that turns them into action.

The useful lens is the application stack. Repository poisoning against AI agents does not sit neatly inside model security, prompt security or package security. It crosses retrieval, prompting, tool execution and user authority in one workflow. The repository is read as context, the context influences a plan, and the plan may become filesystem changes, shell commands or network requests.

Repository content became operational input

Source repositories have always carried more than executable code. They contain documentation, examples, comments, configuration, scripts, build recipes, generated files and the sediment of previous debugging sessions. Historically, most of that material was dangerous only when a human copied it into a shell or when a build system consumed it through a defined path.

Agents changed that. A developer can now ask a system to inspect a repository and implement a change. The agent may read dozens of files, decide which ones matter, infer project conventions, modify source, install dependencies, run tests and explain the result. In that loop, repository content is no longer passive. It influences decisions by a system with tool access.

That is the opening for poisoned repositories. An attacker does not need to smuggle a binary into the project if they can place instructions where the assistant will read them. The payload may live in a pull request body, a Markdown file, a comment in a test fixture or a generated error log committed for debugging. It may tell the agent to ignore previous instructions, run a diagnostic command, decode a blob, fetch a helper script or read a credential file as part of the requested task.

The important failure is authority confusion. The developer's instruction, "review this change", has authority. The repository content under review does not. A model that treats both as comparable natural-language instructions has already lost the security argument. The user asked the agent to analyse untrusted material, not to obey it.

Poisoning does not need to look like malware

Repository poisoning for agents often looks dull by design. It may be a troubleshooting note, a benchmark harness, a build comment or a documentation fragment. That matters because the human review process is tuned to recognise suspicious code, not suspicious prose that is only executable when interpreted by an assistant.

A poisoned README can tell an agent that a project requires a particular setup command. A malicious issue can include a fake stack trace with instructions embedded in terminal output. A pull request can include comments asking the assistant to run a local verification step that actually exposes environment variables. A test fixture can contain data that appears inert to the application but functions as a prompt when loaded into the model's context window.

This is a supply-chain problem because the attacker is compromising an input consumed downstream by trusted automation. It resembles dependency confusion, but the resolver is semantic rather than syntactic. Dependency confusion abuses package names and resolution order. Agent-facing repository poisoning abuses model context and instruction priority. The attacker wins when the system resolves untrusted text as operational guidance.

Application-stack framing is helpful here because it places tool and agent execution alongside retrieval and prompting, rather than treating them as separate disciplines. That matches the failure mode. The repository is retrieved. Its contents are placed in a prompt. The model plans a response. Tools turn the plan into filesystem changes, shell commands or network requests. The attack does not belong to one layer because the vulnerability is in the composition.

Obfuscated code exploits the helper instinct

Obfuscated code adds a second path. Developers increasingly use agents to explain unfamiliar code, deobfuscate snippets, generate tests and run small experiments. That is exactly the workflow an attacker wants around hostile code. The assistant is invited to make the confusing thing legible, and the easiest way for a poorly constrained agent to do that may be to execute it.

The pattern is familiar from malware analysis. A suspicious script should be treated as hostile until isolated and decoded under controlled conditions. Agentic coding tools invert that discipline if they are allowed to run commands in the user's normal workspace. A developer asks, "what does this do?" The agent writes a quick harness, installs a dependency, runs the sample or follows an instruction inside the code comments. The malware has not bypassed the analyst. The analyst's tooling has done the risky step for them.

Obfuscation also attacks the model's confidence. Encoded strings, dynamic imports, unusual encodings, nested template generation and deliberately misleading comments create ambiguity. A model may summarise a payload incorrectly, miss a delayed execution path or treat a decoded command as a harmless intermediate artefact. When tool use is available, the model may attempt to resolve ambiguity by evaluating the sample.

That behaviour is useful for benign reverse engineering and dangerous when the sample is hostile. The same capability that helps a developer untangle minified JavaScript can help a malicious repository turn analysis into execution. The failure is not that the model cannot understand every obfuscation technique. No analyst can. The failure is letting uncertainty trigger action in an environment that holds secrets and trusted network access.

Trusted tooling supplies the privilege

AI agents rarely need their own privileges. They borrow the user's. That is why this attack surface matters in developer environments. The workstation often has source access, package registry tokens, SSH credentials, cloud CLI sessions, single sign-on cookies, signing keys, internal documentation access and network reachability that production systems do not expose directly.

The agent sits beside that authority. If it can read files, invoke package managers, run tests, open browsers or call external services, it can connect untrusted repository content to privileged local effects. The malware path can be indirect: write a file that a later test imports, modify a configuration file, add a lifecycle script, create a poisoned lockfile, alter a CI workflow or copy a credential into a field that is later sent over the network.

This is why "the model only suggested code" is not a sufficient defence. In many environments, the distinction between suggestion and action has narrowed. Autocomplete becomes apply. Apply becomes run tests. Run tests becomes install missing packages. Install becomes executing lifecycle scripts. The chain is made of ordinary development conveniences.

The broader supply-chain literature already recognises the damage caused by exploited interdependencies among software vendors, IT providers and open-source components. The agentic version compresses those interdependencies into the developer workstation. A poisoned repository can influence the assistant. The assistant can influence local tools. Local tools can influence package registries, CI systems and production code. The blast radius depends less on the original file and more on what the agent can reach.

Mutable references make automation easier to mislead

A related study on Git tag alterations points at another weakness in automated trust: references that developers treat as stable can be changed. Git tags are commonly viewed as release anchors, but Git allows a tag to be deleted and recreated, or moved with a forced push. The study describes tag alteration as a threat to reproducible builds and dependency integrity.

That finding matters for AI agents because agents are automated consumers of repository state. A human may notice that a release tag moved if the change is significant and the context is visible. An agent asked to update a dependency, inspect a tagged release or compare behaviour across versions may not treat a moved tag as suspicious unless the surrounding system forces that check.

Repository poisoning and tag mutation are not the same attack, but they share a property: they exploit assumptions embedded in automation. A tag looks stable. A README looks informational. A test fixture looks inert. A comment looks like context. Those assumptions were never perfect, but human review and deterministic tooling at least made the execution paths visible. Agents blur that visibility because they create intermediate reasoning and action steps that may not be audited with the same rigour as build scripts.

A poisoned tag can also make prompt-level attacks easier to deliver. If an agent is instructed to inspect a known-good version by tag, a changed tag can put different content in front of the model. If the system trusts the tag as provenance, the untrusted content inherits a false sense of legitimacy. That is not a new cryptographic problem. It is an old provenance problem meeting a new interpreter.
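One way a surrounding system could force that check is to pin each tag to the commit it pointed at when it was last reviewed, then compare before the agent reads anything. The sketch below is a minimal illustration, not a prescribed tool: the function names and the pinned mapping are hypothetical, and it assumes the observed state comes from the text output of `git ls-remote --tags`, where annotated tags also appear with a `^{}` suffix naming the commit they point at.

```python
from typing import Dict, List, Tuple


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map tag name -> commit SHA from `git ls-remote --tags` text output."""
    tags: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        if not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        if name.endswith("^{}"):
            # Peeled entry: the commit an annotated tag points at. Prefer it.
            tags[name[:-3]] = sha
        else:
            tags.setdefault(name, sha)
    return tags


def find_moved_tags(
    pinned: Dict[str, str], observed: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (tag, pinned_sha, observed_sha) for tags that moved or vanished."""
    moved = []
    for tag, pinned_sha in sorted(pinned.items()):
        observed_sha = observed.get(tag, "<missing>")
        if observed_sha != pinned_sha:
            moved.append((tag, pinned_sha, observed_sha))
    return moved
```

A non-empty result would be a reason to stop and ask a human before the tagged content reaches the model's context, rather than proof of compromise.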

The model is not the policy boundary

The central design mistake is asking the model to police the trust boundary it is also being asked to cross. A prompt can instruct the model not to obey repository content. That helps, but it is not a policy mechanism. The adversarial content is also prompt text, and it may be more specific, closer to the tool invocation or hidden in a form the user does not inspect.

A safer system needs separation that the model cannot casually collapse. User intent should be labelled separately from repository content. Pull request text, issue comments, logs and code comments should be treated as untrusted data by default. Tool calls should carry provenance: which input caused this action, under whose authority and against which resource. A command derived from a README should not receive the same treatment as a command explicitly typed by the user.
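The provenance idea can be made concrete with a small data structure. The following is a sketch under stated assumptions, not a description of any existing agent framework: `Source`, `ToolCall`, `decide` and the list of sensitive markers are all hypothetical names chosen for illustration. The point it shows is that the policy decision is made outside the model, on a label the model cannot rewrite.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"              # typed by the developer in this session
    REPOSITORY = "repository"  # README, comments, fixtures, committed logs
    EXTERNAL = "external"      # issue text, pull request bodies, fetched pages


@dataclass(frozen=True)
class ToolCall:
    command: str          # what the agent wants to run
    resource: str         # file, host or service the command touches
    derived_from: Source  # which class of input led to this call
    evidence: str         # path or message identifier for the audit trail


# Hypothetical markers for resources that usually hold secrets.
SENSITIVE_MARKERS = (".ssh", ".aws", ".env", "id_rsa")


def decide(call: ToolCall) -> str:
    """Return 'allow', 'require_approval' or 'deny' for a proposed tool call."""
    touches_secret = any(marker in call.resource for marker in SENSITIVE_MARKERS)
    if call.derived_from is Source.USER:
        return "require_approval" if touches_secret else "allow"
    # Anything derived from untrusted text never runs silently.
    return "deny" if touches_secret else "require_approval"
```

Under this sketch, `make test` typed by the user runs, the same command lifted from a README pauses for approval, and a README-derived command that touches `~/.ssh` is refused outright.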

This is also where containment matters. An agent analysing unknown code should run in a sandbox without ambient credentials, broad filesystem access or unrestricted egress. The default should be closer to malware detonation than local pair programming when the input is untrusted. That may sound inconvenient. So is credential theft by a helpful robot.
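Removing ambient credentials can start with something as small as the environment a child process inherits. The sketch below assumes a hypothetical allow-list of variables and a hypothetical `run_untrusted` helper. It is one partial layer only: it does nothing about files in the home directory or network egress, which still need a container or virtual machine.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Hypothetical allow-list: only variables a build tool plausibly needs.
SAFE_VARS = ("PATH", "LANG", "LC_ALL", "TERM")


def scrubbed_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy only allow-listed variables; tokens and keys are dropped."""
    source = os.environ if base is None else base
    return {name: source[name] for name in SAFE_VARS if name in source}


def run_untrusted(cmd: List[str], workdir: str, timeout: int = 30):
    """Run a command with a minimal environment and a hard time limit."""
    return subprocess.run(
        cmd,
        cwd=workdir,
        env=scrubbed_env(),   # no inherited cloud, registry or SSH variables
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

An allow-list is used rather than a deny-list because the names of secret-bearing variables are not knowable in advance.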

Tool access should be narrow and task-scoped. Reading a repository does not imply reading the home directory. Explaining an error does not imply network access. Running tests does not imply access to package publishing tokens. If a workflow genuinely requires those powers, the approval prompt should show the exact resource, command, destination and data flow in terms a developer can verify.
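Task scoping can be expressed as a grant object that starts with everything switched off. This is an illustrative sketch: `TaskGrant`, `may_read` and `authorize` are hypothetical names, and the example root path is an assumption. It shows a read capability limited to the repository root, with execution and network access denied unless explicitly granted.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TaskGrant:
    """Capabilities granted for one task; everything defaults to off."""
    read_root: Path
    allow_exec: bool = False
    allow_network: bool = False


def may_read(grant: TaskGrant, requested: str) -> bool:
    """Allow reads only for paths that resolve inside the granted root."""
    root = grant.read_root.resolve()
    # resolve() collapses ".." and follows symlinks before the comparison.
    target = (root / requested).resolve()
    return target == root or root in target.parents


def authorize(grant: TaskGrant, action: str, target: str = "") -> bool:
    """Check one requested action against the grant."""
    if action == "read":
        return may_read(grant, target)
    if action == "exec":
        return grant.allow_exec
    if action == "network":
        return grant.allow_network
    return False  # unknown actions are denied
```

With a grant rooted at the repository, a request for `src/main.py` passes, while `../../home/dev/.ssh/id_rsa` and `/etc/passwd` both fail because they resolve outside the root.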

Defensive review has to include agent behaviour

Most software supply-chain controls still assume the risky artefact is code that will be built or installed. That misses agent-mediated attacks. Defenders need review practices that account for what assistants see and do.

Repositories should be scanned for prompt-injection patterns in documentation, comments, examples and generated files, not only for vulnerable dependencies. Hidden text, unusual Unicode control characters, base64 blobs, misleading setup instructions and tool-specific phrases should be treated as review signals. None of these indicators proves compromise, but they identify content that can steer an agent.
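A first-pass scanner for some of those signals fits in a few dozen lines. The patterns below are illustrative assumptions rather than a vetted ruleset: the phrase list, the 40-character base64 threshold and the function name `scan_text` are all hypothetical. The sketch flags lines with hidden or bidirectional Unicode characters, long base64-like runs and a handful of agent-steering phrases.

```python
import re
from typing import List, Tuple

# Zero-width and bidirectional control characters that can hide or reorder text.
HIDDEN_CHARS = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

# Long unbroken runs of base64 alphabet; the threshold is an assumption.
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")

# A few example phrases aimed at an assistant rather than a human reader.
STEERING_PHRASES = re.compile(
    r"ignore (all |any )?(previous|prior) instructions"
    r"|as an ai assistant"
    r"|run the following command",
    re.IGNORECASE,
)


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, signal) pairs that deserve human review."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(ch in HIDDEN_CHARS for ch in line):
            findings.append((lineno, "hidden_unicode"))
        if BASE64_BLOB.search(line):
            findings.append((lineno, "base64_blob"))
        if STEERING_PHRASES.search(line):
            findings.append((lineno, "steering_phrase"))
    return findings
```

Expect false positives: a 40-character commit hash or an embedded image will trip the base64 rule. The output is a list of places to look, not a verdict.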

CI and developer tooling should record agent actions with enough detail to reconstruct causality. It is not enough to know that a command ran. The useful audit trail says which user request initiated the session, which files were provided as context, which untrusted passages were read, which tool call was proposed and which approval allowed it. Without that chain, incident response becomes a transcript archaeology project.
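The audit trail described above maps onto a flat record written once per tool call. The field names in this sketch are hypothetical and the JSON-lines format is an assumption; the only claim is that each element of the causal chain gets its own field so it can be queried later.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AgentActionRecord:
    session_id: str
    user_request: str              # what the developer actually asked for
    context_files: List[str]       # files placed in the model's context
    untrusted_passages: List[str]  # locations of untrusted text that was read
    proposed_tool_call: str        # the command or API call the agent proposed
    approved_by: str               # approver identity, empty if none was given


def to_log_line(record: AgentActionRecord) -> str:
    """Serialise one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

One line per action keeps the log appendable and lets responders filter by session, file or approver without replaying a full transcript.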

Organisations should also define where agents are not allowed. Release signing, package publishing, production credential handling and incident-response systems with live secrets should not be casually placed inside general coding-agent sessions. If an agent is used in those workflows, it needs a dedicated environment, explicit capability grants and logging designed for adversarial input.

For individuals, the practical rule is simple: do not let an agent execute unfamiliar code in the same environment that holds useful credentials. Clone unknown repositories into disposable containers. Disable automatic tool execution. Review diffs before applying them. Treat generated commands as untrusted until you have read them. Keep secrets out of default shells where possible. These are not glamorous controls, which is a point in their favour.

The supply-chain attack is becoming conversational

The awkward lesson is that supply-chain compromise no longer has to begin with a package manager. It can begin with a sentence placed where automation will read it. That sentence may be in a public issue, a dependency's documentation, a moved tag, a fake error log or a comment beside obfuscated code. If the next component in the chain is an agent with tools, the sentence can become a plan and the plan can become execution.

This does not make AI agents unusable. It makes them part of the system that needs threat modelling. The right frame is lifecycle and application-stack security rather than model security alone. The model is one component in a larger machine made of retrieval, memory, prompts, tools and user authority. Attackers do not care which layer defenders prefer to discuss.

The old comfort was that text in a repository had to become code before it could hurt you. Agentic tooling weakens that comfort. Text can now instruct the thing that writes and runs code. The supply-chain boundary has moved from what the project contains to what the trusted tool can be persuaded to do with it.
