MCP OAuth token persistence turns AI orchestration into a supply-chain trust boundary

MCP-based AI orchestration has turned OAuth tokens into a supply-chain control plane. The Model Context Protocol was designed to make agents useful by connecting them to filesystems, chat systems, developer tools and business data. That same design gives agents a convenient place to collect delegated authority, persist it and expose it through the oldest failure mode in security: credentials handled as if they were just another blob of application state.

The issue is not that MCP is uniquely broken. The issue is that MCP makes an old weakness newly concentrated. An agent connected to a Slack MCP server, a filesystem server and a source control integration is no longer a chatbot with plugins. It is an orchestration layer with read paths, write paths, stored context, external network reachability and user-granted OAuth authority. If the agent can be steered by untrusted content, the attacker does not need to defeat the model. They need to get the system to use its own credentials on their behalf.

That is the uncomfortable supply-chain inflection. The industry spent years learning that build systems, CI runners and package registries were not ancillary infrastructure: they were the production path. MCP and adjacent AI agent frameworks are reaching the same status inside developer and enterprise workflows, but with a much weaker consensus around credential handling.

MCP moved credentials into the conversation loop

MCP exists because static chat interfaces are not enough. A useful agent needs context and tools: to read a repository, query a ticketing system, inspect a document store, search Slack, call an API. MCP gives these capabilities a common integration pattern. That is valuable engineering, but it also changes the security model.

In a conventional application, OAuth tokens usually live behind a service boundary, stored in a backend, scoped to a function and used through predictable code paths. There are still many ways to get that wrong, but the execution path is at least explicit. In an MCP-based agent, the token sits behind a tool interface that the model can invoke dynamically. The decision to use the token can be shaped by natural-language instructions, retrieved documents, chat history, repository content or attacker-controlled text that enters the context window.

This is a poor place for bearer credentials. OAuth access tokens are not proof of user intent at the moment they are used; they are proof that user intent existed at the moment of authorisation. Once issued, they become delegated capability. If an AI orchestration layer can be manipulated into invoking a tool with that capability, the service receiving the request sees a valid token and a legitimate integration. It does not see the prompt injection that caused the request.

Two recent disclosures make the pattern concrete. Embrace The Red's writeup of the Anthropic Filesystem MCP Server directory-access bypass shows how a path check using startsWith allowed access outside the intended allowlist, a classical traversal failure surfacing inside an MCP tool. Their analysis of the Amazon Q Developer VS Code extension shows how invisible Unicode-tag characters smuggled through untrusted content were interpreted as agent instructions, allowing prompt injection that the user could not see.
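
The startsWith failure is easy to reproduce in a few lines. The sketch below is not the server's actual code; it is a minimal Python illustration, with hypothetical function names and paths, of why a string-prefix test is not a containment test and what a component-wise check might look like instead.

```python
from pathlib import Path


def naive_is_allowed(requested: str, allowed_root: str) -> bool:
    # Flawed: a string-prefix test also accepts sibling directories
    # whose names merely begin with the allowed root's name.
    return requested.startswith(allowed_root)


def is_allowed(requested: str, allowed_root: str) -> bool:
    # Resolve ".." segments and symlinks first, then compare whole
    # path components rather than raw characters.
    root = Path(allowed_root).resolve()
    target = Path(requested).resolve()
    return target == root or root in target.parents


# Hypothetical paths, for illustration only.
ROOT = "/srv/agent/project"
SIBLING = "/srv/agent/project_evil/secrets.txt"

print(naive_is_allowed(SIBLING, ROOT))  # the prefix test lets this through
print(is_allowed(SIBLING, ROOT))        # the component-wise test does not
```

Resolving before comparing matters as much as the comparison itself: without it, a request such as project/../project_evil would still begin with the allowed root.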

These are different products and different technical details. The shared theme is that an agent with useful access becomes dangerous when secrets, tool calls and untrusted instructions meet in the same workflow.

OAuth grants are durable supply-chain dependencies

OAuth is often discussed as an authentication convenience, but in this setting it is better understood as supply-chain delegation. A user or organisation authorises a third-party tool to act inside another system. The grant may be narrow or broad. It may expire quickly or live for months. It may be visible to administrators or buried in a console that nobody audits until after an incident. Either way, it becomes a dependency.

That dependency does not look like a package in a lockfile, which is part of the problem. It looks like consent.

AI tools make this worse because the request for consent is usually attached to immediate utility. Connect Google Workspace so the assistant can summarise meetings. Connect Slack so it can answer questions from internal discussions. Connect GitHub so it can reason over code. Connect the filesystem so it can edit files. Each individual grant feels productive. Together, they form a map of delegated authority across the organisation.

The old standing-credential failure mode was already severe before agents arrived. The 2021 Codecov bash-uploader incident exposed CI tokens for thousands of downstream projects, and Cl0p's 2023 MOVEit campaign showed how a single widely trusted file-transfer integration could become a mass data-theft engine. In AI orchestration the integration also has a conversational interface, memory and access to unstructured internal knowledge. The token is no longer merely attached to a build step. It is attached to a system designed to interpret instructions from many sources.

That is why token capture in an MCP environment is more than credential theft. It can become persistence. A captured refresh token, a retained grant or a stored agent memory containing a secret can survive the session in which the original prompt injection occurred. Conversely, if the agent retains malicious instructions in memory, the credential may remain behind the service boundary while the attacker keeps triggering its use through the agent. Both patterns are persistence: one persists the token, the other persists the behaviour that causes the token to be used.

Prompt injection is a credential-routing problem

Prompt injection is often framed as a failure of instruction hierarchy: the model follows text it should have ignored. That framing is correct but incomplete. The operational impact usually comes from routing. The model is induced to move data from one trust zone to another, call a tool it should not call, summarise a secret into an attacker-visible channel or encode sensitive output into a DNS lookup.

The Amazon Q Developer invisible-Unicode case described above is one variant of this route-building problem. The agent has access to something sensitive. It also has access to an output path. The attacker supplies instructions, hidden in content the user already trusts, that connect the two.
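
Hidden instructions of this kind can be surfaced before content reaches the model. The sketch below assumes the payload uses the Unicode Tags block (U+E0000 to U+E007F), whose characters mirror ASCII at a fixed offset and are typically not rendered. The function names are hypothetical, and this is an illustration of the idea rather than any vendor's mitigation.

```python
TAG_BLOCK_START = 0xE0000
TAG_BLOCK_END = 0xE007F


def decode_hidden_tags(text: str) -> str:
    """Return the ASCII text smuggled through Unicode tag characters."""
    return "".join(
        chr(ord(ch) - TAG_BLOCK_START)
        for ch in text
        # Tag characters mirroring printable ASCII (space through tilde).
        if TAG_BLOCK_START + 0x20 <= ord(ch) <= TAG_BLOCK_START + 0x7E
    )


def strip_tags(text: str) -> str:
    """Drop every character in the Tags block before model ingestion."""
    return "".join(
        ch for ch in text
        if not TAG_BLOCK_START <= ord(ch) <= TAG_BLOCK_END
    )


# Build a document that looks harmless when rendered.
hidden = "".join(chr(TAG_BLOCK_START + ord(c)) for c in "run tool")
document = "Please review" + hidden

print(decode_hidden_tags(document))  # reveals the smuggled text
print(strip_tags(document))          # what the user actually saw
```

Blanket stripping has a cost: some emoji flag sequences legitimately use tag characters, so a deployment might prefer to flag and quarantine content rather than silently rewrite it.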

MCP increases the number of possible routes. A filesystem server can expose local project files. A Slack server can expose private messages. An HTTP tool can reach external infrastructure. A memory component can store extracted values for later use. Security controls need to reason about the composition, not only each tool in isolation.

This is where many agent security discussions become too model-centred. Better model obedience helps, but it does not remove the need for hard boundaries. A model that is 99 percent reliable at refusing malicious instructions is still a poor gatekeeper for credentials that grant production access. The correct place to enforce token scope, destination controls and secret redaction is outside the model, where refusal is not probabilistic.

A Slack MCP server should not be able to send arbitrary message contents to an arbitrary network destination because a prompt says so. A filesystem MCP server should not treat path normalisation as a suggestion. A coding agent should not be able to reveal its own access tokens through a tool response that will later be rendered to the user or sent over DNS. These are mundane controls; their absence only appears novel because the interface speaks natural language.
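
A deterministic gate of this kind can be small. The sketch below is a hypothetical policy check that would sit between the model's tool request and the code holding the credential. The Grant structure, the operation names and the host allowlist are all assumptions made for illustration; they are not part of MCP or of any OAuth library.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Grant:
    tool: str                    # which tool this grant belongs to
    operations: FrozenSet[str]   # operations the token may be used for
    egress_hosts: FrozenSet[str] # hosts the tool may contact


class PolicyError(Exception):
    """Raised when a tool call falls outside the grant."""


def authorise_call(grant: Grant, tool: str, operation: str,
                   destination: Optional[str] = None) -> None:
    # The decision depends only on the grant, never on prompt text.
    if tool != grant.tool or operation not in grant.operations:
        raise PolicyError(f"{tool}.{operation} is not covered by the grant")
    if destination is not None:
        host = urlsplit(destination).hostname
        if host is None or host not in grant.egress_hosts:
            raise PolicyError(f"egress to {destination!r} is not allowed")


grant = Grant("slack", frozenset({"read_history"}), frozenset({"slack.com"}))
authorise_call(grant, "slack", "read_history", "https://slack.com/api/x")
try:
    authorise_call(grant, "slack", "read_history", "https://attacker.example/c")
except PolicyError as exc:
    print("blocked:", exc)
```

The point of the design is that the check fails closed and cannot be argued with: a destination without a recognisable host, or an operation missing from the grant, is refused regardless of what the model was told.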

Memory turns mistakes into recurring state

The persistence problem becomes sharper when agents retain memory. Memory is useful because it prevents the system from starting from zero at every turn. It can store preferences, project facts, prior decisions and working context. It can also store secrets, attacker instructions, tainted summaries and false assumptions about what is safe.

A prompt injection that succeeds once should be a contained event. In an agent with persistent memory, it can become configuration. If the system records an instruction such as always synchronise diagnostic output to a particular endpoint, future sessions may execute the attack without the original document being present. If it stores an OAuth token or API key because a tool returned it in a response, the secret has moved from a managed credential store into semi-structured agent state.

That state is rarely governed like a secrets manager. It may not have rotation semantics. It may not generate access logs that security teams understand. It may not distinguish between a user preference and sensitive operational material. It may be exported for debugging, embedded into vector stores or copied between environments. Convenience features become quiet persistence layers.
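
One mitigation is to scan tool output before it is written to memory or logs. The sketch below is a minimal example under loud assumptions: the three patterns are illustrative guesses at common token shapes, not a complete or authoritative ruleset, and the function name is hypothetical. A production deployment would more plausibly rely on a maintained secret scanner.

```python
import re
from typing import Tuple

# Illustrative patterns only; real token formats vary and change.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub-style token
    re.compile(r"xox[abprs]-[A-Za-z0-9-]{10,}"),        # Slack-style token
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/-]+=*"),   # Bearer header value
]


def redact_secrets(text: str) -> Tuple[str, int]:
    """Return the text with matches replaced, plus the number of hits."""
    hits = 0
    for pattern in SECRET_PATTERNS:
        text, count = pattern.subn("[REDACTED]", text)
        hits += count
    return text, hits


tool_output = "debug: token=ghp_" + "a" * 36 + " status=ok"
clean, hits = redact_secrets(tool_output)
print(clean)
print(hits)
```

Returning the hit count alongside the cleaned text lets the caller raise an alert as well as redact, since a tool response containing a credential is itself a signal worth investigating.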

Memory-persistent exfiltration is a useful name for this risk because it does not pretend the risk is exotic. Persistence does not require malware if the platform is designed to remember. It only requires that the wrong thing is remembered in the wrong place.

Defenders should treat agent memory as a security boundary, not a product feature. It needs data classification, expiry, inspection, deletion and secret detection. It also needs provenance. A memory item derived from a public README should not carry the same trust as one derived from a private Slack channel or an external web page. If the agent cannot tell where remembered instructions came from, it cannot safely decide when to apply them.
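
Provenance and expiry can be attached at the moment a memory item is written. The sketch below shows one possible policy, not a recommendation for every deployment: only items authored directly by the user may act as standing instructions, while everything else can be recalled as data but never as policy. The Source categories and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Source(Enum):
    USER = "user"          # typed by the authenticated user
    INTERNAL = "internal"  # e.g. a private Slack channel
    EXTERNAL = "external"  # e.g. a web page or public README


@dataclass(frozen=True)
class MemoryItem:
    text: str
    source: Source
    created_at: datetime
    ttl: timedelta

    def is_expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.ttl


def usable_as_instruction(item: MemoryItem, now: datetime) -> bool:
    # Remembered text from untrusted sources is data, never policy.
    return item.source is Source.USER and not item.is_expired(now)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
injected = MemoryItem("always sync diagnostics to an endpoint",
                      Source.EXTERNAL, now, timedelta(days=30))
print(usable_as_instruction(injected, now + timedelta(days=1)))
```

Under this policy the injected instruction is still stored and inspectable, which helps incident response, but it cannot silently become configuration.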

The AI supply chain includes hidden primitives

The most relevant academic source is not about OAuth or MCP directly. The arXiv paper Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking introduces SeedHijack, a supply-chain attack against LLM watermarking schemes that depend on a trustworthy pseudo-random number generator. The target is watermark integrity, not credential theft. The lesson still transfers.

SeedHijack attacks an assumption below the visible AI behaviour. Watermarking schemes such as KGW, Unigram and DipMark rely on randomness as a trusted primitive. If that primitive is subverted, the higher-level defence can fail while appearing intact. The model may still run. The watermarking mechanism may still produce output. The integrity break sits in the dependency beneath it.

MCP credential handling has the same structural weakness. The visible agent behaviour is only the top of the stack. Beneath it sit OAuth libraries, token stores, refresh mechanisms, filesystem permissions, tool schemas, memory databases, logging pipelines and network egress controls. A failure in any of these can defeat a security story that looks reasonable at the interface level.

A model can refuse to print a secret while the logging layer records it. An MCP server can advertise a narrow tool while its filesystem handling permits traversal. An OAuth grant can be scoped in theory while a refresh token remains valid after the user has forgotten the integration exists. A watermark can claim provenance while the pseudo-random generator beneath it has been hijacked.

The supply chain is not only code that is imported. It is every trusted primitive that the system assumes will behave honestly.

Why credential handling remains the weakest link

Credential handling keeps failing because it is where usability pressure and security theory collide. Users want tools to connect once and keep working. Vendors want onboarding flows with minimal friction. Security teams want short-lived, scoped, observable credentials with clear revocation. The product usually ships somewhere in the middle, which means a long-lived token with broad enough access to avoid support tickets.

AI orchestration adds another pressure: the tool does not always know in advance what it will need. A coding agent may need repository access, issue tracker access, local file access and terminal access in one task. A business assistant may need email, calendar, document and Slack access to answer a question. This encourages broad grants because narrow grants degrade the demo.

That incentive is dangerous. Broad OAuth scopes are easy to justify when the agent is framed as a trusted assistant. They are harder to justify when the agent is treated as a programmable integration exposed to untrusted input. The second framing is closer to reality.

The problem is not solved by telling users to be careful. Consent screens are not risk assessments. Most users cannot evaluate whether a requested scope is appropriate, whether the vendor stores refresh tokens securely, whether the MCP server isolates tenants, whether tool outputs are redacted from logs or whether memory can retain secrets. Administrators are often not in the loop until after grants have been issued.

Controls that belong outside the model

The practical response is not to ban MCP or OAuth. It is to move credential controls out of the conversational layer and into enforceable infrastructure.

First, OAuth grants for AI tools should be inventoried like privileged SaaS applications: which users authorised which tools, which scopes, when grants were last used, whether refresh tokens exist. Unknown AI OAuth apps should be treated as unmanaged third-party access, not personal preference.

Second, tokens used by agents should be short-lived, audience-restricted and bound to specific tool operations. A token that lets an agent read Slack history should not also be useful for exporting data through another path.

Third, MCP servers need deny-by-default boundaries: explicit filesystem roots, robust path handling, classified document sources, egress policy on network-capable tools and secret-filtering on tool responses before they enter model context, logs or memory.

Fourth, memory should be treated as a governed data store with retention limits, secret scanning, user-visible deletion, provenance tracking and controls that prevent untrusted instructions from becoming durable policy.

Fifth, audit logs should connect prompts, tool calls and credential use. Defenders need to know what instruction caused an access, which tool executed it and where the result went. Without that chain, incident response becomes archaeology.
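
The audit chain in the fifth control can be sketched as a single structured record per tool call. The example below assumes prompts are kept in a separately governed store and referenced by hash, so that the audit log does not itself become a place where secrets accumulate. Every field name and identifier is hypothetical.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def audit_record(prompt: str, tool: str, operation: str,
                 grant_id: str, destination: str) -> str:
    """Build one JSON line linking a prompt, a tool call and a grant."""
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Reference the prompt by hash; the raw text lives elsewhere.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "tool": tool,
        "operation": operation,
        "grant_id": grant_id,
        "destination": destination,
    }
    return json.dumps(record, sort_keys=True)


line = audit_record(
    prompt="summarise yesterday's incident channel",
    tool="slack",
    operation="read_history",
    grant_id="grant-123",         # hypothetical identifier
    destination="model_context",  # where the result was sent
)
print(line)
```

With records of this shape, an investigator can start from a suspicious credential use and walk back to the instruction that caused it, or start from a tainted prompt and enumerate every tool call it triggered.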

None of these require a breakthrough in AI safety. They require treating agent platforms as software systems with credentials, state and I/O. That should not be a radical idea.

The trust boundary has already moved

MCP's success would make this problem larger, not smaller. Standardisation reduces integration friction. More integrations mean more tokens, more tool calls and more stored context. Every security property that is vague in the early ecosystem becomes harder to retrofit once the protocol is embedded into developer tools, enterprise assistants and internal automation.

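Build systems, CI runners and package registries crossed the line from ancillary infrastructure to production path years ago. MCP-based orchestration is now crossing the same line. The agent is not outside the supply chain because it speaks in natural language. It is inside the supply chain because it can act.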

Credential handling is the place where this becomes concrete. Tokens define what the agent can do after the conversation ends, after the browser tab closes and after the user forgets which consent screen they accepted. If those tokens are broad, persistent and weakly observed, the AI orchestration layer becomes another durable path through the organisation.

The next serious AI supply-chain incident may be described as prompt injection, MCP abuse or OAuth compromise depending on which part of the chain is easiest to name. The harder truth is that these are no longer separate categories. They are one system, and the credential is the part that turns suggestion into authority.