CodeGraphContext PR #882 rejects write Cypher on /api/graph

CodeGraphContext's visualisation server exposed arbitrary write Cypher through GET /api/graph. The route accepted a cypher_query parameter, passed it straight to Neo4j with session.run(cypher_query) and relied on callers to behave. PR #882 closes that gap by enforcing the same read-only contract that already existed in the MCP query tool.

I identified the vulnerability, submitted the fix and validated it against both destructive and benign queries. The change was merged on 7 May 2026.

The vulnerable path

CodeGraphContext is an MCP server and CLI tool that indexes local source code into a graph database so AI assistants can query project structure. That graph is exposed through two main surfaces: the MCP tool path and the visualisation HTTP server.

The MCP path already had a guard. Its execute_cypher_query handler blocked write-oriented Cypher keywords before passing a query to the database. The visualisation server did not.

In src/codegraphcontext/viz/server.py, the get_graph() route accepts cypher_query from GET /api/graph. Before the patch, the relevant control flow was direct:

@app.get("/api/graph")
async def get_graph(repo_path: Optional[str] = None, cypher_query: Optional[str] = None):
    ...
    with db_manager.get_driver().session() as session:
        if cypher_query:
            print(f"DEBUG: Executing custom query: {cypher_query}", flush=True)
            result = session.run(cypher_query)

The data flow is the vulnerability:

A client supplies cypher_query in the query string.
FastAPI binds it to the route argument.
The value reaches session.run() unchanged.
Neo4j executes it with the application's database privileges.

That is CWE-943, improper neutralisation of special elements in data query logic: caller-supplied query text reached the database without being neutralised or restricted. In practice, it meant the visualiser could be asked to run CREATE, MERGE, DELETE, SET, REMOVE, DROP or CALL apoc.* operations even though the endpoint was supposed to be a graph viewer.

Why localhost still mattered

The visualiser is not normally deployed as a public internet service. That does not make the bug academic.

The normal usage model is a developer running CodeGraphContext locally with the visualiser listening on 127.0.0.1:8000. The server also installs permissive CORS with allow_origins=["*"]. Under that combination, any web page the developer visits can send a browser request to the local visualiser. A plain GET is a simple request, so the browser sends it without a preflight and the query runs whether or not the response is shared with the page. The wildcard policy additionally lets the attacker's JavaScript read the response.

The preconditions are ordinary:

Condition	Why it is realistic
Visualiser running locally	This is the tool's normal development workflow
Browser on the same machine	The developer is using the same workstation
Wildcard CORS	The server explicitly permits cross-origin access
Writable Neo4j user	The default graph database user is not read-only

Before the patch, a request equivalent to this could create a node in the local graph:

GET /api/graph?cypher_query=CREATE%20(n:Pwn)%20RETURN%20n HTTP/1.1
Host: 127.0.0.1:8000
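
As a minimal sketch of how the request line above maps onto the route argument, the standard library can decode the query string in the same way a web framework does before binding cypher_query. The variable names are illustrative, and nothing here contacts a server.

```python
from urllib.parse import parse_qs, urlsplit

# The request target from the example request.
target = "/api/graph?cypher_query=CREATE%20(n:Pwn)%20RETURN%20n"

# Split off the query string and percent-decode its parameters.
params = parse_qs(urlsplit(target).query)

# The decoded value is the Cypher text the route argument would receive.
cypher_query = params["cypher_query"][0]
print(cypher_query)  # CREATE (n:Pwn) RETURN n
```

Nothing between the URL and the route argument changes the query text, which is why the check has to live in the route itself.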

The same path could delete or mutate the indexed code graph. That is not remote code execution, but it is still a meaningful compromise of the tool's data model. For an assistant that uses graph context to reason about code, corrupting the graph is also a way to corrupt the assistant's view of the repository.

What PR #882 changes

The fix adds a small read-only validator near the top of viz/server.py, at lines 32 to 48 in the PR diff:

_FORBIDDEN_CYPHER_KEYWORDS = (
    'CREATE', 'MERGE', 'DELETE', 'SET', 'REMOVE', 'DROP', 'CALL apoc'
)
_STRING_LITERAL_RE = re.compile(r'''"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'''')
 
def _is_read_only_cypher(query: str) -> bool:
    """Return True if *query* contains no write keywords (outside string literals)."""
    stripped = _STRING_LITERAL_RE.sub('', query)
    for keyword in _FORBIDDEN_CYPHER_KEYWORDS:
        if re.search(r'\b' + keyword + r'\b', stripped, re.IGNORECASE):
            return False
    return True

The helper strips string literals before looking for forbidden keywords. That matters because a query such as this should remain valid:

MATCH (n) WHERE n.name = "CREATE" RETURN n

The word CREATE appears in the query text, but it is data rather than syntax. Removing quoted strings before keyword matching avoids that false positive and matches the behaviour already used by the MCP handler.
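
A quick way to see that behaviour is to run the helper, reproduced here from the diff, against a write, a plain read and the string-literal case. Only the helper comes from the PR; the sample queries and the loop are illustrative.

```python
import re

# Reproduced from the PR diff so the example is self-contained.
_FORBIDDEN_CYPHER_KEYWORDS = (
    'CREATE', 'MERGE', 'DELETE', 'SET', 'REMOVE', 'DROP', 'CALL apoc'
)
_STRING_LITERAL_RE = re.compile(r'''"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'''')


def _is_read_only_cypher(query: str) -> bool:
    """Return True if *query* contains no write keywords (outside string literals)."""
    stripped = _STRING_LITERAL_RE.sub('', query)
    for keyword in _FORBIDDEN_CYPHER_KEYWORDS:
        if re.search(r'\b' + keyword + r'\b', stripped, re.IGNORECASE):
            return False
    return True


# Illustrative inputs: two writes, one plain read, two literal cases.
samples = [
    'CREATE (n:Pwn) RETURN n',
    'match (n) detach delete n',
    'MATCH (n) RETURN n LIMIT 5',
    'MATCH (n) WHERE n.name = "CREATE" RETURN n',
    "MATCH (n) WHERE n.name = 'DROP' RETURN n",
]
results = {query: _is_read_only_cypher(query) for query in samples}
for query, allowed in results.items():
    print(f"{'allowed' if allowed else 'blocked':7}  {query}")
```

The first two queries are blocked, including the lower-case one because matching is case-insensitive. The remaining three are allowed, since the keywords in the last two sit inside quoted literals that are removed before matching.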

The enforcement point is added immediately before session.run() at lines 94 to 102 of the diff:

if cypher_query:
    if not _is_read_only_cypher(cypher_query):
        raise HTTPException(
            status_code=400,
            detail=(
                "This endpoint only supports read-only Cypher queries. "
                "Prohibited keywords like CREATE, MERGE, DELETE, SET, REMOVE, "
                "DROP, or CALL apoc are not allowed."
            ),
        )
    result = session.run(cypher_query)

A final small change at line 265 preserves intentional HTTPException responses. Without that, the route's broad except Exception block would catch the deliberate 400 response and convert it into a generic 500. The fix therefore changes both the security decision and the HTTP behaviour around that decision.
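
The line 265 change itself is not reproduced here, so the following is only a sketch of the usual pattern for this situation: re-raise the deliberate HTTP error before the broad handler can mask it. The HTTPException class below is a local stand-in for fastapi.HTTPException so the example runs without FastAPI, and the function names are hypothetical.

```python
class HTTPException(Exception):
    """Local stand-in for fastapi.HTTPException (assumption, illustration only)."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def route_without_passthrough():
    try:
        raise HTTPException(400, "read-only queries only")
    except Exception as exc:
        # The broad handler also catches the deliberate 400 and masks it.
        raise HTTPException(500, str(exc))


def route_with_passthrough():
    try:
        raise HTTPException(400, "read-only queries only")
    except HTTPException:
        # Intentional HTTP errors propagate unchanged.
        raise
    except Exception as exc:
        raise HTTPException(500, str(exc))


def status_seen_by_client(route) -> int:
    """Mimic a framework turning an HTTPException into a status code."""
    try:
        route()
    except HTTPException as exc:
        return exc.status_code
    return 200


print(status_seen_by_client(route_without_passthrough))  # 500
print(status_seen_by_client(route_with_passthrough))     # 400
```

The order of the except clauses is what matters: the specific handler has to come before the generic one.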

The patch is deliberately narrow: one file, about 30 added lines. It does not redesign the visualiser, change the database driver or remove custom queries. It makes the HTTP endpoint enforce the contract it already claimed to have.

What testing showed

The regression test was behavioural rather than theoretical.

Before the patch, a request containing CREATE (n:Pwn) RETURN n reached the driver and wrote a node to Neo4j. After the patch, the same request returned HTTP 400 with the read-only error message and did not reach session.run().

Read queries still worked. MATCH (n) RETURN n LIMIT 5 continued to return graph results. The string-literal case was also checked with MATCH (n) WHERE n.name = "CREATE" RETURN n, which was accepted because the forbidden keyword was inside a literal rather than in Cypher syntax.

That last check is small but important. Security patches that break ordinary read queries tend to get reverted. The validator needed to block writes without making the visualiser unpleasant to use.

The broader pattern in AI graph tooling

Cypher injection is not new. NVD records Cisco SD-WAN vManage issues such as CVE-2021-1349 where crafted HTTP requests to a management interface could trigger Cypher query language injection. The pattern has now moved into AI infrastructure because graph databases are a convenient way to store code relationships, entities, memories and retrieval context.

The newer examples look different at the edge but similar at the sink. NVD describes CVE-2026-32247 in Graphiti as a Cypher injection in an AI agent temporal context graph, where attacker-controlled labels reached Cypher construction for Neo4j, FalkorDB and Neptune backends. The interesting part is the route into the bug: in MCP deployments, prompt injection could induce an LLM client to call a tool with attacker-controlled entity_types, which then became graph query structure.

I have seen the same shape in other AI projects. In LightRAG's Memgraph backend, an entity_type value was interpolated into a Cypher label. In full-stack-ai-agent-template, webhook URLs became server-side requests to attacker-chosen destinations. Different CWE numbers, same structural failure: an AI-facing framework takes flexible input, converts it into a privileged operation and forgets that the caller may be hostile or may be an LLM acting on hostile instructions.

CodeGraphContext's bug is simpler than Graphiti's label injection and less exposed than a public API server. Its importance is the trust boundary. The visualiser looked like a local developer convenience, but wildcard CORS turned it into a browser-reachable API. The database looked like an internal implementation detail, but custom Cypher made it a user-controlled query engine.

What remains to harden

PR #882 removes the most damaging impact by rejecting write Cypher at the HTTP endpoint. It does not make the visualisation server a hardened remote interface.

Three follow-ups are worth treating separately:

Follow-up	Reason
Tighten wildcard CORS	The root drive-by condition is any site being able to call the local visualiser
Use driver-level read transactions	`session.execute_read(...)` is a stronger boundary than keyword matching
Share the validator with the MCP handler	Copying the blocklist by convention creates drift risk

The second point is the most important long term. Keyword blocklists are useful guardrails, but they are not the same as a database-enforced read-only execution path. If the driver or database can enforce read semantics, the application should use that as the final boundary.
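
As a sketch of that idea, the Neo4j Python driver (5.x) provides session.execute_read(), which runs a transaction function in read access mode. Whether a write inside such a transaction is actually rejected depends on the server version and deployment, so this should be verified against the target database and treated as a layer on top of the validator. run_read_only is a hypothetical helper name, and the function assumes only that the session object exposes execute_read().

```python
from typing import Any, Dict, List


def run_read_only(session: Any, cypher_query: str) -> List[Dict[str, Any]]:
    """Run *cypher_query* inside a read transaction and return plain dicts.

    `session` is assumed to be a neo4j.Session (driver 5.x); any object
    with a compatible execute_read() method works for this sketch.
    """

    def _work(tx):
        # Consume the result inside the transaction function, because
        # records may not be available once the transaction has closed.
        return tx.run(cypher_query).data()

    return session.execute_read(_work)
```

In the route, a call of this kind would take the place of session.run(cypher_query). The surrounding code would need adapting, because the helper returns a list of dictionaries and not a driver result object.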

The uncomfortable lesson is that localhost is no longer a quiet place. Developer tools increasingly expose local HTTP servers to browsers, editors and AI agents. Once those tools add permissive CORS and privileged backends, the difference between "local convenience endpoint" and "attack surface" is one visited web page.