mcp-searxng PR #71 fixed a CWE-1333 ReDoS in section extraction

mcp-searxng accepted an attacker-controlled section parameter from the web_url_read MCP tool and interpolated it directly into a dynamically constructed regular expression. That made PR #71 a small fix for a large failure mode: a malicious client could turn section extraction into catastrophic backtracking and block the Node.js event loop.

I identified the vulnerability, submitted the fix and added a focused unit test after review. The fix was merged on 28 April 2026.

The vulnerable code path

The vulnerable function was extractSection() in src/url-reader.ts. Its job is simple: after a URL is fetched and converted to Markdown, the caller can ask for a named section. The function splits the Markdown into lines, finds a heading matching the requested section name and returns the content under that heading.

Before the fix, line 93 built the matcher like this:

const sectionRegex = new RegExp(`^#{1,6}\s*.*${sectionHeading}.*$`, 'i');

The problem is the middle of that template string. sectionHeading was not a trusted pattern. It came from the MCP request.

The call chain was:

CallToolRequestSchema handler in index.ts for web_url_read
  -> fetchAndConvertToMarkdown()
  -> applyPaginationOptions()
  -> extractSection()

The input validation step only checked that args.section was a string. It did not escape regex metacharacters, enforce a safe character set, impose a useful length limit or apply a regex timeout. Once it reached extractSection(), the user-supplied value became regex syntax.
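To make the "value became regex syntax" point concrete, here is a minimal sketch. isValidSection and buildUnsafeMatcher are hypothetical stand-ins, not the project's code, and the whitespace class is written as \\s so the example isolates the injection issue.

```typescript
// Hypothetical stand-in for the pre-fix validation: only the type is checked.
function isValidSection(section: unknown): section is string {
  return typeof section === "string";
}

// Same shape as the pre-fix matcher: the section value is interpolated unescaped.
function buildUnsafeMatcher(sectionHeading: string): RegExp {
  return new RegExp(`^#{1,6}\\s*.*${sectionHeading}.*$`, "i");
}

// A value that passes the type check but carries regex syntax.
const payload = "Install|Usage";
const accepted = isValidSection(payload);
const matcher = buildUnsafeMatcher(payload);

// The "|" splits the whole pattern into two alternatives, and the second one
// ("Usage.*$") is no longer anchored to a heading at all.
const matchesPlainText = matcher.test("see Usage notes");
const matchesOtherHeading = matcher.test("# Changelog");

console.log({ accepted, matchesPlainText, matchesOtherHeading });
```

The type check accepts the payload, and a line that is not a heading now matches. Nothing here is slow yet, but the same boundary is what lets a pathological pattern through.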

That is CWE-1333: inefficient regular expression complexity. The weakness is not merely that a user can change match behaviour. It is that a user can supply a pathological expression whose matching time grows exponentially against chosen input.

Why the bug was exploitable

The attacker controlled both sides of the match.

The url argument could point to an attacker-controlled page. That page could contain a Markdown heading such as:

# aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

The section argument could then be set to a ReDoS payload such as:

(a+)+b

When interpolated into the original pattern, the resulting regex asked the JavaScript engine to find a heading containing a nested quantified expression that almost matches but fails at the end. The failure is the expensive part. Each additional a gives the engine more ways to partition the input before it finally concludes that the trailing b is not present.
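The backtracking can be observed with a small timing sketch. buildUnsafeMatcher and timeFailedMatch are hypothetical helper names, and the heading lengths are deliberately kept small so the loop finishes quickly. Absolute timings depend on the machine and engine version.

```typescript
// Vulnerable shape: the section value is interpolated as regex syntax.
function buildUnsafeMatcher(sectionHeading: string): RegExp {
  return new RegExp(`^#{1,6}\\s*.*${sectionHeading}.*$`, "i");
}

// Time one failed match against a heading made of n 'a' characters.
function timeFailedMatch(n: number): { matched: boolean; ms: number } {
  const heading = "# " + "a".repeat(n);
  const matcher = buildUnsafeMatcher("(a+)+b");
  const start = Date.now();
  const matched = matcher.test(heading); // no trailing 'b', so this must fail
  return { matched, ms: Date.now() - start };
}

// Keep n small: every extra character multiplies the work.
for (const n of [14, 16, 18, 20]) {
  const { matched, ms } = timeFailedMatch(n);
  console.log(`n=${n} matched=${matched} elapsed=${ms}ms`);
}
```

Raising n a few characters at a time shows the trend without hanging the process. Larger values should only be tried in a throwaway process.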

Measured locally during the audit, 28 a characters produced roughly two seconds of processing. The time approximately doubled with each additional character. That puts 35 characters in the range of minutes and 50 characters in the range of months of theoretical backtracking work. In practice, the service is already unavailable long before the upper bound matters because Node.js runs JavaScript on a single event loop.
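The extrapolation is simple arithmetic. This sketch assumes exact doubling per character, anchored at the measured figure of roughly two seconds for 28 characters. estimateSeconds is a hypothetical helper, and real engines will deviate from the idealised curve.

```typescript
// Assumption: time doubles per added character, anchored at ~2 s for 28 characters.
function estimateSeconds(n: number, baseN = 28, baseSeconds = 2): number {
  return baseSeconds * Math.pow(2, n - baseN);
}

const secondsAt35 = estimateSeconds(35); // 2 * 2^7
const secondsAt50 = estimateSeconds(50); // 2 * 2^22

const minutesAt35 = secondsAt35 / 60;
const daysAt50 = secondsAt50 / 86400;

console.log(`35 chars: ${secondsAt35} s (~${minutesAt35.toFixed(1)} min)`);
console.log(`50 chars: ${secondsAt50} s (~${daysAt50.toFixed(0)} days)`);
```

Under that assumption, 35 characters come out at 256 seconds, a little over four minutes, and 50 characters at about 8.4 million seconds, roughly 97 days.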

A blocked event loop is a particularly poor fit for an MCP server. MCP tools are often wired into agents and editors as local or semi-local infrastructure. In STDIO mode, any connected MCP client can invoke the tool. In HTTP mode, authentication was opt-in through MCP_HTTP_HARDEN=true rather than mandatory by default. A denial of service does not need shell access when the tool interface already supplies both the document source and the regex fragment.

The fix

PR #71 adds a small helper in src/url-reader.ts:

function escapeRegExp(str: string): string {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

The section matcher now escapes the user-controlled section name before constructing the regex:

const sectionRegex = new RegExp(`^#{1,6}\\s*.*${escapeRegExp(sectionHeading)}.*$`, 'i');

That changes the meaning of inputs like (a+)+b. They are no longer parsed as grouping and quantifiers. They are treated as literal characters in a heading search.
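A short sketch of what the helper does to the payload. escapeRegExp is copied from the fix above. buildSafeMatcher is a hypothetical wrapper around the fixed line, used here only for illustration.

```typescript
function escapeRegExp(str: string): string {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical wrapper around the fixed matcher construction.
function buildSafeMatcher(sectionHeading: string): RegExp {
  return new RegExp(`^#{1,6}\\s*.*${escapeRegExp(sectionHeading)}.*$`, "i");
}

// Every metacharacter in the payload gains a leading backslash.
const escapedPayload = escapeRegExp("(a+)+b");
console.log(escapedPayload);

const safeMatcher = buildSafeMatcher("(a+)+b");

// A long run of 'a' no longer triggers nested-quantifier backtracking:
// the pattern now looks for a literal "(" that the heading does not contain.
const longRunMatches = safeMatcher.test("# " + "a".repeat(40));

// A heading that really contains the literal text still matches.
const literalMatches = safeMatcher.test("# notes on (a+)+b");

console.log({ longRunMatches, literalMatches });
```

The payload that previously drove the engine into exponential work is now just an unusual heading name that either appears in the document or does not.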

The fix is deliberately narrow. It does not redesign pagination, replace the regex engine or introduce a dependency. It neutralises the dangerous boundary: user input crossing into regex syntax. MDN documents this escaping pattern for safely constructing regular expressions from strings.

A secondary functional issue was also corrected during review. In a JavaScript template literal, \s does not produce the regex whitespace class unless the backslash is escaped. The matcher now uses \\s*, so optional whitespace after Markdown heading hashes is handled as intended.
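The template-literal behaviour is easy to check in isolation. The patterns below are shortened, hypothetical versions of the matcher, with the heading text written directly after the whitespace class so the difference is visible.

```typescript
// In an untagged template literal, "\s" is not a recognised escape,
// so the backslash is dropped and the string contains a plain "s".
const unescapedSource = `^#{1,6}\s*Title$`;

// Doubling the backslash keeps it, so the regex engine sees "\s".
const escapedSource = `^#{1,6}\\s*Title$`;

console.log(unescapedSource); // pattern text as the regex engine receives it
console.log(escapedSource);

const unescapedRegex = new RegExp(unescapedSource);
const escapedRegex = new RegExp(escapedSource);

// "s*" means "zero or more letter s", so a space after the hashes is not consumed.
const unescapedMatchesHeading = unescapedRegex.test("## Title");
const unescapedMatchesOddInput = unescapedRegex.test("##ssTitle");

// "\s*" consumes the optional whitespace as intended.
const escapedMatchesHeading = escapedRegex.test("## Title");

console.log({ unescapedMatchesHeading, unescapedMatchesOddInput, escapedMatchesHeading });
```

In the real matcher the following .* absorbs the whitespace, which is why the original behaved mostly as expected and the slip was easy to miss.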

The test that matters

The added unit test does more than raise the regression count. It captures the security property that had to be true after the fix: section names with regex metacharacters are treated as literal strings.

The test page contains two headings:

<h2>API (v1.0+) reference?</h2>
<h2>API v100 referencex</h2>

The request asks for:

{ section: 'API (v1.0+) reference?' }

Before escaping, that string is a regex-shaped input. Parentheses group, . matches any character, + repeats and ? makes the previous token optional. A near miss such as API v100 referencex can be selected by accident because the section name is being interpreted as a pattern rather than text.

After the fix, the test asserts three things:

The literal heading API (v1.0+) reference? is included.
The matching section body is included.
The regex-like near miss is excluded.

That is the right test for this bug. It proves semantic preservation for legitimate headings and closes the injection primitive at the same time. The full suite passed with 156 of 156 tests.
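The three assertions can be sketched against a simplified extractor. This is not the project's test or its extractSection() implementation. extractSectionSketch is a hypothetical reduction that works on Markdown directly, and the near miss is placed first so that a pattern-based match would pick it.

```typescript
function escapeRegExp(str: string): string {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical simplification: return the first matching heading and the lines
// under it, up to the next heading of the same or a higher level.
function extractSectionSketch(markdown: string, sectionHeading: string): string {
  const lines = markdown.split("\n");
  const sectionRegex = new RegExp(`^#{1,6}\\s*.*${escapeRegExp(sectionHeading)}.*$`, "i");
  const start = lines.findIndex((line) => sectionRegex.test(line));
  if (start === -1) return "";

  const level = (/^#+/.exec(lines[start]) ?? ["#"])[0].length;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    const heading = /^(#{1,6})\s/.exec(lines[i]);
    if (heading && heading[1].length <= level) {
      end = i;
      break;
    }
  }
  return lines.slice(start, end).join("\n");
}

const page = [
  "## API v100 referencex",
  "Near-miss body.",
  "",
  "## API (v1.0+) reference?",
  "Literal body.",
  "",
  "## Changelog",
  "Other content.",
].join("\n");

const section = extractSectionSketch(page, "API (v1.0+) reference?");
console.log(section);
```

With escaping in place the result starts at the literal heading, carries its body and stops before the next heading. Without escaping, the first heading would have matched instead.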

Prior art and the AI tooling pattern

ReDoS from unsafe regular expression construction is an old class of bug. MITRE tracks it as CWE-1333. NVD records the same broad failure mode in widely used JavaScript packages, including CVE-2022-25883 in semver and CVE-2021-27292 in ua-parser-js.

The MCP angle changes the operational context, not the bug class. A web application ReDoS usually depends on an HTTP endpoint. An MCP ReDoS may sit behind an editor, an agent runtime or a local tool bridge that developers implicitly trust because it runs near their workflow rather than on a public route.

That trust boundary has been showing cracks. In the blog's MCP attack surface research, the recurring theme was that MCP standardises tool access faster than the ecosystem standardises safety checks. The MCPHub default password case showed the management-plane version of the same problem: familiar web security mistakes reappearing inside AI infrastructure with new blast radius. mcp-searxng's regex injection is another entry in that pattern. The vulnerability is not exotic. The deployment context makes it easy to underestimate.

Regex construction is especially easy to get wrong in tool servers because the feature often starts life as convenience code. A user asks for a section, so the implementation builds a matcher. A string becomes a pattern because it is the shortest path between intent and working behaviour. The security boundary is invisible until someone supplies characters that have meaning to the regex engine.

What to check in similar MCP servers

This bug has a simple review heuristic: search for new RegExp() and ask whether any part of the pattern came from a tool argument, a URL parameter, model output or document content. If the answer is yes, the next question is whether that value is meant to be syntax or data.

If it is data, escape it. If only a small set of headings or keys is expected, validate against that set. If the regex must remain user-programmable, then it needs explicit limits and operational isolation because it is no longer a simple search field.
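For the "validate against a set" option, one possible shape is to avoid building a regex from the input at all. This is an alternative technique, not what PR #71 does. MAX_SECTION_LENGTH, listHeadings and resolveSection are hypothetical names, and the cap of 200 is an arbitrary illustrative value.

```typescript
// Hypothetical cap on the section name; the value is illustrative only.
const MAX_SECTION_LENGTH = 200;

// Collect heading texts with a fixed pattern that contains no user input.
function listHeadings(markdown: string): string[] {
  const headings: string[] = [];
  for (const line of markdown.split("\n")) {
    const m = /^#{1,6}\s+(.*)$/.exec(line);
    if (m) headings.push(m[1].trim());
  }
  return headings;
}

// Resolve the requested name against the headings that actually exist,
// using plain case-insensitive substring comparison.
function resolveSection(markdown: string, requested: string): string | undefined {
  if (requested.length === 0 || requested.length > MAX_SECTION_LENGTH) {
    return undefined;
  }
  const needle = requested.toLowerCase();
  return listHeadings(markdown).find((h) => h.toLowerCase().includes(needle));
}

const doc = "# Install\ntext\n## Usage\nmore text\n";

console.log(resolveSection(doc, "usage"));
console.log(resolveSection(doc, "(a+)+b"));
console.log(resolveSection(doc, "x".repeat(MAX_SECTION_LENGTH + 1)));
```

Because the user value is only ever compared as a string, regex metacharacters have no special meaning, and oversized input is rejected before any scanning happens.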

mcp-searxng needed the first option. The section parameter is a literal heading selector, not a regex feature. The safest fix was to make the code reflect that.

The uncomfortable part is how small the vulnerable line was. One interpolation in one helper function was enough to let a client pin the server. MCP did not invent that mistake, but it gives old mistakes a new place to hide: inside the tools that agents are encouraged to trust automatically.