maboloshi/github-chinese PR #692 fixed DOM XSS in translation results

maboloshi/github-chinese inserted third-party translation API responses into GitHub pages as HTML. I fixed the issue in PR #692, which was applied on 17 May 2026. The vulnerable code lived in both the simplified Chinese userscript and the traditional Chinese userscript. In both cases, text returned by the iflyrec translation service was interpolated into an HTML string and passed to insertAdjacentHTML() on github.com.

This is a DOM-based cross-site scripting vulnerability, mapped to CWE-79. The interesting part is not the novelty of the bug class. It is the trust boundary. A userscript running on GitHub treated a remote translation response as safe page markup, then inserted it into a high-value authenticated browser session.

The vulnerable code path

The simplified Chinese script had the vulnerable sink around line 396 of main(greasyfork).user.js:

translateDescText(desc, text => {
    button.style.display = "none";
    const translationHTML = `<span style='font-size: small'>由 <a target='_blank' style='color:rgb(27, 149, 224);' href='...'>讯飞听见</a> 翻译👇</span><br/>${text}`;
    element.insertAdjacentHTML('afterend', translationHTML);
});

The traditional Chinese script had the same pattern around line 380 of main_zh-TW.user.js:

transDescText(descText, translatedText => {
    button.style.display = "none";
    const translatedHTML = `<span style='font-size: small'>由 <a target='_blank' style='color:rgb(27, 149, 224);' href='...'>訊飛聽見</a> 翻譯👇</span><br/>${translatedText}`;
    element.insertAdjacentHTML('afterend', translatedHTML);
});

The first part of each string is trusted static markup: a small attribution label and a link to the translation provider. The final interpolation is not trusted. text and translatedText come from a third-party HTTP response. They are not sanitised before being parsed as HTML by the browser.

That last detail is the entire vulnerability. insertAdjacentHTML() does not insert text. It parses a string as markup at the specified DOM position. A <script> element inserted this way does not execute, but an element with an inline event handler does: the browser creates the element and runs the handler when the event fires. OWASP's DOM XSS guidance describes this class directly: attacker-controlled data reaches a DOM sink that interprets it as code or markup.

What made it exploitable

The userscript runs on GitHub pages and, according to the review, uses @grant none. That matters because @grant none means the script operates directly in the page context rather than inside a userscript-manager sandbox. A payload that executes there is not confined to an isolated extension world. It is JavaScript running where the GitHub page is running.

The translation API is accessed over HTTPS, so ordinary network interception is not the primary scenario. The realistic scenarios are a compromise of the API itself, a rogue CDN edge, a DNS hijack paired with a certificate the browser accepts, or any upstream behaviour that returns HTML where the caller expected plain text. The code did not need the translation service to be malicious by design. It only needed the response to contain markup.

A simulated malicious response is enough to show the impact:

<img src=x onerror="alert(document.cookie)">

Before the fix, clicking the translation button would append that payload after the target element. The <img> fails to load, the onerror handler fires and arbitrary JavaScript executes in the GitHub page context. A demonstration alert is not the real concern. A real payload would target credentials available to JavaScript, CSRF-protected actions exposed through the page or authenticated GitHub API operations the browser can already perform.

The attack requires user interaction because the translation button has to be clicked. That lowers exploitability, but it does not make the issue theoretical. The feature is explicitly designed to be clicked on GitHub issues, pull requests and README content. Users who install a translation userscript are likely to use the translation button when they encounter text they cannot read.

The fix in PR #692

The fix separates trusted markup from untrusted text. In main(greasyfork).user.js, the patched code creates a container, sets only the static attribution label as HTML, then appends the translation result as a text node:

const resultContainer = document.createElement('span');
resultContainer.innerHTML = `<span style='font-size: small'>由 <a target='_blank' style='color:rgb(27, 149, 224);' href='...'>讯飞听见</a> 翻译👇</span><br/>`;
resultContainer.appendChild(document.createTextNode(text));
element.after(resultContainer);

The same change was applied to main_zh-TW.user.js:

const resultContainer = document.createElement('span');
resultContainer.innerHTML = `<span style='font-size: small'>由 <a target='_blank' style='color:rgb(27, 149, 224);' href='...'>訊飛聽見</a> 翻譯👇</span><br/>`;
resultContainer.appendChild(document.createTextNode(translatedText));
element.after(resultContainer);

This keeps the existing layout and attribution link intact. The only behavioural change is how the remote response enters the DOM. document.createTextNode() treats its argument as text. If the API returns <img src=x onerror=alert(1)>, the browser renders those characters visibly. It does not create an image element and it does not execute the handler.

There is still an innerHTML assignment in the patched code, but it is limited to a hard-coded string controlled by the project. That is a reasonable boundary. The dangerous pattern was not HTML construction in the abstract. The dangerous pattern was mixing static HTML and remote data into one string, then handing the whole string to an HTML parser.
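
For projects that would rather have no HTML sink at all, the same result can be built with DOM construction APIs. The following is a minimal sketch, not the code from PR #692. buildTranslationResult, its parameters and the injected doc argument are hypothetical names introduced for illustration, and the provider URL is a parameter because the original href is elided in the excerpts above.

```javascript
// Hypothetical helper for illustration; not the code from PR #692.
// `doc` is passed in so the function can be exercised outside a browser.
// In a userscript it would simply be `document`.
function buildTranslationResult(doc, providerName, providerUrl, translatedText) {
    const container = doc.createElement('span');

    // Static attribution label, built node by node instead of via innerHTML.
    const label = doc.createElement('span');
    label.style.fontSize = 'small';

    const link = doc.createElement('a');
    link.target = '_blank';
    link.style.color = 'rgb(27, 149, 224)';
    link.href = providerUrl;
    link.textContent = providerName;

    // append() accepts strings and nodes; strings become text nodes.
    label.append('由 ', link, ' 翻译👇');
    container.append(label, doc.createElement('br'));

    // Untrusted API output only ever enters the DOM as a text node.
    container.appendChild(doc.createTextNode(translatedText));
    return container;
}
```

In the userscript this would be called as element.after(buildTranslationResult(document, '讯飞听见', providerUrl, text)), where providerUrl stands for the existing attribution link. Passing the document in is only a convenience for running the helper outside a browser.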

Why this matters on GitHub pages

A DOM XSS in a random static site is not the same as a DOM XSS in a userscript that runs on GitHub. The browser context is the value. GitHub sessions carry repository access, issue triage permissions, pull request privileges and organisation context. A script executing in that page can observe and interact with whatever the authenticated user can access through the frontend.

Userscripts also occupy an awkward security position. They are installed deliberately, so browsers and extension managers tend to treat them as user-approved customisation. But once installed, the script becomes part of the user's browsing environment. It can blur the line between the site's own JavaScript, the extension's code and remote services that the script calls. In this case, a translation provider response effectively became DOM input for GitHub.

That is why the safe default should be text insertion unless markup is explicitly required. Translation output is plain text. It does not need to carry event handlers, elements or scripts. If formatting is required later, it should pass through an allowlist-based sanitiser before reaching an HTML sink.
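
As a rough illustration of the narrowest possible allowlist, the sketch below escapes every HTML metacharacter in the response and then permits exactly one piece of formatting, a line break. This is output encoding plus a single allowed tag, not a general-purpose sanitiser, and escapeHtml and formatTranslation are hypothetical names that do not exist in github-chinese. Anything richer than line breaks would call for a maintained sanitiser library configured with an explicit allowlist.

```javascript
// Hypothetical helpers for illustration; not part of github-chinese.
const HTML_ESCAPES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

// Encode every character that could start or alter markup.
function escapeHtml(text) {
    return String(text).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch]);
}

// Escape first, then add the only markup this function is allowed to emit.
function formatTranslation(text) {
    return escapeHtml(text).replace(/\r?\n/g, '<br/>');
}
```

The order matters: escaping has to happen before the <br/> substitution, otherwise the permitted tag would be escaped along with everything else.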

The AI-adjacent pattern

This bug is not an LLM vulnerability, but it fits a pattern that keeps appearing around AI and language tooling: generated or transformed text is treated as safer than user input because it came from a service. Translation APIs, LLMs, RAG extractors and summarisation systems all produce text that looks like application output. That does not make it trusted.

The same failure mode appeared in a different form in LightRAG's Memgraph backend, where extracted entity types could become Cypher labels without adequate sanitisation. In prompt injection against AI coding assistants, the problem is more direct: text from files and tools becomes instruction material for an agent. In both cases, the system gives semantic authority to text that crossed a trust boundary.

The github-chinese issue is a smaller case study, but the boundary is clean. Remote text entered the application from a third-party service. The code then chose a parsing API rather than a text API. That one choice converted data into markup.

This is the part developers should take from the fix. The question is not whether the upstream service is reputable. The question is what parser sees its output next. If the next parser is an HTML parser, a shell, a SQL engine, a Cypher engine or an LLM agent with tools, then plain text has stopped being plain text.

What to check in similar code

The audit pattern is straightforward. Search for DOM sinks such as innerHTML, outerHTML, insertAdjacentHTML() and document.write(). For each one, trace whether any part of the string comes from a network response, page content, local storage, URL parameters or generated text. If it does, either replace the sink with textContent, createTextNode() or DOM construction APIs, or add a sanitiser with an explicit allowlist.
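
The search step can be sketched as a small line scanner. This is an assumption-laden starting point rather than a complete tool: findDomSinks and SINK_PATTERN are hypothetical names, the pattern covers only the four sinks named above, and it reports where a sink is written to without tracing where the data came from. That tracing still has to be done by hand.

```javascript
// Hypothetical audit helper; the sink list mirrors the one in the text above.
// Matches assignments to innerHTML/outerHTML (= and +=, but not == or ===)
// and calls to insertAdjacentHTML(), document.write() and document.writeln().
const SINK_PATTERN =
    /\.(innerHTML|outerHTML)\s*\+?=(?!=)|\.insertAdjacentHTML\s*\(|\bdocument\.write(ln)?\s*\(/;

// Return the 1-based line number and trimmed text of every line that hits a sink.
function findDomSinks(source) {
    const hits = [];
    source.split(/\r?\n/).forEach((line, index) => {
        if (SINK_PATTERN.test(line)) {
            hits.push({ line: index + 1, code: line.trim() });
        }
    });
    return hits;
}
```

Each hit is a place to ask the question from the paragraph above: does any part of the string come from outside the script?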

For userscripts, also check the execution context. @grant none is convenient because it lets the script interact naturally with the page, but it also means XSS in the userscript is XSS in the page's JavaScript context. On high-value domains, that distinction matters.
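
The execution-context check can be sketched by reading the @grant lines out of the userscript metadata block. readGrants and declaresGrantNone are hypothetical names, and the sketch only recognises an explicit @grant none. How a manager behaves when no @grant line is present can differ between managers and is not modelled here.

```javascript
// Hypothetical helpers for illustration; not part of any userscript manager.
// Collect every @grant value declared in the ==UserScript== metadata block.
function readGrants(scriptSource) {
    const block = scriptSource.match(
        /\/\/\s*==UserScript==([\s\S]*?)\/\/\s*==\/UserScript==/
    );
    if (!block) return [];

    const grants = [];
    const grantLine = /^\s*\/\/\s*@grant\s+(\S+)/gm;
    let match;
    while ((match = grantLine.exec(block[1])) !== null) {
        grants.push(match[1]);
    }
    return grants;
}

// True when the script explicitly declares @grant none.
function declaresGrantNone(scriptSource) {
    return readGrants(scriptSource).includes('none');
}
```

A script for which declaresGrantNone returns true deserves the stricter reading described above: any HTML sink it feeds with remote data is a sink in the page's own JavaScript context.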

PR #692 is a small patch: create a container, keep the attribution markup static and append the translation response as a text node. The security change is larger than the diff because it restores the missing boundary. Translation output can still be useful, but it no longer gets to become part of GitHub's DOM grammar.

The uncomfortable lesson is that text transformation tools sit closer to code execution than they appear. The output may be prose, but the sink decides whether prose stays prose.