MCP-for-Stata Command Injection via Unsanitized log_file_name Parameter

Vulnerability Description

This is a classic command injection vulnerability (CWE-77, CWE-78) arising from unsafe string interpolation of user-controlled input into shell/command contexts. The root cause is the direct embedding of the log_file_name parameter into f-string formatted Stata command strings without sanitization or escaping. The GuardValidator security mechanism only validates do-file content, creating a security boundary bypass. Impact includes arbitrary Stata command execution (shell commands, file deletion, Python code execution), potential remote code execution, data exfiltration, and system compromise depending on Stata process privileges.

PoC Significance

The advisory demonstrates that input validation was incomplete—security controls were added for one parameter class (do-file content) but not for related parameters (log file names). This PoC proves that path construction functions (generate_log_file) are reachable from user-facing APIs without intermediate validation. The vulnerability is highly reliable since it exploits quote-escaping in double-quoted Stata strings and command separators (newlines, semicolons). Preconditions are minimal: any caller with access to stata_do API or CLI tool can trigger exploitation.

Detection Guidance

Log indicators: Monitor for Stata process invocations with unusual log file parameters containing shell metacharacters (quotes, backticks, $(), newlines, semicolons), path traversal sequences (../), or commands like shell, python, erase. Filesystem indicators: Watch for log files written to unexpected directories outside the intended log directory; unexpected file creation/deletion patterns. Process monitoring: Alert on Stata spawning child processes (shells, interpreters) unexpectedly. Input-level detection: Scan MCP tool invocations and CLI arguments for payloads containing Stata command keywords in log_file_name fields.

Mitigation Steps

Input validation: Implement strict allowlist-based validation for log_file_name—permit only alphanumeric characters, underscores, hyphens; reject paths with traversal sequences, quotes, or special characters.
Parameterization: Refactor code to pass log file paths as separate parameters to Stata rather than string interpolation; use Stata's native logging APIs if available.
Escaping: If string interpolation is unavoidable, properly escape double quotes and backslashes in the log file path before embedding in Stata commands.
Path canonicalization: Resolve all paths to absolute canonical form and validate they fall within an expected directory boundary.
Apply patch: Update MCP-for-Stata to a patched version that implements one or more mitigations above.

Risk Assessment

Likelihood of exploitation: High—the vulnerability requires minimal preconditions and no authentication bypass; any user with CLI or API access can exploit it. Threat actor interest: High—command injection in scientific computing pipelines (common in research, finance, data science) is an attractive lateral movement or data exfiltration vector. Severity in the wild: Critical for environments where MCP-for-Stata processes untrusted user input or third-party do-files; medium for isolated research environments with trusted operators. Organizations using this tool should prioritize patching immediately.