Langflow BaseFileComponent Arbitrary File Read Leading to Authentication Bypass and RCE

Vulnerability Description

This is a symlink injection vulnerability in the _unpack_bundle() function within langflow/src/lfx/src/lfx/base/data/base_file.py. The root cause is insufficient validation of tar archive contents prior to extraction. When processing tar files, the code does not verify that extracted items are regular files, nor does it reject symbolic links pointing outside the intended extraction directory. An attacker can therefore craft a malicious tar containing symlinks that target arbitrary filesystem paths (e.g., /etc/passwd, application secrets, configuration files). When the component later reads the extracted entries, it follows those links and returns the contents of the target files. The issue is compounded because file components accept user-controlled input in RAG workflows, which makes it directly exploitable in multi-tenant or open chatbot scenarios.
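The missing validation described above can be sketched in Python as follows. This is a minimal illustration, not Langflow's actual fix: the function names find_unsafe_members and safe_extract are hypothetical, and the sketch assumes a POSIX filesystem. It rejects every member that is not a regular file or directory, and every member whose resolved path would leave the extraction directory.

```python
import os
import tarfile


def find_unsafe_members(tar: tarfile.TarFile, dest_dir: str) -> list[str]:
    """Return names of members that should not be extracted into dest_dir.

    Hypothetical helper for illustration; not part of Langflow.
    """
    dest_root = os.path.realpath(dest_dir)
    unsafe = []
    for member in tar.getmembers():
        # Symlinks, hard links, devices, and FIFOs are all rejected;
        # only regular files and directories are allowed through.
        if not (member.isfile() or member.isdir()):
            unsafe.append(member.name)
            continue
        # Resolve the would-be output path and require it to stay under dest_dir.
        # An absolute member name replaces dest_root in os.path.join, so it fails here.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            unsafe.append(member.name)
    return unsafe


def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract archive_path into dest_dir, refusing archives with unsafe members."""
    with tarfile.open(archive_path) as tar:
        unsafe = find_unsafe_members(tar, dest_dir)
        if unsafe:
            raise ValueError(f"refusing to extract unsafe members: {unsafe}")
        tar.extractall(dest_dir)
```

Rejecting the whole archive, rather than silently skipping the bad entries, keeps the failure visible in application logs. On Python 3.12 and later, passing filter="data" to extractall applies comparable checks in the standard library and may be preferable where it is available.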

Proof-of-Concept Significance

The disclosed PoC demonstrates a complete attack chain rather than an isolated file read capability: (1) extract JWT signing secrets via a symlink-based arbitrary file read, (2) forge valid JWT tokens for any user ID, (3) bypass authentication entirely, (4) execute arbitrary Python via the Python Interpreter node. This shows that the vulnerability enables unauthenticated remote code execution, not merely information disclosure. The attack requires only the ability to upload or control files processed by a BaseFileComponent, a realistic precondition in RAG chatbots where users upload documents. The PoC was validated on commit 2d67402b, confirming active exploitability in recent versions.
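Steps (1) to (3) hinge on one property of symmetric JWT signing: whoever holds the secret can produce tokens the server accepts. The sketch below illustrates that property and why rotating the secret invalidates previously forged tokens. It assumes HS256 and uses only the standard library; the function names, the claim name sub, and the secret values are hypothetical and are not taken from Langflow.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact JWT signed with HMAC-SHA256 (assumed algorithm)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_hs256(token: str, secret: bytes) -> bool:
    """Check only the signature; a real verifier also validates claims such as exp."""
    try:
        signing_input, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(_b64url(expected).encode("ascii"), sig.encode("utf-8"))


# Hypothetical values: a secret that has been disclosed, and its replacement.
disclosed_secret = b"value-read-from-disk"
rotated_secret = b"freshly-generated-value"

token = sign_hs256({"sub": "any-user-id"}, disclosed_secret)
accepted_before_rotation = verify_hs256(token, disclosed_secret)
accepted_after_rotation = verify_hs256(token, rotated_secret)
```

Because verification depends only on the secret, patching the file read without rotating the secret leaves any previously forged token valid.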

Detection Guidance

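Defenders should monitor for:
(1) Filesystem audit logs covering Langflow tar extraction. A watch such as auditctl -w /path/to/extraction/dir -p wa records writes and attribute changes, which captures symlinks being created in the extraction directory; it does not record reads, so add a read watch (-p r) on sensitive files to see links being followed.
(2) Application logs for tar extraction errors or warnings related to symlinks or permission denials.
(3) Read-access auditing (auditd, or osquery file access events) on secret_key or JWT configuration files. Classic file integrity monitoring (AIDE, Tripwire) reports modifications only and will not flag a read.
(4) Authentication logs for valid JWT tokens presented for unexpected user IDs or service accounts, especially tokens with no matching login or issuance event.
(5) A YARA rule matching tar archives that contain link entries. In a ustar header the type flag is the single byte at offset 156 (0x31 for a hard link, 0x32 for a symlink), and the magic bytes 75 73 74 61 72 ("ustar") sit at offset 257, so the type flag precedes the magic rather than following it. Compressed archives must be decompressed before matching.
(6) WAF/IPS signatures for Python Interpreter node API calls with suspicious code payloads, if that API is exposed.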
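The archive check in item (5) can also be run at upload time. The sketch below is illustrative only, with hypothetical function names: first_header_is_link inspects the raw ustar header fields of the first entry in an uncompressed archive, and list_link_entries uses the standard tarfile module to report every link entry, including in compressed archives.

```python
import tarfile

USTAR_MAGIC_OFFSET = 257  # "ustar" magic field
TYPEFLAG_OFFSET = 156     # one-byte entry type: b"1" = hard link, b"2" = symlink


def first_header_is_link(archive_bytes: bytes) -> bool:
    """Byte-level check of the first 512-byte header of an uncompressed tar."""
    header = archive_bytes[:512]
    if len(header) < 512:
        return False
    has_magic = header[USTAR_MAGIC_OFFSET:USTAR_MAGIC_OFFSET + 5] == b"ustar"
    return has_magic and header[TYPEFLAG_OFFSET:TYPEFLAG_OFFSET + 1] in (b"1", b"2")


def list_link_entries(archive_path: str) -> list[tuple[str, str, str]]:
    """Return (kind, name, target) for every symlink or hard link in the archive."""
    findings = []
    # The default read mode detects gzip, bz2, and xz compression transparently.
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if member.issym():
                findings.append(("symlink", member.name, member.linkname))
            elif member.islnk():
                findings.append(("hardlink", member.name, member.linkname))
    return findings
```

Logging the link target alongside the entry name gives responders the path the archive was aiming at, which helps when deciding which secrets to treat as exposed.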

Mitigation Steps

Immediate actions:
(1) Patch: upgrade Langflow to a patched version that validates tar contents and either rejects symlinks or extracts only to isolated temporary directories.
(2) Workaround (until patching): disable or remove BaseFileComponent-based nodes from production flows; restrict file upload sources to trusted, pre-scanned repositories; run Langflow in a sandboxed container with minimal filesystem permissions and no access to sensitive files.
(3) Configuration hardening: ensure secret_key files have restrictive permissions (mode 600) so that other local accounts cannot read them. The Langflow service must still be able to read its own signing secret, so where file processing runs under the same user as the service, permissions alone do not block a symlink read; combine this with items (4) and (5).
(4) Defense-in-depth: implement filesystem MAC (AppArmor, SELinux) policies limiting Langflow's read access to intended data directories only; rotate all JWT secrets immediately after patching, since any secret that was readable before the patch should be treated as disclosed.
(5) Isolation: run Langflow services in containers with read-only root filesystems and a dedicated non-privileged user.
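The permission requirement in item (3) is easy to verify automatically. The following is a small sketch with hypothetical function names, assuming a POSIX system; the paths passed in would be wherever a given deployment stores its secret files.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """True if neither group nor others have any permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def check_secret_files(paths: list[str]) -> list[str]:
    """Return the subset of paths whose mode is looser than owner-only."""
    return [p for p in paths if not is_owner_only(p)]
```

A check like this fits in a deployment health check or CI step. It confirms only that other accounts are locked out; it says nothing about what the service's own user can reach.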

Risk Assessment

This vulnerability presents extreme risk in production RAG deployments. Likelihood of exploitation in the wild is high because: (1) exploitation requires no special tools or deep technical knowledge, since crafting a tar with symlinks is trivial; (2) the attack surface is broad, covering any organization that allows user document uploads; (3) the impact is critical: direct RCE with application privileges. Threat actor interest is likely to be high, because this enables supply-chain attacks (poisoned documents in shared repositories), insider threats, and lateral movement in enterprise environments. Organizations running Langflow with internet-facing file upload endpoints or untrusted document sources face immediate compromise risk. The attack is silent and difficult to detect without comprehensive logging. Estimated CVSS: 9.8 (Critical), assuming file uploads are reachable without authentication.