
RAGFlow PR #14803 removed a CWE-502 pickle RCE footgun from deserialize_b64

RAGFlow's deserialize_b64 helper defaulted to bare pickle.loads behind an unset safety flag. PR #14803 makes RestrictedUnpickler the only path.

RAGFlow's deserialize_b64 helper defaulted to unrestricted pickle.loads() on decoded database data. I fixed it in PR #14803 by removing the flag-controlled unsafe branch and making the existing RestrictedUnpickler the only deserialisation path. The change was merged on 15 May 2026.

This is a bounded finding, not an internet-facing exploit chain. The relevant SerializedField path exists, but no model in the current tree instantiates it with the default PICKLE serialisation type. That makes the bug latent rather than directly reachable through a RAGFlow HTTP endpoint today. It still mattered because the helper was insecure by default in exactly the place that future database fields would reuse.

The vulnerable default

The vulnerable code lived in deserialize_b64 in api/utils/configs.py. The function accepts a base64 string or bytes, decodes it, and then deserialises the resulting payload. Before the fix, the hunk beginning at line 54 selected between a restricted loader and raw pickle based on a configuration flag:

use_deserialize_safe_module = get_base_config(
    'use_deserialize_safe_module', False)
if use_deserialize_safe_module:
    return restricted_loads(src)
return pickle.loads(src)

The flag name sounds reassuring. The default value is the problem. get_base_config('use_deserialize_safe_module', False) means that, unless an operator explicitly enabled the flag, RAGFlow used bare pickle.loads().

A repository-wide check found no configuration file setting use_deserialize_safe_module. In practice, the default path was the unsafe path.

Python's own documentation is blunt about this boundary: the pickle module is not secure and should not be used to unpickle data from an untrusted or unauthenticated source. That warning exists because pickle is a bytecode-like object construction format. A crafted payload can resolve globals and invoke callables during deserialisation through methods such as __reduce__. Treating pickle as a data format is how data parsing turns into code execution.
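
As a standalone illustration of that boundary (this is not RAGFlow code): a tiny class whose __reduce__ makes unpickling call an arbitrary function. A harmless builtin stands in for os.system here.

```python
import base64
import pickle


class Payload:
    # __reduce__ tells pickle to call a function during unpickling instead
    # of rebuilding data. A real attack would return something like
    # (os.system, ("id",)); abs stands in as a harmless callable here.
    def __reduce__(self):
        return (abs, (-42,))


# What an attacker would store: an innocuous-looking base64 blob.
blob = base64.b64encode(pickle.dumps(Payload()))

# "Parsing" the data executes the callable and returns its result.
result = pickle.loads(base64.b64decode(blob))
print(result)  # 42
```

The attacker never needs the Payload class to exist on the victim side; the pickle stream itself names the callable and its arguments.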

The database read path

The relevant caller is SerializedField.python_value in api/db/db_models.py. Peewee invokes python_value when a value is read from a database column and converted back into a Python object. In RAGFlow's case, that creates this path:

MySQL row value
  -> SerializedField.python_value
  -> deserialize_b64
  -> base64.b64decode
  -> pickle.loads by default
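
A simplified sketch of that read path, reproducing the pre-fix behaviour. The names mirror RAGFlow's, but this is illustrative, not the project's exact code:

```python
import base64
import pickle


def deserialize_b64(src):
    # Pre-fix behaviour in sketch form: decode, then unpickle unconditionally.
    # The real helper consulted a config flag that defaulted to the unsafe path.
    if isinstance(src, str):
        src = src.encode("utf-8")
    return pickle.loads(base64.b64decode(src))


class SerializedField:
    """Sketch of a peewee-style field: python_value runs on every row read,
    so whoever controls the column bytes controls what gets unpickled."""

    def python_value(self, value):
        return deserialize_b64(value) if value is not None else None


# Any write path that reaches the column becomes a deserialisation trigger.
row_value = base64.b64encode(pickle.dumps({"chunks": 3}))
print(SerializedField().python_value(row_value))  # {'chunks': 3}
```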

An attacker would need a way to influence the bytes stored in a SerializedField(serialized_type=PICKLE) column. That could come from another SQL injection, compromised database credentials, an untrusted backup restore or a compromised replication peer. Those are meaningful preconditions. They do not make the issue irrelevant.

Database write access is not the same thing as application host code execution. A malicious pickle payload shifts the boundary from data tampering to execution in the RAGFlow process, with access to in-process secrets, local files, service credentials and internal network reachability. In many deployments, that is a material escalation.

The current codebase narrows the immediate exposure. In-tree models use JsonSerializedField in practice rather than SerializedField with the default PICKLE type. That is why this case study is about an insecure default and a future sharp edge, not a confirmed endpoint exploit. Security-sensitive helper functions still need safe defaults because new model fields tend to copy existing project conventions. The bug would not announce itself when a future developer added a pickled field. It would just inherit RCE-on-read semantics.
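
The JSON convention is safer because parsing JSON never resolves Python callables. A minimal sketch of that style of field, illustrative rather than RAGFlow's actual JsonSerializedField:

```python
import json


class JsonSerializedField:
    """A hostile column value can at worst be malformed JSON and raise an
    error during parsing; it can never execute code."""

    def db_value(self, value):
        return json.dumps(value) if value is not None else None

    def python_value(self, value):
        return json.loads(value) if value is not None else None


field = JsonSerializedField()
stored = field.db_value({"doc_id": "abc", "score": 0.9})
print(field.python_value(stored))  # {'doc_id': 'abc', 'score': 0.9}
```

The trade-off is that JSON only round-trips plain data, not arbitrary Python objects, which is exactly why it is the safer default for database columns.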

The fix

The fix is deliberately small:

-    use_deserialize_safe_module = get_base_config(
-        'use_deserialize_safe_module', False)
-    if use_deserialize_safe_module:
-        return restricted_loads(src)
-    return pickle.loads(src)
+    return restricted_loads(src)

The now-unused get_base_config import is removed as well. The resulting function still base64-decodes the input, but every decoded payload goes through restricted_loads.

restricted_loads already existed in the same file. It wraps a RestrictedUnpickler that overrides class resolution and limits permitted modules to an allow-list containing numpy and rag_flow. That is a narrower trust boundary than arbitrary global resolution through pickle.loads().
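
A sketch of that shape, modelled on the RestrictedUnpickler pattern from the pickle documentation. The allow-list prefixes mirror the ones described above, but the code is illustrative rather than RAGFlow's exact implementation:

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    # Override global resolution: anything outside the allow-list raises
    # instead of importing an arbitrary callable.
    def find_class(self, module, name):
        if module == "numpy" or module.startswith(("numpy.", "rag_flow")):
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(src):
    return RestrictedUnpickler(io.BytesIO(src)).load()


# Plain containers never touch find_class, so benign data still loads.
print(restricted_loads(pickle.dumps({"ok": True})))  # {'ok': True}


# A payload whose __reduce__ resolves a forbidden global is rejected.
class Evil:
    def __reduce__(self):
        return (eval, ("1+1",))


try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    print(exc)  # global 'builtins.eval' is forbidden
```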

Local validation used two cases. A malicious pickle whose __reduce__ resolved to posix.system('id') executed before the fix. After the fix, the same payload raised:

UnpicklingError: global 'posix.system' is forbidden

A benign numpy.ndarray round-tripped through serialize_b64 and deserialize_b64 with values preserved. That mattered because a security fix that breaks legitimate serialised array storage would not be a clean hardening change. This one preserved the intended safe path and removed the unsafe one.

The remaining allow-list question

This PR does not claim that restricted_loads is a perfect deserialisation sandbox. RAGFlow's existing security notes identify a concern around the broad numpy allow-list, including paths such as numpy.f2py.diagnose.run_command. That is a separate hardening problem: a robust restricted unpickler should usually allow specific classes or symbols, not entire modules.
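
A sketch of that tighter, symbol-level variant. The allow-list entries here are hypothetical; a real list would have to be derived from the globals numpy's own reduce protocol actually references:

```python
import io
import pickle

# Hypothetical symbol-level allow-list: (module, name) pairs, not whole modules.
ALLOWED_GLOBALS = {
    ("numpy", "ndarray"),
    ("numpy", "dtype"),
    ("numpy.core.multiarray", "_reconstruct"),
}


class SymbolRestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only exact (module, name) pairs resolve; numpy.f2py.diagnose.run_command
        # is unreachable because the pair is simply not listed.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def strict_loads(src):
    return SymbolRestrictedUnpickler(io.BytesIO(src)).load()
```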

The important distinction is relative risk. The previous default allowed every importable global that pickle could resolve. The new default allows only the project's restricted path. Tightening that allow-list further would be good defence-in-depth, but it is not a reason to leave raw pickle.loads() as the default.

Security fixes often have to be sequenced. First remove the unconditional execution primitive. Then narrow the allowed surface that remains.

The AI framework pattern

Unsafe deserialisation is an old Python bug class, but it keeps reappearing in AI infrastructure because these systems move rich objects between databases, workers, vector pipelines, notebooks, agents and web services. The pressure is towards convenience: serialise the object, store it, reload it later. Pickle makes that easy. It also makes the deserialiser an execution boundary.

Recent CVEs show the same pattern outside RAGFlow. NVD describes CVE-2025-3108 as a critical LlamaIndex JsonPickleSerializer flaw where an insecure fallback to pickle.loads() could lead to remote code execution. NVD also describes CVE-2025-62373 in Pipecat's LivekitFrameSerializer, where WebSocket client data reached pickle.loads() in an optional LiveKit serializer. The products differ, but the shape is familiar: a framework component treats serialised Python objects as transport data, then discovers that the transport format can execute code.

This sits alongside the injection and file access issues I have been finding across AI projects. In LightRAG's Memgraph backend, LLM and API-controlled entity types reached Cypher syntax. In full-stack-ai-agent-template's webhook service, user-controlled URLs became server-side requests with response exfiltration. In NVIDIA's RAG Blueprint MCP server, client-supplied paths became local file reads and RAG ingestion.

The common thread is not novelty in the vulnerability class. It is the placement of old bug classes inside AI control planes. RAG and agent frameworks are plumbing layers. They sit between user input, model output, databases, filesystem access, queues and internal services. When a plumbing layer gets a trust boundary wrong, the failure is rarely confined to a single feature.

What to take from the fix

For RAGFlow users, the practical advice is simple: use a version containing PR #14803 or apply the equivalent one-line change. If any local fork has introduced SerializedField(serialized_type=PICKLE), treat existing database contents as part of the attack surface and review who can write to those columns.

For maintainers of Python AI infrastructure, the rule is stricter. Do not put pickle.loads() behind a configuration flag that defaults to unsafe. If pickle is unavoidable, make the restricted path mandatory, authenticate the data before loading it and constrain the accepted types as tightly as possible. Better still, use a data format that does not execute code as part of parsing.
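
One way to make the authenticate-before-loading rule concrete is to sign the serialised bytes and verify the tag before pickle ever sees them. A minimal sketch, assuming a key managed out of band:

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"  # assumption


def dumps_signed(obj):
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob):
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # Constant-time compare; tampered or foreign bytes never reach pickle.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature: refusing to unpickle")
    return pickle.loads(payload)


print(loads_signed(dumps_signed({"vec_dim": 768})))  # {'vec_dim': 768}
```

This changes who can produce loadable blobs, not what pickle can do once loading starts, so it complements rather than replaces a restricted unpickler.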

The uncomfortable part is that this fix deleted more code than it added. The vulnerable behaviour was not an absence of engineering. It was a safety mechanism made optional, then defaulted off. That is the kind of bug that passes code review because all the right words are present in the file: safe module, restricted loader, configuration flag. The exploit lives in which branch runs when nobody touches the knob.
