Ollama Memory Disclosure via Out-of-Bounds Read Affects 300k+ Deployments
CVE-2026-7482, a critical out-of-bounds read flaw in Ollama, permits unauthenticated remote attackers to read arbitrary process memory from affected instances. With an estimated 300,000+ servers exposed globally, the flaw poses immediate risk to organisations running unpatched deployments.
Ollama, a widely deployed local large language model inference engine, contains an out-of-bounds read vulnerability that permits remote, unauthenticated attackers to read arbitrary memory from the server process. Classified as CVE-2026-7482 with a CVSS score of 9.1, this flaw was discovered and named 'Bleeding Llama' by Cyera researchers. The vulnerability's critical severity reflects both the ease of exploitation (no authentication required) and the sensitivity of exposed data: process memory may contain model weights, user inputs, API keys, and other secrets.
Out-of-bounds reads of this nature typically arise from insufficient bounds checking on buffer operations or array indexing. In the context of Ollama's request handling, an attacker likely crafts a malformed input that causes the server to read beyond allocated memory boundaries, returning data to the caller. The fact that this is remotely exploitable without credentials suggests the flaw exists in early request parsing or validation stages, prior to any authentication checkpoint.
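To make the mechanism concrete, here is a minimal, hypothetical Go sketch (Ollama is written in Go) of one common variant of this bug class: a reused buffer sliced by an attacker-supplied length field. Nothing below is taken from Ollama's source; the handler, wire format, and buffer are invented purely for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// A buffer reused across requests retains stale bytes from earlier
// callers; slicing it by a claimed length instead of the actual
// payload length leaks that stale data back to the attacker.
var scratch = make([]byte, 4096)

func handleRequest(req []byte) []byte {
	// First 4 bytes: claimed payload length (attacker-controlled).
	claimed := binary.BigEndian.Uint32(req[:4])
	payload := req[4:]
	copy(scratch, payload)
	// BUG: trusts `claimed` rather than len(payload). If claimed >
	// len(payload), the response includes leftover buffer contents.
	// Missing check: if int(claimed) > len(payload) { reject }
	return scratch[:claimed]
}

func main() {
	// An earlier request leaves a secret in the reused buffer.
	handleRequest(append([]byte{0, 0, 0, 11}, []byte("API_KEY=abc")...))
	// The attacker sends 3 bytes of payload but claims 16.
	leak := handleRequest(append([]byte{0, 0, 0, 16}, []byte("hi!")...))
	fmt.Printf("%q\n", leak) // trailing bytes expose stale secret data
}
```

The same shape of bug in a memory-unsafe runtime would read past the allocation itself; in either case the fix is the same: validate the declared length against the bytes actually received before slicing or copying.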
The scale of exposure is substantial. Ollama has seen significant adoption among researchers, enterprises experimenting with LLMs, and organisations running local inference to avoid cloud vendor costs or to preserve data privacy. An estimated 300,000+ servers globally run Ollama, many in development, staging, and production environments. Because Ollama often runs on private networks or is exposed only to limited audiences, defenders may incorrectly assume implicit security; this flaw invalidates that assumption.
Organisations currently deploying Ollama should prioritise patching immediately and audit network access controls. Instances exposed to untrusted networks face active risk. Additionally, organisations should assume that any secrets, model data, or user inputs resident in memory during the window of exploitation may be compromised. A full incident response assessment is warranted for instances that ran vulnerable versions for extended periods.
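While patches roll out, one quick way to audit exposure is to check whether hosts answer unauthenticated on Ollama's default port, 11434 (the /api/version endpoint is part of Ollama's documented API). The sketch below assumes a plain-HTTP deployment; the host names are placeholders for your own inventory.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// probe reports whether an Ollama API answers unauthenticated on the
// default port. Any response means the instance is reachable and
// should be firewalled and patched immediately.
func probe(host string) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s:11434/api/version", host))
	if err != nil {
		fmt.Printf("%s: not reachable (%v)\n", host, err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(io.LimitReader(resp.Body, 256))
	fmt.Printf("%s: EXPOSED, version response: %s\n", host, body)
}

func main() {
	// Hypothetical internal hosts; replace with your own inventory.
	for _, h := range []string{"10.0.0.5", "llm.internal.example"} {
		probe(h)
	}
}
```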
This vulnerability highlights a broader pattern in the AI tooling ecosystem: security considerations often trail adoption velocity. Tools optimised for developer convenience and rapid experimentation frequently ship without hardened authentication, input validation, or memory safety guarantees. As LLM inference becomes operational infrastructure rather than a research playground, this gap will continue to surface. Cyera's disclosure and the assignment of a critical CVSS score signal that the industry is beginning to treat such tools with appropriate scrutiny.