CERT/CC published VU#777338 on May 18, warning that SGLang contains three newly disclosed vulnerabilities affecting deployments that serve large language models and multimodal AI models. Two issues can lead to remote code execution, and one can allow arbitrary file writes through path traversal. As of CERT/CC’s publication, no official patch was available.
That matters because SGLang is not a toy component. It is an open-source serving framework used to expose LLM and multimodal model workloads behind OpenAI-compatible APIs. In plain English: this is the layer that can sit between users, agents, applications, GPUs, model weights, data pipelines, and cloud credentials. If that layer is reachable from the wrong network, a model-serving vulnerability can become a server compromise.
What was disclosed
The vulnerabilities are tracked as CVE-2026-7301, CVE-2026-7302, and CVE-2026-7304. CERT/CC says exploitation depends on deployment conditions such as multimodal generation being enabled, network access to the SGLang service, or use of the custom logit processor option.
The technical theme is familiar: unsafe deserialization and insufficient path handling. According to the researcher write-up from Antiproof, affected paths include cases where serialized Python objects may be loaded from untrusted input and where uploaded filenames can traverse outside the intended upload directory. Those are old vulnerability classes showing up in a very modern AI infrastructure stack.
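To make the two vulnerability classes concrete, here is a minimal Python sketch of the generic defenses. It is not SGLang's code and not the upstream fix. The names `safe_upload_path`, `DenyGlobalsUnpickler`, and `loads_plain_data` are hypothetical, chosen only for illustration. The first function refuses filenames that resolve outside an upload directory. The second refuses any pickle that tries to import a class or function, which is the mechanism unsafe deserialization typically abuses.

```python
import io
import pickle
from pathlib import Path


def safe_upload_path(upload_dir: str, filename: str) -> Path:
    """Resolve filename under upload_dir, rejecting anything that escapes it."""
    base = Path(upload_dir).resolve()
    candidate = (base / filename).resolve()
    # is_relative_to (Python 3.9+) is False when ".." segments or an
    # absolute filename would place the file outside the base directory.
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError(f"rejected upload filename: {filename!r}")
    return candidate


class DenyGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or function."""

    def find_class(self, module, name):
        # Every global reference in the stream is blocked, so only
        # built-in containers and scalars can be reconstructed.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def loads_plain_data(payload: bytes):
    """Load a pickle that contains only plain data (dicts, lists, numbers, strings)."""
    return DenyGlobalsUnpickler(io.BytesIO(payload)).load()
```

Even a restricted unpickler is a stopgap. Where the data format is under your control, a non-executable format such as JSON avoids the problem class entirely.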
Why SMBs and government contractors should care
Small teams are adopting AI tooling quickly. A proof-of-concept LLM server can become a production dependency before anyone formally inventories it. For government contractors, that creates a serious control problem: the same host that serves an internal AI workflow may also have access to source code, project documents, cloud tokens, data stores, or customer information.
The risk is not just “someone can query the model.” The real concern is that an exposed inference service can become a beachhead. Once an attacker reaches the host, they may be able to inspect environment variables, read mounted volumes, tamper with outputs, steal credentials, or pivot into adjacent systems.
Defensive takeaways
- Find SGLang first. Search container images, startup scripts, notebooks, internal AI lab servers, and cloud workloads for SGLang usage. Do not assume it only exists in formal production.
- Do not expose inference services directly to the internet. Put them behind private networking, VPN/ZTNA, authenticated gateways, and explicit allowlists.
- Restrict multimodal and experimental features. If multimodal generation or custom logit processing is not required, disable it until a patched build and safer configuration guidance are available.
- Run AI services with least privilege. Use non-root containers, read-only filesystems where possible, and scoped service accounts, and keep credentials off model-serving hosts.
- Segment GPU and model-serving infrastructure. Treat AI runtime nodes like high-value application servers, not developer sandboxes.
- Monitor for suspicious uploads and process behavior. Watch for unexpected file writes, shell execution, outbound connections, and changes to model-serving configuration.
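As a starting point for the first takeaway, the sketch below walks a directory tree and reports text files that mention "sglang". It is a rough inventory aid under stated assumptions: the function name `find_sglang_references` and the lists of file names and suffixes are hypothetical and far from exhaustive, and a plain string match will miss renamed images or vendored copies.

```python
import os
from pathlib import Path

# File names and suffixes worth checking; an illustrative, non-exhaustive list.
CANDIDATE_NAMES = {"Dockerfile", "requirements.txt", "pyproject.toml"}
CANDIDATE_SUFFIXES = {".py", ".sh", ".yaml", ".yml", ".ipynb", ".toml", ".txt"}


def find_sglang_references(root: str, max_bytes: int = 2_000_000) -> list[tuple[str, int]]:
    """Return (path, line_number) pairs where 'sglang' appears under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if name not in CANDIDATE_NAMES and path.suffix not in CANDIDATE_SUFFIXES:
                continue
            try:
                # Skip very large files to keep the scan quick.
                if path.stat().st_size > max_bytes:
                    continue
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file; move on
            for lineno, line in enumerate(text.splitlines(), start=1):
                if "sglang" in line.lower():
                    hits.append((str(path), lineno))
    return hits
```

Running it against source checkouts, home directories on lab servers, and unpacked container build contexts gives a first list of places to look. It does not replace checking running processes and deployed images.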
Bulwark Black assessment
This is another example of AI infrastructure inheriting classic web and application security problems while adding new operational pressure. The AI stack is moving fast, but the defensive answer is not exotic: inventory it, isolate it, minimize privileges, and keep untrusted users away from raw service interfaces.
If your organization is experimenting with self-hosted LLMs, now is the time to build a simple AI asset register. Track what is running, who owns it, what network it listens on, what credentials it can reach, and whether it handles sensitive data. That basic visibility will matter more than any vendor slide when the next AI runtime flaw drops.
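A register like this does not need special tooling. The following Python sketch models one entry with the fields named above and flags the entries that deserve attention first. The class `AIAsset`, its field names, and the `needs_review` rule are assumptions made for illustration, and the exposure check is a crude heuristic that only recognizes wildcard bind addresses written as `host:port`.

```python
from dataclasses import dataclass, field


@dataclass
class AIAsset:
    name: str                     # what is running
    owner: str                    # who owns it
    listen_address: str           # "host:port" the service binds to
    reachable_credentials: list[str] = field(default_factory=list)
    handles_sensitive_data: bool = False

    def is_broadly_exposed(self) -> bool:
        """True when the service binds to all interfaces (wildcard address)."""
        host = self.listen_address.rsplit(":", 1)[0]
        return host in {"0.0.0.0", "[::]"}


def needs_review(assets: list[AIAsset]) -> list[AIAsset]:
    """Assets bound to all interfaces that can also reach credentials or sensitive data."""
    return [
        a for a in assets
        if a.is_broadly_exposed()
        and (a.reachable_credentials or a.handles_sensitive_data)
    ]
```

For example, an entry such as `AIAsset("lab-llm", "ml-team", "0.0.0.0:30000", ["cloud-role"])` would be returned by `needs_review`, while the same service bound to `127.0.0.1:30000` would not.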
Original sources: CERT/CC VU#777338 and Antiproof technical analysis.
