Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090

Source: DEV Community
Disclaimer: This is an experimental build, not a production setup. NemoClaw is early-stage (v0.0.7), the network hacks are volatile, and I'm documenting this because I couldn't find anyone else trying it.

What Is NemoClaw?

NVIDIA NemoClaw (OpenShell) is a sandboxed execution environment for AI agents. It runs a k3s cluster inside Docker, creates isolated sandbox namespaces, and lets agents execute code in a locked-down container.

The default workflow: your agent talks to NVIDIA's cloud inference API. The sandbox allows outbound HTTPS to integrate.api.nvidia.com and blocks most other traffic. But what if you have an RTX 5090 sitting right there on the host, running vLLM with Nemotron 9B? I wanted to see whether I could route the sandbox's inference to my local GPU instead. Spoiler: it works, but the network isolation requires three separate workarounds.

The Network Topology

    WSL2 Host
      vLLM on 0.0.0.0:8000
      Docker bridge: 172.18.0.1 (yours will differ)
            |
    openshell-cluster container (172.18.0.2)
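The bridge gateway (172.18.0.1 here) is whatever address Docker assigned to the network the openshell-cluster container sits on, and `docker network inspect` reports it as the Gateway field of the IPAM config. A minimal sketch of pulling that field out of the inspect JSON — the network name and sample payload below are hypothetical, trimmed to match the addresses above:

```python
import json

# Hypothetical `docker network inspect` output, reduced to the fields we need.
# In practice you would capture the real JSON with:
#   docker network inspect <network-name>
SAMPLE_INSPECT = """
[
  {
    "Name": "openshell-net",
    "IPAM": {
      "Config": [
        {"Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1"}
      ]
    }
  }
]
"""

def bridge_gateway(inspect_json: str) -> str:
    """Return the gateway IP of the first IPAM config entry."""
    networks = json.loads(inspect_json)
    return networks[0]["IPAM"]["Config"][0]["Gateway"]

print(bridge_gateway(SAMPLE_INSPECT))  # 172.18.0.1 on this machine; yours will differ
```

Whatever address this prints is the one the container can use to reach services bound on the host side of the bridge.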
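Before touching the sandbox, it is worth confirming vLLM is actually answering. vLLM's OpenAI-compatible server exposes `/v1/models`, so a quick probe from the host (or later, from inside the container against the bridge gateway) can look like this — the base URLs are assumptions matching the topology above:

```python
import urllib.request
import urllib.error

def vllm_alive(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible /v1/models endpoint answers with 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    # From the WSL2 host; from inside the sandbox you would target the
    # bridge gateway instead, e.g. http://172.18.0.1:8000 (address assumed).
    print(vllm_alive("http://127.0.0.1:8000"))
```

If this returns False from inside the container but True on the host, the problem is the sandbox's egress rules rather than vLLM itself.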