
"No such file or directory" error on sandbox create from another container inside cluster #1849

@sjoerdvanBommel

Agent Diagnostic

After debugging with the debug-openshell-cluster skill:

  1. Sandboxes ARE created successfully: Running kubectl get pods -n openshell shows all sandboxes in Running state
  2. The error happens AFTER creation: Server logs show CreateSandbox request completed successfully
  3. CLI waits for something that fails: After the sandbox pod starts, the CLI attempts some operation that fails with ENOENT
  4. openshell sandbox exec works: Commands can be executed in created sandboxes using openshell sandbox exec --name xyz pwd

Description

Summary

openshell sandbox create fails with "No such file or directory (os error 2)" even when no command is specified or when using simple commands like ls. The sandbox is successfully created and reaches "Ready" phase, but the command execution step consistently fails.

Expected Behavior

The sandbox should be created successfully, and when no command is given the CLI should drop into an interactive shell.

Actual Behavior

Created sandbox: test-sandbox

  [0.0s] Requesting compute...
[32.0s] Sandbox allocated
[32.4s] Image pulled
[32.4s] Pulling image ghcr.io/nvidia/openshell-community/sandboxes/base:latest
[32.9s] Image pulled (1.0 GB)
[34.0s] Image pulled
[34.0s] Pulling image ghcr.io/nvidia/openshell-community/sandboxes/base:latest
[55.4s] Image pulled (1.0 GB)
Error:   × No such file or directory (os error 2)

Exit code: 1

Additional Test Cases

All of the following commands produce the same error:

Test 1: With explicit command

openshell sandbox create --name test-ls -- ls -la

Result: Same "No such file or directory" error

Test 2: With shell command

openshell sandbox create --name test-sh -- /bin/sh -c 'echo "hello"'

Result: Same "No such file or directory" error

Test 3: With simple echo

openshell sandbox create --name test-echo -- echo Created

Result: Same "No such file or directory" error

Hypothesis

The error appears to originate in the CLI client's execution path after the sandbox is created. Possible causes:

  1. Missing executable: The CLI might be trying to spawn an SSH binary, rsync, or another dependency that doesn't exist in the client container
  2. Protocol issue: The "h2 protocol error" warning in the server logs suggests a communication problem during command execution
  3. Path resolution: "os error 2" is ENOENT, which means the CLI is looking for a file or binary that doesn't exist at the path it resolved
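
One way to test the missing-executable hypothesis is to check, from the same Node.js process that spawns the CLI, whether candidate helper binaries resolve on PATH. The sketch below is only a diagnostic aid: ssh and rsync are the guesses named above, not confirmed dependencies of the CLI, and findOnPath is a hypothetical helper.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: return the first executable file named `command` found
// in `pathValue` (a PATH-style string), or null when none is found. A spawn by
// bare name fails with ENOENT when this kind of lookup finds nothing.
function findOnPath(command, pathValue = process.env.PATH || '') {
  for (const dir of pathValue.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}

// `ssh` and `rsync` are guesses from the hypothesis, not confirmed dependencies.
for (const command of ['openshell', 'ssh', 'rsync']) {
  console.log(`${command}: ${findOnPath(command) ?? 'NOT FOUND on PATH'}`);
}
```

If a name reports as not found only in the programmatic context, that narrows the problem to the PATH the spawned process inherits rather than to the sandbox itself.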

Environment-Specific Behavior

The issue occurs when running the openshell CLI from:

  • Node.js execFile() in a container
  • Shell (sh -c) in a container

It works when run directly from an interactive kubectl exec session.

This suggests the CLI depends on something that exists in interactive sessions but not in non-interactive/programmatic contexts, such as a TTY, an environment variable, or a PATH entry.
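
To narrow down what differs between the two contexts, it can help to capture the same snapshot in each one and compare them. A minimal sketch, assuming the relevant difference shows up in a handful of environment variables, the working directory, or TTY attachment; the variable list is a guess, and snapshotContext and diffSnapshots are hypothetical helpers.

```javascript
const tty = require('node:tty');

// Hypothetical helper: collect attributes that commonly differ between an
// interactive shell and a process spawned programmatically.
function snapshotContext(env = process.env) {
  const snapshot = {
    cwd: process.cwd(),
    stdinIsTTY: tty.isatty(0),
    stdoutIsTTY: tty.isatty(1),
  };
  // Assumed to be the variables most likely to matter; extend as needed.
  for (const key of ['PATH', 'HOME', 'USER', 'SHELL', 'TERM']) {
    snapshot[key] = env[key] ?? null;
  }
  return snapshot;
}

// Return the sorted list of keys whose values differ between two snapshots.
function diffSnapshots(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].filter((key) => a[key] !== b[key]).sort();
}

// Print the snapshot for the current context as JSON.
console.log(JSON.stringify(snapshotContext(), null, 2));
```

Running this once inside the interactive kubectl exec session and once from the execFile() caller, then passing both parsed JSON objects to diffSnapshots, lists the attributes worth investigating first.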

Workaround

Option 1: Ignore the error (Sandbox is actually created)

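const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

async function createSandbox(name) {
  try {
    await execFileAsync('openshell', ['sandbox', 'create', '--name', name]);
  } catch (error) {
    // The sandbox is created despite the error.
    // Verify with: openshell sandbox list
    console.log(`Sandbox ${name} likely created despite error: ${error.message}`);
  }
}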

Option 2: Create and verify separately

# Create (will fail but sandbox is created)
openshell sandbox create --name xyz 2>&1 | grep "Created sandbox" && echo "Success"

# Verify it exists
openshell sandbox list | grep xyz

# Use it
openshell sandbox exec --name xyz -- <command>
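
For automation, Option 2 can be wrapped so that a failed create counts as success only when the sandbox actually shows up afterwards. The following is a sketch, assuming openshell sandbox list prints each sandbox name as a whitespace-delimited token (the exact output format is not shown in this issue). createAndVerify is a hypothetical helper, and the command runner is injectable so the logic can be exercised without the CLI.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

// Hypothetical helper. `run(file, args)` must resolve to { stdout } or reject,
// which is the contract of a promisified execFile.
async function createAndVerify(name, run = execFileAsync) {
  let createError = null;
  try {
    await run('openshell', ['sandbox', 'create', '--name', name]);
  } catch (error) {
    // Reported to fail with "os error 2" even when the sandbox was created.
    createError = error;
  }

  // Assumption: the list output contains the sandbox name as a separate token.
  const { stdout } = await run('openshell', ['sandbox', 'list']);
  const exists = String(stdout).split(/\s+/).includes(name);

  if (!exists) {
    // Surface the original create error when there is one.
    throw createError ?? new Error(`Sandbox ${name} not found after create`);
  }
  return { name, createFailed: createError !== null };
}
```

Matching on whole tokens rather than a substring avoids treating xyz-2 as proof that xyz exists, which a plain grep xyz would do.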

Option 3: Use the API directly
Skip the CLI and call the OpenShell gRPC API directly to avoid the CLI's post-creation logic.

However, all workarounds are suboptimal for automation and programmatic usage.

Reproduction Steps

  1. Install OpenShell CLI in a container in your cluster:
     curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
  2. Register the gateway in that container:
     openshell gateway add http://openshell.openshell.svc.cluster.local:8080 --name openshell-local
     openshell gateway select openshell-local
  3. Attempt to create a sandbox from that container:
     openshell sandbox create --name test-sandbox

Environment

  • Platform: Kubernetes (KinD cluster)
  • Gateway Type: Remote (plain HTTP)
  • Base Image: ghcr.io/nvidia/openshell-community/sandboxes/base:latest
  • Client Environment: Docker container running Debian Bookworm Slim with Node.js 24
  • Host: Apple M4 Pro, macOS 26.5 (25F71)
  • Reproduced on: 2026-06-10
  • Related Components: CLI, Kubernetes driver, Remote gateway

Logs

Server Logs (Relevant Sections)

[2026-06-10T09:00:06.550487Z]  INFO openshell_server::grpc::sandbox: minted sandbox JWT sandbox_id=30c6071d-5cf4-4800-b547-b263479c4a5a
[2026-06-10T09:00:06.558861Z]  INFO openshell_driver_kubernetes::driver: Sandbox created in Kubernetes successfully sandbox_id=30c6071d-5cf4-4800-b547-b263479c4a5a sandbox_name=debug-test
[2026-06-10T09:00:06.558904Z]  INFO openshell_server::grpc::sandbox: CreateSandbox request completed successfully sandbox_id=30c6071d-5cf4-4800-b547-b263479c4a5a sandbox_name=debug-test


Later (for a different sandbox_id than the lines above):

[2026-06-10T09:00:34.011422Z]  WARN openshell_server::supervisor_session: supervisor session: stream error sandbox_id=09d5f87c-ebdc-4751-ad99-e3733930d02b session_id=1acf9eca-2180-41ef-937a-e2c0d464336f error=status: Unknown, message: "h2 protocol error: error reading a body from connection"
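
Server log lines for several sandboxes are interleaved, so grouping them by sandbox_id makes it easier to see which warnings belong to the sandbox whose create call failed. A rough sketch, assuming the "[timestamp]  LEVEL target: message key=value" layout seen in the excerpts; parseLogLine and groupBySandbox are hypothetical helpers, not part of OpenShell.

```javascript
// Hypothetical helper: extract timestamp, level, target and sandbox_id from
// one server log line, or return null when the line does not match.
function parseLogLine(line) {
  const header = /^\[([^\]]+)\]\s+(\w+)\s+([\w:]+):/.exec(line);
  const id = /\bsandbox_id=([0-9a-f-]{36})\b/.exec(line);
  if (!header || !id) return null;
  return { timestamp: header[1], level: header[2], target: header[3], sandboxId: id[1] };
}

// Group parsed entries by sandbox_id, preserving the order of the input lines.
function groupBySandbox(lines) {
  const groups = new Map();
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (!entry) continue;
    if (!groups.has(entry.sandboxId)) groups.set(entry.sandboxId, []);
    groups.get(entry.sandboxId).push(entry);
  }
  return groups;
}

// Two lines copied verbatim from the excerpts in this issue.
const sample = [
  '[2026-06-10T09:00:06.558904Z]  INFO openshell_server::grpc::sandbox: CreateSandbox request completed successfully sandbox_id=30c6071d-5cf4-4800-b547-b263479c4a5a sandbox_name=debug-test',
  '[2026-06-10T09:00:34.011422Z]  WARN openshell_server::supervisor_session: supervisor session: stream error sandbox_id=09d5f87c-ebdc-4751-ad99-e3733930d02b session_id=1acf9eca-2180-41ef-937a-e2c0d464336f error=status: Unknown, message: "h2 protocol error: error reading a body from connection"',
];
for (const [sandboxId, entries] of groupBySandbox(sample)) {
  console.log(sandboxId, entries.map((e) => `${e.level} ${e.target}`));
}
```

Feeding the full server log through groupBySandbox would show whether the sandbox created by the failing CLI call ever logged the h2 warning itself.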

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why

Metadata

    Labels

      • area:cli (CLI-related work)
      • area:cluster (Related to running OpenShell on k3s/docker)
      • area:sandbox (Sandbox runtime and isolation work)
      • os:linux (Bug affects Linux hosts)
      • topic:compatibility (Compatibility-related work)
