Computer Use
Computer use gives Claude a headless desktop environment inside the sandbox. Claude can take screenshots, click, type, scroll, and browse the web — all through MCP tools. You can watch live via VNC.
Requirements
- A sandbox with `computer_use: true` (Docker or Libvirt, not macOS)
- The `review` section configured for Mini Apps (needed for the VNC viewer)
Docker

    contexts:
      browser-tasks:
        directory: /home/you/Documents/browser-project
        description: "Browser automation"
        allowed_tools:
          - LSP
          - AskUserQuestion
        sandbox:
          backend: docker
          computer_use: true

    review:
      tunnel: cloudflared  # needed for VNC Mini App

The computer-use Docker image (`openshrimp-computer-use`) extends the base image with a Wayland compositor, Chromium, and a terminal.
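The requirements above can be checked mechanically once the config is parsed. The following is a minimal sketch, not the project's own validation: the function name `check_computer_use` is hypothetical, and it assumes the config has already been loaded into a plain dictionary with `sandbox` nested under each context and `review` at the top level.

```python
from typing import Any

# Backends that support computer use according to the requirements (macOS is excluded).
SUPPORTED_BACKENDS = {"docker", "libvirt"}


def check_computer_use(config: dict[str, Any], context: str) -> list[str]:
    """Return a list of problems; an empty list means the requirements are met.

    Hypothetical helper: the key layout is an assumption based on the example config.
    """
    problems: list[str] = []
    sandbox = config.get("contexts", {}).get(context, {}).get("sandbox", {})
    if sandbox.get("backend") not in SUPPORTED_BACKENDS:
        problems.append("sandbox.backend must be docker or libvirt")
    if sandbox.get("computer_use") is not True:
        problems.append("sandbox.computer_use must be true")
    if not config.get("review"):
        problems.append("review section missing (needed for the VNC viewer)")
    return problems


config = {
    "contexts": {
        "browser-tasks": {
            "directory": "/home/you/Documents/browser-project",
            "sandbox": {"backend": "docker", "computer_use": True},
        }
    },
    "review": {"tunnel": "cloudflared"},
}

print(check_computer_use(config, "browser-tasks"))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every missing setting at once.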
Libvirt VM

    contexts:
      browser-tasks:
        directory: /home/you/Documents/browser-project
        description: "Browser automation"
        allowed_tools:
          - LSP
          - AskUserQuestion
        sandbox:
          backend: libvirt
          computer_use: true

The desktop environment
The sandbox runs a headless 1280x720 Wayland desktop with:
- labwc — lightweight Wayland compositor
- Chromium — web browser
- foot — terminal emulator
- wayvnc — VNC server for live viewing
MCP tools
When computer use is enabled, these MCP tools are registered automatically:
| Tool | Description |
|---|---|
| `computer_screenshot` | Take a PNG screenshot (1280x720). Sent to Telegram and returned for Claude to analyze. |
| `computer_click` | Click at (x, y) coordinates. Supports left, right, and middle buttons. |
| `computer_type` | Type text character by character. |
| `computer_key` | Press a key or key combo (e.g. `ctrl+a`, `alt+F4`, `super+d`). |
| `computer_scroll` | Scroll at (x, y) in a direction (up/down/left/right). |
| `computer_toplevel` | Focus a window by name (case-insensitive substring match). |
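Two conventions in the table, key combos joined with `+` and case-insensitive substring matching on window names, can be illustrated with a short sketch. Both function names and the modifier set are hypothetical; this shows the described behavior, not how the tools are actually implemented.

```python
from typing import Optional

# Assumed modifier names, taken from the combos shown in the table.
MODIFIERS = {"ctrl", "alt", "shift", "super"}


def parse_key_combo(combo: str) -> tuple[list[str], str]:
    """Split a combo such as 'ctrl+a' into (modifiers, key)."""
    *mods, key = combo.split("+")
    if not key or any(m.lower() not in MODIFIERS for m in mods):
        raise ValueError(f"invalid key combo: {combo!r}")
    return [m.lower() for m in mods], key


def match_toplevel(name: str, titles: list[str]) -> Optional[str]:
    """Return the first window title containing `name`, ignoring case."""
    needle = name.lower()
    for title in titles:
        if needle in title.lower():
            return title
    return None


print(parse_key_combo("alt+F4"))  # prints (['alt'], 'F4')
print(match_toplevel("chrom", ["foot", "New Tab - Chromium"]))  # prints New Tab - Chromium
```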
How Claude uses them
Claude follows a screenshot-act loop:
- Take a screenshot to see the current state
- Decide what to do (click a button, type text, etc.)
- Perform the action
- Take another screenshot to verify the result
Screenshots are automatically sent to your Telegram chat so you can see what Claude sees.
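The screenshot-act loop described above can be sketched as plain control flow. This is an illustrative toy, not the real agent: `FakeDesktop` is a hypothetical stand-in whose "screenshot" is just the text typed so far, and the "decision" step is a trivial rule where Claude would actually reason over a PNG.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for the sandbox desktop; records calls instead of acting."""
    text: str = ""
    log: list[str] = field(default_factory=list)

    def screenshot(self) -> str:
        self.log.append("screenshot")
        return self.text  # a real screenshot would be a 1280x720 PNG

    def type_text(self, s: str) -> None:
        self.log.append(f"type:{s}")
        self.text += s


def screenshot_act_loop(desktop: FakeDesktop, goal: str, max_steps: int = 10) -> bool:
    """Observe, decide, act, and verify until the goal text is on screen."""
    for _ in range(max_steps):
        state = desktop.screenshot()      # take a screenshot to see the current state
        if state == goal:                 # the latest screenshot verifies the result
            return True
        if not goal.startswith(state):    # the screen diverged from the plan; give up
            return False
        next_char = goal[len(state)]      # decide what to do next
        desktop.type_text(next_char)      # perform the action
    return desktop.screenshot() == goal


desktop = FakeDesktop()
print(screenshot_act_loop(desktop, "hi"))  # prints True
print(desktop.log)
```

The `max_steps` bound is a design choice for the sketch: a loop driven by observation needs some limit so that it stops when the screen never reaches the expected state.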
Watching live via VNC
Use the `/vnc` command to open the VNC viewer Mini App in Telegram. This gives you a live view of the desktop as Claude interacts with it.

    /vnc

The VNC viewer uses noVNC and connects through a WebSocket proxy to the sandbox’s VNC server.
Implementation differences by backend
Docker
- Screenshots via `grim` (Wayland screenshot tool)
- Input via `wlrctl` (Wayland input simulation)
- Window focus via `wlrctl`
- VNC exposed on a dynamic port
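To make the Docker path more concrete, here is a rough sketch of the command lines such a backend might build. The helper names are hypothetical, and the exact invocations the project uses are not documented here; the sketch assumes `grim <file>` for screenshots and the `wlrctl keyboard type` and `wlrctl pointer click` subcommands. It only builds argument lists and does not execute anything.

```python
# Buttons listed for computer_click in the tools table.
BUTTONS = {"left", "right", "middle"}


def screenshot_cmd(path: str) -> list[str]:
    """Assumed invocation: grim writes a PNG of the output to `path`."""
    return ["grim", path]


def type_cmd(text: str) -> list[str]:
    """Assumed invocation: wlrctl types the string through a virtual keyboard."""
    return ["wlrctl", "keyboard", "type", text]


def click_cmd(button: str = "left") -> list[str]:
    """Assumed invocation: wlrctl clicks at the current pointer position."""
    if button not in BUTTONS:
        raise ValueError(f"unsupported button: {button!r}")
    return ["wlrctl", "pointer", "click", button]


print(screenshot_cmd("/tmp/screen.png"))  # prints ['grim', '/tmp/screen.png']
print(type_cmd("hello"))
print(click_cmd("right"))
```

Passing the arguments as a list, for example to `subprocess.run(cmd, check=True)`, avoids shell quoting issues when the typed text contains spaces or special characters.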
Libvirt VM
- Screenshots via the libvirt domain screenshot API
- Input via QMP (QEMU Machine Protocol): mouse events, key presses
- Window focus not directly supported (use `Alt+Tab` or similar key combos)
- VNC port auto-assigned from QEMU’s VNC server
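For the Libvirt path, the QMP messages for a click and a key combo might look like the following sketch. The helper names are hypothetical and this is not necessarily how the project builds its messages. It uses QMP's `input-send-event` and `send-key` commands and assumes an absolute pointing device, whose axes QEMU addresses in the range 0 to 0x7fff regardless of screen size, so pixel coordinates are rescaled first.

```python
import json

ABS_MAX = 0x7FFF  # QEMU absolute pointer axes run from 0 to 0x7fff


def scale(pixel: int, size: int) -> int:
    """Map a pixel coordinate in [0, size - 1] onto QEMU's absolute axis range."""
    return pixel * ABS_MAX // (size - 1)


def _send(events: list[dict]) -> dict:
    return {"execute": "input-send-event", "arguments": {"events": events}}


def qmp_click(x: int, y: int, width: int = 1280, height: int = 720,
              button: str = "left") -> list[dict]:
    """Build move, press, and release messages for a click at (x, y)."""
    move = _send([
        {"type": "abs", "data": {"axis": "x", "value": scale(x, width)}},
        {"type": "abs", "data": {"axis": "y", "value": scale(y, height)}},
    ])
    press = _send([{"type": "btn", "data": {"down": True, "button": button}}])
    release = _send([{"type": "btn", "data": {"down": False, "button": button}}])
    return [move, press, release]


def qmp_key_combo(keys: list[str]) -> dict:
    """Build a send-key message; keys are QEMU key codes such as 'alt' and 'tab'."""
    return {"execute": "send-key",
            "arguments": {"keys": [{"type": "qcode", "data": k} for k in keys]}}


for message in qmp_click(640, 360):
    print(json.dumps(message))
print(json.dumps(qmp_key_combo(["alt", "tab"])))
```

The last call builds the kind of `Alt+Tab` combo suggested above as a substitute for direct window focus.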
Tips
- Claude works best when you describe what you want it to do on the screen rather than giving pixel coordinates
- For web tasks, you can ask Claude to open Chromium and navigate to a URL
- Screenshots are 1280x720 — this is the desktop resolution Claude interacts with
- If Claude gets stuck, you can connect via VNC and interact manually