Computer use
An agent capability where the LLM operates a computer directly through its mouse and keyboard, reading the screen from screenshots, then clicking, typing, and navigating arbitrary desktop and browser apps.
Computer use is a capability Anthropic introduced in public beta in October 2024 and has kept developing since: the model takes a screenshot, decides where to click and what to type, executes those actions on a real (usually sandboxed) computer, and repeats. Unlike traditional browser automation, the agent works with the visual UI the way a human would, so no DOM scraping is required.
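The screenshot, decide, act cycle described above can be sketched as a short loop. This is a minimal illustration, not any vendor's API: `Action`, `FakeComputer`, `run_agent`, and `scripted_policy` are hypothetical names, the "computer" only records actions, and the model call is replaced by a fixed plan.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI action. The fields are illustrative, not a real API schema."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeComputer:
    """Stand-in for a sandboxed machine; records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real implementation would return encoded image bytes of the screen.
        return f"frame-{len(self.log)}".encode()

    def execute(self, action: Action) -> None:
        self.log.append(action)


def run_agent(computer: FakeComputer,
              decide: Callable[[bytes, str], Action],
              goal: str,
              max_steps: int = 20) -> bool:
    """Observe, decide, act until the policy says "done" or the step budget runs out."""
    for _ in range(max_steps):
        frame = computer.screenshot()   # observe
        action = decide(frame, goal)    # decide (a model call in a real system)
        if action.kind == "done":
            return True
        computer.execute(action)        # act
    return False


def scripted_policy() -> Callable[[bytes, str], Action]:
    """Hypothetical stand-in for the model: replays a fixed plan."""
    plan = iter([
        Action("click", x=120, y=48),
        Action("type", text="quarterly report"),
        Action("done"),
    ])
    return lambda frame, goal: next(plan)


computer = FakeComputer()
finished = run_agent(computer, scripted_policy(), goal="search for a document")
print(finished, [a.kind for a in computer.log])
```

The `max_steps` budget is a design choice worth keeping in any real loop: it bounds cost and stops an agent that never reaches its goal.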
The capability opens up a large workflow surface: any app with a UI becomes scriptable by an LLM. RPA (robotic process automation) without the brittle selectors. Cross-app workflows that weave together five SaaS tools no one bothered to integrate via API. End-to-end tests that exercise the real product UI instead of synthetic, scripted actions.
As of 2026 the production stack is still maturing. Claude's computer-use mode is the most capable; OpenAI's Operator covers similar ground in browser-only contexts; open-source tooling such as Anthropic's computer-use reference implementation and OpenAI's Agents SDK makes it deployable. Reliability is the main gap: action error rates of 5–15% mean every deployment needs retry logic.
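The retry logic mentioned above can be as simple as a bounded loop that re-runs an action until a verification check passes. The sketch below is one possible shape under stated assumptions: `with_retries` and `ActionFailed` are hypothetical names, and the `attempt` callable is assumed to both perform the action and verify the result (for example by inspecting a fresh screenshot).

```python
from typing import Callable


class ActionFailed(Exception):
    """Raised when an action still fails after every allowed attempt."""


def with_retries(attempt: Callable[[], bool], max_attempts: int = 3) -> int:
    """Run `attempt` until it reports success; return how many tries it took.

    `attempt` is assumed to perform the action and then verify the outcome,
    returning True only when the expected UI change was observed.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
    # Out of attempts: surface the failure so a caller can escalate to a human.
    raise ActionFailed(f"gave up after {max_attempts} attempts")


# Scripted outcomes standing in for a flaky click: two misses, then a hit.
outcomes = iter([False, False, True])
tries = with_retries(lambda: next(outcomes))
print(tries)  # 3
```

Raising an exception instead of returning silently keeps failures visible, which matters when a human reviewer is the fallback.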
Frequently asked
What is the difference between computer use and browser use?
Browser use is scoped to a web browser, whether the agent reads the DOM or works visually from screenshots. Computer use covers the full desktop: any app the operating system runs. Computer use is the superset.
Is computer use reliable enough for production?
For high-stakes flows: not without heavy retry logic and human review. For internal automation and prototype work: yes, with the caveat that 5–15% of actions will need re-attempting.
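A short calculation shows why a 5–15% per-action error rate matters so much for multi-step flows. The sketch below assumes failures are independent across steps and attempts, which is a simplification; the 20-step flow and the 3-attempt budget are illustrative numbers, not figures from any benchmark.

```python
def task_success(per_action_error: float, steps: int, attempts: int = 1) -> float:
    """Probability that every step succeeds, assuming independent failures.

    With `attempts` tries per step, a step fails only if all tries fail.
    """
    step_failure = per_action_error ** attempts
    return (1.0 - step_failure) ** steps


for error in (0.05, 0.15):
    single = task_success(error, steps=20)
    retried = task_success(error, steps=20, attempts=3)
    print(f"error={error:.2f}  no retries: {single:.1%}  3 attempts: {retried:.1%}")
```

Under these assumptions a 20-step flow with no retries completes roughly 36% of the time at a 5% error rate and about 4% of the time at 15%, while three attempts per step lifts both figures above 90%. Real failures are often correlated (the same confusing screen fails repeatedly), so retries alone tend to help less than this model suggests.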