Capabilities terms
What the agent can actually do — tools, browsers, code, memory, vision.
- Browser use
An agent capability where the LLM drives a real web browser to read, click, and fill forms on live websites.
- Code execution
An agent capability for writing and running code in a sandboxed environment — usually Python — to compute, transform data, or test hypotheses.
- Memory
The mechanism by which an agent remembers information across sessions — usually a vector store or structured key-value cache.
- Multi-agent
An architecture where several specialized agents collaborate on the same task — each handles a sub-goal and they coordinate through a shared workspace.
- RAG
Retrieval-augmented generation — pulling relevant documents from a knowledge base before generating, so the LLM grounds its answer in your data.
- Tool use
The ability of an LLM to invoke external functions — APIs, shell commands, internal services — instead of just generating text.
- Vision
An agent capability for understanding images, screenshots, and video — letting the model reason over visual content.
- Voice
An agent capability for taking phone calls, holding spoken conversations, and triggering actions from voice input.
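
To make the Tool use entry concrete, here is a minimal Python sketch of the step where the model's request is turned into an actual function call. It is an illustration under stated assumptions, not any vendor's API: the JSON shape (`name` plus `arguments`), the `TOOLS` registry, the `tool` decorator, and the example functions `add` and `word_count` are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical registry: tool name -> Python function the agent may invoke.
TOOLS: dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Register a function under its own name so the model can request it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(tool_call_json: str) -> str:
    """Run one model-issued tool call and return the outcome as a JSON string.

    Assumes the model emits {"name": ..., "arguments": {...}}; real providers
    use their own (similar) formats.
    """
    call = json.loads(tool_call_json)
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        # Unknown tools are reported back instead of raising, so the model
        # can recover on its next turn.
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = fn(**call.get("arguments", {}))
    except TypeError as exc:
        # Wrong or missing arguments for the requested tool.
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})
```

In an agent loop, the string returned by `dispatch` would be appended to the conversation so the model can read the result and decide what to do next. Returning errors as data rather than raising is a design choice that keeps the loop running when the model asks for a tool that does not exist.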