
Capabilities terms

What the agent can actually do — tools, browsers, code, memory, vision.

🧰Capabilities
Agent memory
The persistent state an agent maintains across turns and sessions — covers short-term context, long-term facts, episodic events, and procedural skills. Distinct from the LLM context window.
Agentic browser
A web browser where an AI agent is a first-class user — given a goal, the browser plans, navigates, clicks, and fills forms across pages. Arc Search, Dia, and Comet are 2026 examples.
Agentic commerce
The emerging pattern where an AI agent — not a human — researches, compares, and transacts on behalf of the user. 2026 standards from Visa, Mastercard, and Stripe are formalizing it.
Agentic RAG
A retrieval pattern where the agent decides when and what to retrieve — issuing its own search queries, refining them, and iterating — instead of a single up-front retrieval step.
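A minimal sketch of the agentic-RAG loop, assuming a three-document in-memory corpus and simple keyword matching. `plan_queries` is a hypothetical callable standing in for the LLM: it sees the evidence gathered so far and either issues new queries or returns an empty list to stop.

```python
from typing import Callable

# Toy corpus (hypothetical); in practice a search index or vector store.
CORPUS = {
    "doc1": "Snowflake is a cloud data warehouse.",
    "doc2": "Acme migrated its analytics to Snowflake in 2024.",
    "doc3": "HNSW is a graph-based nearest-neighbor index.",
}

def retrieve(query: str) -> list[str]:
    """Return ids of documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items()
            if words & set(text.lower().rstrip(".").split())]

def agentic_rag(question: str,
                plan_queries: Callable[[str, list[str]], list[str]],
                max_rounds: int = 3) -> list[str]:
    """Let the planner decide what to search for, round by round."""
    evidence: list[str] = []
    for _ in range(max_rounds):
        queries = plan_queries(question, evidence)
        if not queries:  # the agent decided it has enough context
            break
        for q in queries:
            for doc_id in retrieve(q):
                if doc_id not in evidence:
                    evidence.append(doc_id)
    return evidence

def toy_planner(question: str, evidence: list[str]) -> list[str]:
    # Round 1: search the key entity; round 2: refine; then stop.
    if not evidence:
        return ["acme"]
    if len(evidence) == 1:
        return ["snowflake warehouse"]
    return []

print(agentic_rag("Which warehouse does Acme use?", toy_planner))  # ['doc2', 'doc1']
```

The second query only exists because the first round's result (Acme uses Snowflake) told the planner what to look up next, which is the difference from a single up-front retrieval.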
Agentic search
A search pattern where an agent — not a single retrieval call — runs the query: it plans, queries multiple sources, evaluates results, refines, and returns a synthesized answer with citations.
AI citations
AI outputs that include verifiable links or references to source documents — the trust primitive that separates research-grade AI from pure generative chat.
AI code review
An agent that reviews pull requests: it reads the diff, finds bugs, flags style and security issues, and posts inline comments. GitHub Copilot Code Review, CodeRabbit, Greptile, and Cursor BugBot are prominent 2026 examples.
AI data analyst
An agent that connects to data warehouses, runs SQL, builds charts, and produces narrative analyses — replacing the "Slack-to-the-analytics-team" loop for routine business questions.
AI meeting assistant
An agent that joins meetings (or processes recordings) to transcribe, summarize, extract action items, and follow up. Otter, Fireflies, Granola, and Read.ai are prominent 2026 examples.
Operator (OpenAI)
OpenAI's browser-based autonomous agent that takes a task ("book me a flight to SFO next Tuesday") and completes it by driving a real web browser — clicking, typing, navigating, and confirming.
AI research agent
An agent that takes a research question, searches multiple sources over multiple rounds, synthesizes a sourced report, and follows up with clarifications. Distinct from a search box.
AI streaming
Sending model output to the user token-by-token as it generates, instead of waiting for the full response. The default UX pattern for AI chat in 2026.
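A minimal sketch of the consumer side of streaming. `fake_model_stream` is a hypothetical stand-in for a model API; real chat APIs typically deliver chunks over server-sent events, but the client loop has the same shape: render each chunk as soon as it arrives and keep the full text for later.

```python
import sys
import time
from typing import Iterator

def fake_model_stream(text: str, delay: float = 0.05) -> Iterator[str]:
    """Hypothetical model: yields one token at a time as it is 'generated'."""
    for token in text.split(" "):
        time.sleep(delay)  # simulate per-token generation latency
        yield token + " "

def render_stream(tokens: Iterator[str]) -> str:
    """Show each token immediately and return the assembled response."""
    parts: list[str] = []
    for token in tokens:
        sys.stdout.write(token)
        sys.stdout.flush()  # push partial output to the user right away
        parts.append(token)
    sys.stdout.write("\n")
    return "".join(parts).rstrip()

reply = render_stream(fake_model_stream("Streaming hides most of the latency.", delay=0.01))
```

Total generation time is unchanged; what improves is time to first visible token, which is why streaming feels faster.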
Artifact
A UI pattern where AI-generated content (code, documents, diagrams) renders in a separate panel beside the chat, so users can edit and iterate without losing the conversation.
Browser agent
An AI agent specialized in driving a web browser — navigating sites, filling forms, scraping data, and completing multi-step web workflows on behalf of the user.
Browser use
An agent capability where the LLM drives a real web browser to read, click, and fill forms on live websites.
Canvas
OpenAI's side-panel editing surface for documents and code generated by ChatGPT — the OpenAI equivalent of Claude's Artifacts.
Code execution
An agent capability for writing and running code in a sandboxed environment — usually Python — to compute, transform data, or test hypotheses.
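A minimal sketch of the "run" half of code execution, assuming the model has already produced a Python snippet. `run_untrusted` is a hypothetical helper built on the standard library's `subprocess.run`; it captures output and enforces a timeout, but a separate process is not a security sandbox, so production systems add container or VM isolation on top.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run model-generated Python in a separate interpreter process.

    NOTE: isolation-lite only; the child still sees the host filesystem and network.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}

result = run_untrusted("print(sum(x * x for x in range(10)))")
print(result["stdout"].strip())  # 285
```

The returned dict (including `stderr` on failure) is what the agent reads back, so a traceback can drive its next attempt.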
Coding agent
An AI agent specialized in software engineering tasks — reading codebases, writing code, running tests, opening pull requests, and fixing bugs.
Computer use
An agent capability where the LLM controls a computer's mouse, keyboard, and screen directly — interpreting screenshots, clicking, typing, and navigating arbitrary desktop and browser apps.
Computer vision
The AI field focused on letting machines understand images and video — covers object detection, image classification, segmentation, OCR, scene understanding, and more.
Deep research
An agent capability that produces long-form, multi-source research reports by autonomously browsing the web, reading documents, and synthesizing findings — typically running for 5–30 minutes per query.
Embedding model
A neural network specialized for converting text (or images, audio) into fixed-length dense vectors — used for semantic search, RAG, clustering, and similarity tasks.
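A toy sketch of the embedding-model interface: any text in, a fixed-length unit vector out. `toy_embed` is hypothetical and uses feature hashing rather than a neural network, so it captures no meaning and its vectors are mostly zeros; a real embedding model produces dense vectors in which similar texts land close together.

```python
import hashlib
import math

DIM = 64  # real embedding models typically output hundreds to thousands of dimensions

def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-length, L2-normalized vector via feature hashing."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Stable hash (Python's built-in hash() is salted per process).
        h = int.from_bytes(hashlib.blake2b(word.encode(), digest_size=4).digest(), "little")
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

a = toy_embed("vector search with embeddings")
b = toy_embed("a much longer sentence about something else entirely")
print(len(a), len(b))  # 64 64: output length does not depend on input length
```

Normalizing to unit length means a plain dot product between two outputs equals their cosine similarity, a convention many embedding APIs also follow.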
Episodic memory
The agent's memory of specific past events and sessions — "what happened when" — usually stored as timestamped summaries that can be retrieved by time, topic, or participant.
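A minimal sketch of an episodic store, assuming each session is condensed into a one-line summary tagged with topics and participants. `Episode` and `EpisodicMemory` are hypothetical names; production systems usually also embed the summaries so they can be recalled by semantic similarity, not just by exact tags.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Episode:
    when: datetime
    summary: str
    topics: set[str] = field(default_factory=set)
    participants: set[str] = field(default_factory=set)

class EpisodicMemory:
    """Append-only log of session summaries, recalled by time, topic, or participant."""

    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def recall(self, since: Optional[datetime] = None, until: Optional[datetime] = None,
               topic: Optional[str] = None, participant: Optional[str] = None) -> list[Episode]:
        """Return matching episodes in chronological order; None means 'no filter'."""
        hits = [
            ep for ep in self.episodes
            if (since is None or ep.when >= since)
            and (until is None or ep.when <= until)
            and (topic is None or topic in ep.topics)
            and (participant is None or participant in ep.participants)
        ]
        return sorted(hits, key=lambda ep: ep.when)

mem = EpisodicMemory()
mem.record(Episode(datetime(2026, 3, 2), "Kickoff: scoped the Q2 dashboard.", {"dashboard"}, {"dana"}))
mem.record(Episode(datetime(2026, 3, 9), "Dana approved the Snowflake schema.", {"snowflake"}, {"dana"}))
mem.record(Episode(datetime(2026, 3, 10), "Sam asked about SSO.", {"sso"}, {"sam"}))
for ep in mem.recall(participant="dana", since=datetime(2026, 3, 5)):
    print(ep.when.date(), ep.summary)  # 2026-03-09 Dana approved the Snowflake schema.
```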
Image generation
The broader category of AI-generated images — includes text-to-image, image-to-image (editing), inpainting, outpainting, and style transfer. Powered by diffusion models or transformer-based image generators.
Long-term memory
The agent's memory that survives across sessions, sometimes across months or years — usually a vector store plus a key-value store, with episodic, semantic, and procedural layers underneath.
Memory
The mechanism by which an agent remembers information across sessions — usually a vector store or structured key-value cache.
Multi-agent
An architecture where several specialized agents collaborate on the same task: each handles a sub-goal, and they coordinate through an orchestrator agent, direct message passing, or a shared workspace.
Multimodal AI
AI systems that process and reason across multiple input types — text, images, audio, video — within a single model, instead of routing each modality through separate specialized models.
Natural language understanding (NLU)
The AI subfield focused on extracting meaning from human language — intent classification, entity extraction, sentiment analysis, and semantic interpretation. In 2026, mostly subsumed by LLMs.
OCR (Optical Character Recognition)
Technology that extracts text from images, scanned documents, and PDFs — in 2026, OCR is often built into multimodal LLMs (Claude, GPT-4o, Gemini) rather than requiring a separate service.
RAG
Retrieval-augmented generation — pulling relevant documents from a knowledge base before generating, so the LLM grounds its answer in your data.
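A minimal sketch of the retrieval half of RAG, assuming a three-passage in-memory knowledge base and word-overlap scoring in place of vector search. `retrieve` and `build_grounded_prompt` are hypothetical names; the returned prompt would then be sent to the LLM, a call omitted here.

```python
import re

# Hypothetical knowledge base; in practice, chunks stored in a vector database.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of a return being received.",
    "Standard shipping takes 3-5 business days.",
    "Gift cards cannot be exchanged for cash.",
]

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query; drop zero-overlap ones."""
    q = tokenize(query)
    scored = sorted(KNOWLEDGE_BASE, key=lambda p: len(q & tokenize(p)), reverse=True)
    return [p for p in scored[:k] if q & tokenize(p)]

def build_grounded_prompt(question: str) -> str:
    """Put numbered retrieved passages in front of the question."""
    passages = retrieve(question)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("How many days until refunds are issued?"))
```

Numbering the passages is what lets the model emit checkable citations like "[1]".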
Reranker
A second-stage retrieval model that re-scores a small set of candidate results from initial retrieval — using a cross-encoder or LLM to produce more accurate final rankings.
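A toy sketch of two-stage retrieval. `first_stage_score` (bag-of-words overlap) is cheap enough to run on every document; `rerank_score` is a hypothetical stand-in for a cross-encoder that looks at the query and document together, here by counting shared word pairs so that word order matters. Both scorers are illustrative assumptions, not real models.

```python
def tokens(text: str) -> list[str]:
    return text.lower().split()

def first_stage_score(query: str, doc: str) -> int:
    """Cheap bag-of-words overlap, run over the whole collection."""
    return len(set(tokens(query)) & set(tokens(doc)))

def bigrams(words: list[str]) -> set[tuple[str, str]]:
    return set(zip(words, words[1:]))

def rerank_score(query: str, doc: str) -> int:
    """Toy cross-encoder stand-in: scores the (query, doc) pair jointly."""
    return len(bigrams(tokens(query)) & bigrams(tokens(doc)))

def search(query: str, docs: list[str], n_candidates: int = 3, k: int = 2) -> list[str]:
    # Stage 1: cheap scorer over everything, keep a small candidate set.
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:n_candidates]
    # Stage 2: expensive scorer over the candidates only.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k]

DOCS = ["new pizza oven for york", "pizza in new york",
        "best new york pizza slices", "weather in york"]
print(search("new york pizza", DOCS))  # ['best new york pizza slices', 'pizza in new york']
```

The first stage ties three documents; the reranker separates them and moves the exact phrase match to the top, while "weather in york" is never re-scored, which is where the cost savings come from.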
Self-correction
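An agent capability where the model evaluates its own output for errors and produces a corrected version. Reported accuracy gains on verifiable tasks are in the 10–30% range in 2026, and they are most reliable when an external check (unit tests, a compiler, a verifier) confirms the error.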
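A minimal sketch of a generate, check, revise loop, assuming the task has an external check (here a unit test). `self_correct`, `check_add`, and `fake_model` are hypothetical; the fake model returns canned drafts, whereas a real one would condition its revision on the feedback string it receives.

```python
from typing import Callable, Optional

def self_correct(generate: Callable[[Optional[str]], str],
                 check: Callable[[str], Optional[str]],
                 max_attempts: int = 3) -> tuple[str, int]:
    """Generate, check, revise until the check passes or attempts run out.

    generate(feedback) receives the last error message (None on the first try).
    check(candidate) returns None on success, otherwise an error message.
    Returns the last candidate and the number of attempts used.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, attempt
    return candidate, max_attempts

def check_add(src: str) -> Optional[str]:
    """Unit test for a generated add(a, b) function."""
    namespace: dict = {}
    try:
        exec(src, namespace)  # toy only: run untrusted code in a sandbox instead
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"raised {exc!r}"
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"

drafts = iter(["def add(a, b): return a - b", "def add(a, b): return a + b"])
seen_feedback: list[Optional[str]] = []

def fake_model(feedback: Optional[str]) -> str:
    seen_feedback.append(feedback)  # a real model would use this to revise
    return next(drafts)

code, attempts = self_correct(fake_model, check_add)
print(attempts, code)  # 2 def add(a, b): return a + b
```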
Self-reflection
An agent capability where the model generates an explicit reflection on its own reasoning or outputs — used to improve subsequent steps or detect errors before they propagate.
Semantic memory
The agent's store of timeless facts — "the user is a VP of Sales at Acme," "the company uses Snowflake" — distinct from events (episodic) or skills (procedural).
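A minimal sketch of a semantic store, assuming facts are kept as subject, attribute, value triples. `SemanticMemory` is a hypothetical name, and overwrite-on-update is one design choice among several (some systems keep history or confidence scores instead).

```python
from typing import Optional

class SemanticMemory:
    """Subject -> attribute -> value store for durable facts.

    A new value overwrites the old one: the store holds what is currently
    true, not when it was learned (that is episodic memory's job).
    """

    def __init__(self) -> None:
        self._facts: dict[str, dict[str, str]] = {}

    def remember(self, subject: str, attribute: str, value: str) -> None:
        self._facts.setdefault(subject, {})[attribute] = value

    def lookup(self, subject: str, attribute: str) -> Optional[str]:
        return self._facts.get(subject, {}).get(attribute)

    def as_context(self) -> str:
        """Render all facts as sorted lines an agent could prepend to its prompt."""
        return "\n".join(
            f"{subject}.{attr} = {value}"
            for subject, attrs in sorted(self._facts.items())
            for attr, value in sorted(attrs.items())
        )

mem = SemanticMemory()
mem.remember("user", "title", "VP of Sales")
mem.remember("user", "employer", "Acme")
mem.remember("company", "warehouse", "Redshift")
mem.remember("company", "warehouse", "Snowflake")  # newer fact replaces the old one
print(mem.as_context())
```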
Semantic search
Search that ranks results by meaning rather than keyword overlap — using vector embeddings or LLM reasoning to match queries with conceptually similar content.
Speech-to-text (STT)
AI technology that converts spoken audio into written text — also called Automatic Speech Recognition (ASR). The input half of voice AI, distinct from TTS which produces speech.
Text-to-image
AI technology that generates images from text prompts. Midjourney, DALL-E 3, Stable Diffusion 3.5, Flux, and Ideogram are prominent 2026 examples.
Text-to-speech (TTS)
AI technology that converts written text into natural-sounding spoken audio — the synthesis half of voice AI, distinct from STT which goes the other direction.
Text-to-video
AI technology that generates video clips from text prompts. Runway Gen-4, OpenAI Sora, Google Veo, and Kling are prominent 2026 examples; output is typically 5–30 seconds long.
Tool use
The ability of an LLM to invoke external functions — APIs, shell commands, internal services — instead of just generating text.
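A minimal sketch of the application side of tool use, assuming the model emits a JSON object with `name` and `arguments` fields. That shape is an illustrative assumption, not any specific vendor's schema, and `get_weather` and `add` are hypothetical tools.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator: register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 18}  # hypothetical; a real tool would call an API

@tool
def add(a: float, b: float) -> float:
    return a + b

def dispatch(model_output: str) -> str:
    """Run one JSON tool call emitted by the model and return a JSON result."""
    call = json.loads(model_output)
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return json.dumps({"error": f"unknown tool {call.get('name')!r}"})
    try:
        result = fn(**call.get("arguments", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # {"result": 5}
```

The JSON string returned by `dispatch` goes back into the model's context so it can use the result, or the error, in its next turn.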
Vector database
A database optimized for storing and querying high-dimensional embedding vectors — enabling semantic search, RAG, and similarity-based retrieval at scale.
Vector embedding
A dense numerical vector representation of text, image, or audio — produced by an embedding model and used to measure semantic similarity in high-dimensional space.
Vector search
Search powered by vector similarity — finding the K nearest embedding vectors to a query vector, typically using approximate-nearest-neighbor algorithms like HNSW or IVF.
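A minimal sketch of exact (brute-force) k-nearest-neighbor search with cosine similarity, using hand-made 3-dimensional vectors in place of real embeddings. ANN indexes such as HNSW or IVF return approximately the same top-k without scoring every vector, which is what makes search over millions of vectors fast.

```python
import heapq
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def knn(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact k-NN: score every stored vector (O(n) per query), keep the top k."""
    return heapq.nlargest(k, ((key, cosine(query, vec)) for key, vec in index.items()),
                          key=lambda pair: pair[1])

# Hypothetical toy "embeddings"
INDEX = {
    "cats":    [0.9, 0.1, 0.0],
    "kittens": [0.8, 0.2, 0.1],
    "stocks":  [0.0, 0.1, 0.9],
}
for key, score in knn([1.0, 0.15, 0.0], INDEX):
    print(f"{key}: {score:.3f}")
```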
Vibe coding
A term coined by Andrej Karpathy in February 2025 for coding by describing what you want in natural language and letting AI generate the code; tools like Lovable, Bolt, Cursor Composer, and Replit Agent popularized the workflow in 2025–2026.
Vision
An agent capability for understanding images, screenshots, and video — letting the model reason over visual content.
Voice
An agent capability for taking phone calls, holding spoken conversations, and triggering actions from voice input.
Voice agent
An AI agent that takes phone calls, holds spoken conversations in real time, and triggers actions from voice input — handling customer support, scheduling, and outbound calling.
Voice cloning
AI technology that creates a synthetic voice closely resembling a target speaker, typically from 30 seconds to a few minutes of clean reference audio.
Working memory
Short-lived, task-scoped memory the agent uses to track the current goal, plan, and intermediate results — analogous to a human scratchpad during a single problem.
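A minimal sketch of a task-scoped scratchpad, assuming the agent works from an explicit step-by-step plan. `Scratchpad` and its fields are hypothetical; the point is that this state lives only for one task, unlike long-term memory.

```python
from dataclasses import dataclass, field

@dataclass
class Scratchpad:
    """Task-scoped working memory: discarded when the task ends."""
    goal: str
    plan: list[str]
    step: int = 0
    results: dict[str, str] = field(default_factory=dict)

    def complete_step(self, result: str) -> None:
        """Record the current step's output and advance the plan."""
        self.results[self.plan[self.step]] = result
        self.step += 1

    @property
    def done(self) -> bool:
        return self.step >= len(self.plan)

    def render(self) -> str:
        """Compact view the agent can re-read before choosing its next action."""
        lines = [f"Goal: {self.goal}"]
        for i, item in enumerate(self.plan):
            mark = "x" if i < self.step else ">" if i == self.step else " "
            note = f" -> {self.results[item]}" if item in self.results else ""
            lines.append(f"[{mark}] {item}{note}")
        return "\n".join(lines)

pad = Scratchpad("Report Q1 revenue growth",
                 ["fetch Q4 revenue", "fetch Q1 revenue", "compute growth"])
pad.complete_step("$2.0M")
pad.complete_step("$2.3M")
print(pad.render())
```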