Capabilities terms
What the agent can actually do — tools, browsers, code, memory, vision.
- Agent memory
The persistent state an agent maintains across turns and sessions — covers short-term context, long-term facts, episodic events, and procedural skills. Distinct from the LLM context window.
- Agentic browser
A web browser where an AI agent is a first-class user — given a goal, the browser plans, navigates, clicks, and fills forms across pages. Arc Search, Dia, and Comet are 2026 examples.
- Agentic commerce
The emerging pattern where an AI agent — not a human — researches, compares, and transacts on behalf of the user. 2026 standards from Visa, Mastercard, and Stripe are formalizing it.
- Agentic RAG
A retrieval pattern where the agent decides when and what to retrieve — issuing its own search queries, refining them, and iterating — instead of a single up-front retrieval step.
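The loop below is a minimal sketch of agentic retrieval. Its assumptions are hypothetical: a toy keyword `search` over an in-memory corpus, and a rule-based `plan_next_query` that stands in for the LLM's judgment about what information is still missing. Each round, the agent decides whether to query again and what to ask for.

```python
import re
from typing import Optional

# Hypothetical toy corpus standing in for a real knowledge base.
CORPUS = {
    "doc1": "Snowflake is the company's data warehouse.",
    "doc2": "The data warehouse is queried with SQL from dbt models.",
    "doc3": "Quarterly revenue reports are built in Looker.",
}


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query: str, k: int = 1) -> list:
    """Toy retriever: top-k docs by shared-word count (stable on ties)."""
    q = tokenize(query)
    ranked = sorted(CORPUS, key=lambda d: len(q & tokenize(CORPUS[d])), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(CORPUS[d])]


def plan_next_query(needed: set, evidence: list) -> Optional[str]:
    """Stand-in for the LLM's decision: ask for whatever is still missing, or stop."""
    covered = set().union(*(tokenize(CORPUS[d]) for d in evidence))
    missing = needed - covered
    return " ".join(sorted(missing)) if missing else None


def agentic_retrieve(needed: set, max_rounds: int = 3):
    evidence, queries = [], []
    for _ in range(max_rounds):
        query = plan_next_query(needed, evidence)
        if query is None:  # the agent judges it has enough context
            break
        queries.append(query)
        for doc_id in search(query):
            if doc_id not in evidence:
                evidence.append(doc_id)
    return evidence, queries


evidence, queries = agentic_retrieve({"snowflake", "looker"})
print(queries)   # ['looker snowflake', 'looker']
print(evidence)  # ['doc1', 'doc3']
```

In a single up-front retrieval, the second query would never be issued. Here it is refined from the gap left by the first round.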
- Agentic search
A search pattern where an agent — not a single retrieval call — runs the query: it plans, queries multiple sources, evaluates results, refines, and returns a synthesized answer with citations.
- AI citations
AI outputs that include verifiable links or references to source documents — the trust primitive that separates research-grade AI from pure generative chat.
- AI code review
An agent that reviews pull requests — reads the diff, finds bugs, flags style and security issues, and posts inline comments. GitHub Copilot Code Review, CodeRabbit, Greptile, and Cursor BugBot lead the 2026 market.
- AI data analyst
An agent that connects to data warehouses, runs SQL, builds charts, and produces narrative analyses — replacing the "Slack-to-the-analytics-team" loop for routine business questions.
- AI meeting assistant
An agent that joins meetings (or processes recordings) to transcribe, summarize, extract action items, and follow up — Otter, Fireflies, Granola, and Read.ai are 2026 leaders.
- AI Operator
OpenAI's Operator, a browser-based autonomous agent that takes a task ("book me a flight to SFO next Tuesday") and completes it by driving a real web browser: clicking, typing, navigating, and confirming.
- AI research agent
An agent that takes a research question, searches multiple sources over multiple rounds, synthesizes a sourced report, and follows up with clarifications. Distinct from a search box.
- AI streaming
Sending model output to the user token-by-token as it generates, instead of waiting for the full response. The default UX pattern for AI chat in 2026.
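This is a minimal sketch of the consumer side of streaming. It assumes a hypothetical `fake_model_stream` generator in place of a real model API, with whitespace-split words standing in for tokens. Real chat APIs typically deliver chunks over a network stream such as server-sent events. The rendering loop has the same shape either way: paint each chunk as it arrives.

```python
import time
from typing import Iterator


def fake_model_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model API that yields one token at a time."""
    reply = "Tokens appear as soon as they are generated."
    for word in reply.split(" "):
        time.sleep(0.02)  # simulate per-token generation latency
        yield word + " "


def render_stream(stream: Iterator[str]) -> str:
    """Print each chunk immediately instead of waiting for the full reply."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts).strip()


full_text = render_stream(fake_model_stream("Explain streaming"))
```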
- Artifact
A UI pattern where AI-generated content (code, documents, diagrams) renders in a separate panel beside the chat, so users can edit and iterate without losing the conversation.
- Browser agent
An AI agent specialized in driving a web browser — navigating sites, filling forms, scraping data, and completing multi-step web workflows on behalf of the user.
- Browser use
An agent capability where the LLM drives a real web browser to read, click, and fill forms on live websites.
- Canvas
OpenAI's side-panel editing surface for documents and code generated by ChatGPT — the OpenAI equivalent of Claude's Artifacts.
- Code execution
An agent capability for writing and running code in a sandboxed environment — usually Python — to compute, transform data, or test hypotheses.
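The function below sketches the execution step under a simplifying assumption. The hypothetical `run_python` runs model-written Python in a separate interpreter process with a timeout and captures stdout and stderr for the agent to read. This is process isolation only, not a security sandbox. Production systems generally add containers or microVMs with restricted filesystem and network access.

```python
import subprocess
import sys


def run_python(code: str, timeout_s: float = 5.0) -> tuple:
    """Run code in a child interpreter and return (returncode, stdout, stderr).

    -I starts the child in isolated mode (ignores PYTHON* environment variables
    and the user site-packages). It does NOT restrict what the code can do.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Hypothetical convention: -1 signals that the run was killed for time.
        return -1, "", f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout, proc.stderr


rc, out, err = run_python("print(sum(range(10)))")
print(rc, out.strip())  # 0 45
```

The stderr text is what the agent would read back to diagnose a failure, for example a `ZeroDivisionError` traceback.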
- Coding agent
An AI agent specialized in software engineering tasks — reading codebases, writing code, running tests, opening pull requests, and fixing bugs.
- Computer use
An agent capability where the LLM controls a computer's mouse, keyboard, and screen directly — interpreting screenshots, clicking, typing, and navigating arbitrary desktop and browser apps.
- Computer vision
The AI field focused on letting machines understand images and video — covers object detection, image classification, segmentation, OCR, scene understanding, and more.
- Deep research
An agent capability that produces long-form, multi-source research reports by autonomously browsing the web, reading documents, and synthesizing findings — typically running for 5–30 minutes per query.
- Embedding model
A neural network specialized for converting text (or images, audio) into fixed-length dense vectors — used for semantic search, RAG, clustering, and similarity tasks.
- Episodic memory
The agent's memory of specific past events and sessions — "what happened when" — usually stored as timestamped summaries that can be retrieved by time, topic, or participant.
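This is a minimal sketch of an episodic store, assuming a hypothetical design: an append-only list of timestamped session summaries that can be filtered by time range, topic, or participant. Exact topic matching is a simplification. Real systems often also search the summaries by embedding similarity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Episode:
    timestamp: datetime
    topic: str
    participants: frozenset
    summary: str


class EpisodicMemory:
    """Append-only log of past sessions ("what happened when")."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []

    def record(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def recall(
        self,
        start: Optional[datetime] = None,
        end: Optional[datetime] = None,
        topic: Optional[str] = None,
        participant: Optional[str] = None,
    ) -> List[Episode]:
        """Return matching episodes, oldest first; every filter is optional."""
        matches = [
            ep for ep in self._episodes
            if (start is None or ep.timestamp >= start)
            and (end is None or ep.timestamp <= end)
            and (topic is None or ep.topic == topic)
            and (participant is None or participant in ep.participants)
        ]
        return sorted(matches, key=lambda ep: ep.timestamp)


mem = EpisodicMemory()
mem.record(Episode(datetime(2026, 3, 2, 10), "pricing", frozenset({"alice"}), "Discussed Q2 pricing."))
mem.record(Episode(datetime(2026, 3, 1, 9), "onboarding", frozenset({"alice", "bob"}), "Walked Bob through setup."))
mem.record(Episode(datetime(2026, 3, 5, 15), "pricing", frozenset({"bob"}), "Bob asked for a discount."))
print([ep.summary for ep in mem.recall(participant="bob")])
```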
- Image generation
The broader category of AI-generated images — includes text-to-image, image-to-image (editing), inpainting, outpainting, and style transfer. Powered by diffusion models or transformer-based image generators.
- Long-term memory
The agent's memory that survives across sessions, sometimes across months or years — usually a vector store plus a key-value store, with episodic, semantic, and procedural layers underneath.
- Memory
The mechanism by which an agent remembers information across sessions — usually a vector store or structured key-value cache.
- Multi-agent
An architecture where several specialized agents collaborate on the same task. Each handles a sub-goal, and they coordinate through an orchestrator, direct messages, or a shared workspace.
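This is a minimal sketch of one common multi-agent layout: an orchestrator runs specialist agents in sequence, and they communicate through a shared workspace. The agents `researcher`, `writer`, and `reviewer` are hypothetical stubs where real agents would each wrap their own LLM call. The fixed sequential plan is a simplification, since many systems let the orchestrator route work dynamically.

```python
from typing import Callable, Dict, List

Workspace = Dict[str, str]
Agent = Callable[[str, Workspace], None]


# Hypothetical specialist agents; in practice each would wrap its own LLM call.
def researcher(task: str, ws: Workspace) -> None:
    ws["notes"] = f"key facts about {task}"


def writer(task: str, ws: Workspace) -> None:
    ws["draft"] = f"Report on {task}, based on: {ws['notes']}"


def reviewer(task: str, ws: Workspace) -> None:
    # Checks the writer's output against the researcher's notes in the shared workspace.
    grounded = "notes" in ws and ws["notes"] in ws.get("draft", "")
    ws["review"] = "approved" if grounded else "revise"


def orchestrate(task: str, plan: List[Agent]) -> Workspace:
    """Run each agent on its sub-goal; all of them read and write one shared workspace."""
    ws: Workspace = {}
    for agent in plan:
        agent(task, ws)
    return ws


result = orchestrate("vector databases", [researcher, writer, reviewer])
print(result["review"])  # approved
```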
- Multimodal AI
AI systems that process and reason across multiple input types — text, images, audio, video — within a single model, instead of routing each modality through separate specialized models.
- Natural language understanding (NLU)
The AI subfield focused on extracting meaning from human language — intent classification, entity extraction, sentiment analysis, and semantic interpretation. In 2026, mostly subsumed by LLMs.
- OCR (Optical Character Recognition)
Technology that extracts text from images, scanned documents, and PDFs — in 2026, OCR is often built into multimodal LLMs (Claude, GPT-4o, Gemini) rather than requiring a separate service.
- RAG
Retrieval-augmented generation — pulling relevant documents from a knowledge base before generating, so the LLM grounds its answer in your data.
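This is a minimal sketch of the two RAG steps: retrieve, then build a grounded prompt. Word overlap is a stand-in assumption for embedding similarity, and the function names are hypothetical. The LLM call itself is left out; the returned string is what would be sent to the model.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Rank docs by word overlap with the query (a stand-in for embedding similarity)."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def build_grounded_prompt(query: str, docs: list, k: int = 2) -> str:
    """Place the retrieved passages before the question so the LLM answers from them."""
    sources = "\n".join(f"[{i}] {d}" for i, d in enumerate(retrieve(query, docs, k), start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )


docs = [
    "Our refund window is 30 days.",
    "Shipping takes 5 business days.",
    "Refunds go back to the original card.",
]
prompt = build_grounded_prompt("How many days is the refund window?", docs)
print(prompt)
```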
- Reranker
A second-stage retrieval model that re-scores a small set of candidate results from initial retrieval — using a cross-encoder or LLM to produce more accurate final rankings.
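The sketch below shows the two-stage shape of reranking. A cheap bag-of-words first stage picks candidates, and a more careful scorer re-orders only those candidates. The bigram scorer is a toy stand-in, chosen because, like a cross-encoder, it looks at query and document together and is sensitive to word order. It is not a real cross-encoder, and all names are hypothetical.

```python
import re


def words(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


def first_stage_score(query: str, doc: str) -> int:
    """Cheap bag-of-words overlap; blind to word order."""
    return len(set(words(query)) & set(words(doc)))


def rerank_score(query: str, doc: str) -> int:
    """Order-aware bigram overlap: a toy stand-in for a cross-encoder."""
    def bigrams(ws: list) -> set:
        return set(zip(ws, ws[1:]))
    return len(bigrams(words(query)) & bigrams(words(doc)))


def retrieve_and_rerank(query: str, docs: list, n_candidates: int = 3, k: int = 1) -> list:
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:n_candidates]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k]


docs = ["man bites dog", "dog bites man: the story", "cat naps all day"]
print(retrieve_and_rerank("dog bites man", docs))  # ['dog bites man: the story']
```

The first stage ties "man bites dog" with the correct passage because it ignores word order. The second stage breaks the tie, which is the kind of precision a reranker is meant to add on a small candidate set.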
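- Self-correction
An agent capability where the model evaluates its own output for errors and produces a corrected version. As of 2026 it improves accuracy on verifiable tasks by roughly 10–30%, and the gains are most dependable when an external check, such as a test run or compiler error, tells the model what went wrong.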
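The loop below is a minimal sketch of a generate, check, and regenerate cycle. It assumes a hypothetical `fake_generate` model stub that writes a buggy draft first, and a `run_unit_test` verifier whose error message is fed back into the next attempt. Here the check is an external test run rather than the model grading itself.

```python
from typing import Callable, Optional, Tuple


def self_correct(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Regenerate with the latest error message until the check passes or attempts run out."""
    feedback: Optional[str] = None
    candidate = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(task, feedback)
        feedback = check(candidate)  # None means the output passed
        if feedback is None:
            return candidate, attempt
    return candidate, max_attempts


def fake_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical model: buggy first draft, fixed once it is shown the error."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


def run_unit_test(code: str) -> Optional[str]:
    """External verifier: execute the candidate and test it (trusted demo code only)."""
    namespace: dict = {}
    exec(code, namespace)
    got = namespace["add"](2, 3)
    return None if got == 5 else f"add(2, 3) returned {got}, expected 5"


code, attempts = self_correct(fake_generate, run_unit_test, "write add(a, b)")
print(attempts)  # 2
```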
- Self-reflection
An agent capability where the model generates an explicit reflection on its own reasoning or outputs — used to improve subsequent steps or detect errors before they propagate.
- Semantic memory
The agent's store of timeless facts — "the user is a VP of Sales at Acme," "the company uses Snowflake" — distinct from events (episodic) or skills (procedural).
- Semantic search
Search that ranks results by meaning rather than keyword overlap — using vector embeddings or LLM reasoning to match queries with conceptually similar content.
- Speech-to-text (STT)
AI technology that converts spoken audio into written text — also called Automatic Speech Recognition (ASR). The input half of voice AI, distinct from TTS which produces speech.
- Text-to-image
AI technology that generates images from text prompts — Midjourney, DALL-E 3, Stable Diffusion 3.5, Flux, and Ideogram are the 2026 leaders.
- Text-to-speech (TTS)
AI technology that converts written text into natural-sounding spoken audio — the synthesis half of voice AI, distinct from STT which goes the other direction.
- Text-to-video
AI technology that generates video clips from text prompts — Runway Gen-4, OpenAI Sora, Google Veo, and Kling are the 2026 leaders. Output is typically 5–30 seconds.
- Tool use
The ability of an LLM to invoke external functions — APIs, shell commands, internal services — instead of just generating text.
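This is a minimal sketch of the application side of tool use: a registry of callable functions plus a dispatcher that executes one model-emitted tool call. The JSON shape `{"name": ..., "arguments": {...}}` is an assumption, since providers format tool calls differently. The tools themselves are hypothetical stubs.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # hypothetical stub; a real tool would call a weather API


@tool
def add(a: float, b: float) -> float:
    return a + b


def dispatch(tool_call_json: str) -> str:
    """Execute one tool call and return a JSON result to feed back to the model."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    try:
        return json.dumps({"result": fn(**call["arguments"])})
    except TypeError as exc:  # wrong or missing arguments
        return json.dumps({"error": str(exc)})


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # {"result": 5}
```

Errors are returned as data rather than raised, so the model can see what went wrong and retry with corrected arguments.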
- Vector database
A database optimized for storing and querying high-dimensional embedding vectors — enabling semantic search, RAG, and similarity-based retrieval at scale.
- Vector embedding
A dense numerical vector representation of text, image, or audio — produced by an embedding model and used to measure semantic similarity in high-dimensional space.
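Cosine similarity is a common way to measure how semantically close two embeddings are. This sketch uses hand-made three-dimensional vectors as hypothetical "embeddings". Real embedding models output hundreds to thousands of dimensions, but the arithmetic is identical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not output from a real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
car = [0.0, 0.2, 0.95]
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```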
- Vector search
Search powered by vector similarity — finding the K nearest embedding vectors to a query vector, typically using approximate-nearest-neighbor algorithms like HNSW or IVF.
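The function below performs exact (brute-force) k-nearest-neighbor search by cosine similarity, scanning every stored vector. Approximate algorithms such as HNSW or IVF exist to avoid this full scan at scale, trading a little recall for much faster queries. The index here is a plain in-memory dict, used purely for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def knn(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Exact k-nearest-neighbor search: scores every stored vector (O(N) per query)."""
    best = heapq.nlargest(k, index.items(), key=lambda item: cosine(query, item[1]))
    return [doc_id for doc_id, _ in best]


index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(knn([1.0, 0.0], index, k=2))  # ['a', 'b']
```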
- Vibe coding
A 2025–2026 term for coding by describing what you want in natural language and letting AI generate the code — popularized by tools like Lovable, Bolt, Cursor Composer, and Replit Agent.
- Vision
An agent capability for understanding images, screenshots, and video — letting the model reason over visual content.
- Voice
An agent capability for taking phone calls, holding spoken conversations, and triggering actions from voice input.
- Voice agent
An AI agent that takes phone calls, holds spoken conversations in real time, and triggers actions from voice input — handling customer support, scheduling, and outbound calling.
- Voice cloning
AI technology that creates a synthetic voice closely imitating a target speaker, often hard to distinguish from the real person. It typically needs 30 seconds to a few minutes of clean source audio.
- Working memory
Short-lived, task-scoped memory the agent uses to track the current goal, plan, and intermediate results — analogous to a human scratchpad during a single problem.
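This is a minimal sketch of working memory as a task-scoped scratchpad. It holds the current goal, the plan, and the intermediate results, and it is simply discarded when the task ends. The `WorkingMemory` class and its methods are a hypothetical design for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkingMemory:
    """Scratchpad for a single task; not persisted across sessions."""
    goal: str
    plan: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)

    def next_step(self) -> Optional[str]:
        """First plan step that has no recorded result yet, or None when done."""
        for step in self.plan:
            if step not in self.results:
                return step
        return None

    def record(self, step: str, result: str) -> None:
        self.results[step] = result


wm = WorkingMemory(goal="Book a flight to SFO", plan=["search flights", "compare prices", "book cheapest"])
wm.record("search flights", "3 options found")
print(wm.next_step())  # compare prices
```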