Capabilities terms
What the agent can actually do: tools, browsers, code, memory, vision.
- Agent memory
The persistent state an agent maintains across turns and sessions: short-term context, long-term facts, episodic events, and procedural skills. Distinct from the LLM context window.
- Agentic browser
A web browser where an AI agent is a first-class user: given a goal, the browser plans, navigates, clicks, and fills forms across pages. Arc Search, Dia, and Comet are 2026 examples.
- Agentic commerce
The emerging pattern where an AI agent, not a human, researches, compares, and transacts on behalf of the user. 2026 standards from Visa, Mastercard, and Stripe are formalizing it.
- Agentic RAG
A retrieval pattern where the agent decides when and what to retrieve (issuing its own search queries, refining them, and iterating) instead of relying on a single up-front retrieval step.
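The retrieve-refine-iterate loop can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `search` is a toy keyword matcher over an in-memory corpus, and `toy_needs_more` and `toy_refine` are hypothetical stand-ins for decisions that an LLM would make in a real system.

```python
from typing import Callable

# Toy corpus (made up for illustration); a real system would query a search index.
CORPUS = [
    "HNSW is a graph-based approximate nearest neighbor index.",
    "IVF partitions vectors into clusters before searching.",
    "Cross-encoders score a query and a document together.",
]

def search(query: str) -> list[str]:
    """Return documents that share at least one lowercase word with the query."""
    words = set(query.lower().split())
    return [d for d in CORPUS if words & set(d.lower().rstrip(".").split())]

def agentic_retrieve(
    question: str,
    needs_more: Callable[[str, list[str]], bool],
    refine: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> list[str]:
    """Retrieve iteratively: the agent decides whether and what to search next."""
    evidence: list[str] = []
    query = question
    for _ in range(max_rounds):
        if not needs_more(question, evidence):
            break  # the agent judges the evidence sufficient
        for doc in search(query):
            if doc not in evidence:
                evidence.append(doc)
        query = refine(question, evidence)  # the agent rewrites its own query
    return evidence

# Hypothetical policies standing in for LLM calls.
def toy_needs_more(question: str, evidence: list[str]) -> bool:
    return len(evidence) < 2

def toy_refine(question: str, evidence: list[str]) -> str:
    return "ivf" if evidence else question

docs = agentic_retrieve("hnsw", toy_needs_more, toy_refine)
```

Plain RAG would call `search` once and stop. Here the first round finds the HNSW document, the refined query finds the IVF document, and the loop ends once the policy reports that the evidence is sufficient.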
- Agentic search
A search pattern where an agent, not a single retrieval call, runs the query: it plans, queries multiple sources, evaluates results, refines, and returns a synthesized answer with citations.
- AI citations
AI outputs that include verifiable links or references to source documents: the trust primitive that separates research-grade AI from pure generative chat.
- AI code review
An agent that reviews pull requests: it reads the diff, finds bugs, flags style and security issues, and posts inline comments. GitHub Copilot Code Review, CodeRabbit, Greptile, and Cursor BugBot lead the 2026 market.
- AI data analyst
An agent that connects to data warehouses, runs SQL, builds charts, and produces narrative analyses, replacing the "Slack-to-the-analytics-team" loop for routine business questions.
- AI meeting assistant
An agent that joins meetings (or processes recordings) to transcribe, summarize, extract action items, and follow up. Otter, Fireflies, Granola, and Read.ai are 2026 leaders.
- AI Operator
OpenAI's browser-based autonomous agent that takes a task ("book me a flight to SFO next Tuesday") and completes it by driving a real web browser: clicking, typing, navigating, and confirming.
- AI research agent
An agent that takes a research question, searches multiple sources over multiple rounds, synthesizes a sourced report, and follows up with clarifications. Distinct from a search box.
- AI streaming
Sending model output to the user token-by-token as it generates, instead of waiting for the full response. The default UX pattern for AI chat in 2026.
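A minimal sketch of the idea, using a Python generator to simulate a model stream. `generate_tokens` and `render_stream` are hypothetical names, and splitting on spaces is an assumption made for readability; real model tokens are sub-word units delivered over a network connection.

```python
import time
from typing import Iterator

def generate_tokens(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield one space-delimited piece at a time, simulating a model stream."""
    for token in text.split(" "):
        time.sleep(delay)  # stands in for per-token generation latency
        yield token + " "

def render_stream(tokens: Iterator[str]) -> str:
    """Print each token as soon as it arrives and return the full response."""
    parts = []
    for token in tokens:
        print(token, end="", flush=True)  # the user sees partial output immediately
        parts.append(token)
    print()
    return "".join(parts).rstrip()

full = render_stream(generate_tokens("Streaming shows partial output early"))
```

The consumer never waits for the whole string: each token is displayed the moment the generator yields it, which is what makes long responses feel responsive.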
- Artifact
A UI pattern where AI-generated content (code, documents, diagrams) renders in a separate panel beside the chat, so users can edit and iterate without losing the conversation.
- Browser agent
An AI agent specialized in driving a web browser: navigating sites, filling forms, scraping data, and completing multi-step web workflows on behalf of the user.
- Browser use
An agent capability where the LLM drives a real web browser to read, click, and fill forms on live websites.
- Canvas
OpenAI's side-panel editing surface for documents and code generated by ChatGPT; the OpenAI equivalent of Claude's Artifacts.
- Code execution
An agent capability for writing and running code (usually Python) in a sandboxed environment to compute, transform data, or test hypotheses.
- Coding agent
An AI agent specialized in software engineering tasks: reading codebases, writing code, running tests, opening pull requests, and fixing bugs.
- Computer use
An agent capability where the LLM controls a computer's mouse, keyboard, and screen directly: interpreting screenshots, clicking, typing, and navigating arbitrary desktop and browser apps.
- Computer vision
The AI field focused on letting machines understand images and video. It covers object detection, image classification, segmentation, OCR, scene understanding, and more.
- Deep research
An agent capability that produces long-form, multi-source research reports by autonomously browsing the web, reading documents, and synthesizing findings, typically running for 5 to 30 minutes per query.
- Embedding model
A neural network specialized for converting text (or images, audio) into fixed-length dense vectors, used for semantic search, RAG, clustering, and similarity tasks.
- Episodic memory
The agent's memory of specific past events and sessions ("what happened when"), usually stored as timestamped summaries that can be retrieved by time, topic, or participant.
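One way to picture timestamped summaries that are retrievable by time, topic, or participant is the small in-memory store below. It is a sketch under assumptions: the `Episode` and `EpisodicMemory` names and fields are hypothetical, and a real agent would persist episodes rather than keep them in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Episode:
    when: datetime
    summary: str
    topics: set[str] = field(default_factory=set)
    participants: set[str] = field(default_factory=set)

class EpisodicMemory:
    """In-memory list of episodes with simple filtered recall."""

    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def recall(
        self,
        *,
        since: Optional[datetime] = None,
        topic: Optional[str] = None,
        participant: Optional[str] = None,
    ) -> list[Episode]:
        """Filter by any combination of time, topic, and participant; newest first."""
        hits = [
            e for e in self._episodes
            if (since is None or e.when >= since)
            and (topic is None or topic in e.topics)
            and (participant is None or participant in e.participants)
        ]
        return sorted(hits, key=lambda e: e.when, reverse=True)

# Example data is made up for illustration.
memory = EpisodicMemory()
memory.record(Episode(datetime(2026, 1, 5), "Kickoff call", {"pricing"}, {"ana"}))
memory.record(Episode(datetime(2026, 2, 9), "Contract review", {"pricing", "legal"}, {"ana", "raj"}))
memory.record(Episode(datetime(2026, 3, 1), "Onboarding", {"setup"}, {"raj"}))

pricing_talks = memory.recall(topic="pricing")
recent_with_raj = memory.recall(since=datetime(2026, 2, 1), participant="raj")
```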
- Image generation
The broader category of AI-generated images: text-to-image, image-to-image (editing), inpainting, outpainting, and style transfer. Powered by diffusion models or transformer-based image generators.
- Long-term memory
The agent's memory that survives across sessions, sometimes across months or years. It is usually a vector store plus a key-value store, organized into episodic, semantic, and procedural layers.
- Memory
The mechanism by which an agent remembers information across sessions, usually a vector store or structured key-value cache.
- Multi-agent
An architecture where several specialized agents collaborate on the same task: each handles a sub-goal, and they coordinate through a shared workspace.
- Multimodal AI
AI systems that process and reason across multiple input types (text, images, audio, video) within a single model, instead of routing each modality through separate specialized models.
- Natural language understanding (NLU)
The AI subfield focused on extracting meaning from human language: intent classification, entity extraction, sentiment analysis, and semantic interpretation. In 2026, mostly subsumed by LLMs.
- OCR (Optical Character Recognition)
Technology that extracts text from images, scanned documents, and PDFs. In 2026, OCR is often built into multimodal LLMs (Claude, GPT-4o, Gemini) rather than requiring a separate service.
- RAG
Retrieval-augmented generation: pulling relevant documents from a knowledge base before generating, so the LLM grounds its answer in your data.
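The retrieve-then-generate order can be sketched in a few lines. The sketch makes several assumptions: `retrieve` scores documents by simple word overlap (a real system would more likely use embeddings), `build_prompt` is a hypothetical helper, the knowledge base is made up, and the final call to a model is left out.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many lowercase words they share with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question so the model can cite them."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context, start=1))
    return f"Answer using only the sources below.\n{numbered}\nQuestion: {query}"

# Made-up knowledge base for illustration.
KB = [
    "refunds are processed within 5 business days",
    "shipping is free on orders over 50 dollars",
    "support is available by email on weekdays",
]

question = "how long do refunds take"
prompt = build_prompt(question, retrieve(question, KB, k=1))
```

The resulting `prompt` is what would be sent to the LLM: retrieval happens once, up front, and generation sees only the passages that were selected.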
- Reranker
A second-stage retrieval model that re-scores a small set of candidate results from initial retrieval, using a cross-encoder or LLM to produce more accurate final rankings.
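The second stage can be sketched as a function that re-scores each (query, candidate) pair and keeps the best few. `toy_pair_score` is a hypothetical stand-in for a cross-encoder or LLM scorer; it only measures word overlap, and the candidate strings are made up.

```python
from typing import Callable

def rerank(
    query: str,
    candidates: list[str],
    pair_score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Re-score every (query, candidate) pair and return the top_n candidates."""
    scored = [(pair_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

def toy_pair_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the document."""
    q = query.lower().split()
    d = set(doc.lower().split())
    return sum(word in d for word in q) / len(q)

# Pretend these came back from a fast first-stage retriever, in its order.
candidates = [
    "python list sorting basics",
    "sorting a python list in reverse order",
    "reverse a string in java",
]
best = rerank("reverse sort python list", candidates, toy_pair_score, top_n=2)
```

Because the scorer looks at the query and the document together, it can afford to be slower than first-stage retrieval: it only runs on the handful of candidates that survived.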
- Self-correction
An agent capability where the model evaluates its own output for errors and produces a corrected version. Improves accuracy on verifiable tasks by 10 to 30% in 2026.
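The evaluate-then-correct cycle can be sketched as a bounded loop. Everything model-related here is a hypothetical stand-in: `toy_generate` returns a deliberately wrong draft, `toy_verify` plays the role of a checker on a verifiable task (multiplication), and `toy_revise` plays the role of the model's second attempt.

```python
def self_correct(task, generate, verify, revise, max_rounds: int = 3):
    """Draft an answer, then alternate verification and revision until it passes."""
    answer = generate(task)
    for _ in range(max_rounds):
        ok, feedback = verify(task, answer)
        if ok:
            break
        answer = revise(task, answer, feedback)  # feedback guides the next attempt
    return answer

# Hypothetical stand-ins for model calls.
def toy_generate(task):
    return "381"  # deliberately wrong first draft

def toy_verify(task, answer):
    a, b = task
    if answer == str(a * b):
        return True, ""
    return False, f"{answer} is not {a} * {b}; recompute the product"

def toy_revise(task, answer, feedback):
    a, b = task
    return str(a * b)

result = self_correct((17, 23), toy_generate, toy_verify, toy_revise)
```

The round limit matters in practice: without it, a model that cannot fix its own error would loop indefinitely.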
- Self-reflection
An agent capability where the model generates an explicit reflection on its own reasoning or outputs, used to improve subsequent steps or detect errors before they propagate.
- Semantic memory
The agent's store of timeless facts ("the user is a VP of Sales at Acme," "the company uses Snowflake"), distinct from events (episodic) or skills (procedural).
- Semantic search
Search that ranks results by meaning rather than keyword overlap, using vector embeddings or LLM reasoning to match queries with conceptually similar content.
- Speech-to-text (STT)
AI technology that converts spoken audio into written text, also called Automatic Speech Recognition (ASR). The input half of voice AI, distinct from TTS, which produces speech.
- Text-to-image
AI technology that generates images from text prompts. Midjourney, DALL-E 3, Stable Diffusion 3.5, Flux, and Ideogram are the 2026 leaders.
- Text-to-speech (TTS)
AI technology that converts written text into natural-sounding spoken audio: the synthesis half of voice AI, distinct from STT, which goes the other direction.
- Text-to-video
AI technology that generates video clips from text prompts. Runway Gen-4, OpenAI Sora, Google Veo, and Kling are the 2026 leaders. Output is typically 5 to 30 seconds.
- Tool use
The ability of an LLM to invoke external functions (APIs, shell commands, internal services) instead of just generating text.
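A minimal sketch of the mechanism: the model emits a structured call, and the host program looks up the named function and runs it. The `{"name": ..., "arguments": ...}` shape is an assumption chosen for illustration, not any particular vendor's schema, and `tool`, `dispatch`, `add`, and `word_count` are hypothetical names.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function so the dispatcher can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    return a + b

@tool
def word_count(text: str) -> int:
    return len(text.split())

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted call, run the matching tool, and return a JSON result."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    return json.dumps({"result": fn(**call["arguments"])})

# In a real agent this string would come from the model's output.
reply = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

The JSON reply is fed back to the model as the tool result. Restricting calls to an explicit registry, rather than evaluating arbitrary names, keeps the model from invoking functions it was never given.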
- Vector database
A database optimized for storing and querying high-dimensional embedding vectors, enabling semantic search, RAG, and similarity-based retrieval at scale.
- Vector embedding
A dense numerical vector representation of text, image, or audio, produced by an embedding model and used to measure semantic similarity in high-dimensional space.
- Vector search
Search powered by vector similarity: finding the K nearest embedding vectors to a query vector, typically using approximate-nearest-neighbor algorithms like HNSW or IVF.
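For intuition, here is the exact (brute-force) version of K-nearest search using cosine similarity. It is a sketch with made-up three-dimensional vectors; approximate algorithms such as HNSW or IVF exist to avoid comparing the query against every stored vector, which this code does.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index, key=lambda key: cosine(query, index[key]), reverse=True)
    return ranked[:k]

# Toy embeddings, invented for illustration; real ones have hundreds of dimensions.
INDEX = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
nearest = knn([1.0, 0.0, 0.0], INDEX, k=2)
```

Exact search costs one similarity computation per stored vector, which is fine for thousands of items and too slow for very large collections; that is the gap approximate indexes fill.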
- Vibe coding
A 2025-2026 term for coding by describing what you want in natural language and letting AI generate the code, popularized by tools like Lovable, Bolt, Cursor Composer, and Replit Agent.
- Vision
An agent capability for understanding images, screenshots, and video, letting the model reason over visual content.
- Voice
An agent capability for taking phone calls, holding spoken conversations, and triggering actions from voice input.
- Voice agent
An AI agent that takes phone calls, holds spoken conversations in real time, and triggers actions from voice input, handling customer support, scheduling, and outbound calling.
- Voice cloning
AI technology that creates a synthetic voice indistinguishable from a target speaker, typically trained on 30 seconds to a few minutes of clean source audio.
- Working memory
Short-lived, task-scoped memory the agent uses to track the current goal, plan, and intermediate results, analogous to a human scratchpad during a single problem.