The AI agent glossary.
Clear definitions for the terms you see constantly: autonomy, capabilities, architectures, tooling, and pricing.
📚 All terms (212)
- 🔌ToolingA2A protocol
Google's 2025 open Agent2Agent protocol — a standard for agents from different vendors to discover each other, exchange tasks, and stream results.
- 🏗️ArchitectureAdaptive RAG
A RAG variant that routes queries to different retrieval strategies based on complexity — simple questions skip retrieval, hard ones get multi-hop retrieval.
- 🧭AutonomyAgent
A software system powered by an LLM that perceives its environment, plans actions, and executes them — usually across multiple steps and tools.
- 🧰CapabilitiesAgent memory
The persistent state an agent maintains across turns and sessions — covers short-term context, long-term facts, episodic events, and procedural skills. Distinct from the LLM context window.
- 📊EvaluationAgent observability
Specialized observability for AI agents — tracing the agent's reasoning, tool calls, sub-agent communication, state changes, and decision points across a multi-step run.
- 🏗️ArchitectureAgent orchestration
The control layer that coordinates multiple agents or agent steps — routing work, managing state, enforcing hand-off rules, and resolving conflicts between specialized agents.
- 🔌ToolingAgent protocols
The umbrella term for the standards agents use to communicate with tools (MCP) and with each other (A2A) — the connective-tissue layer of the 2026 agent ecosystem.
- 📊EvaluationAgent sandbox
An isolated execution environment — usually a container, microVM, or browser profile — where an agent can run code, browse, and act without affecting the host system or shared state.
- 📊EvaluationAgentBench
A multi-environment benchmark suite for LLM-as-agent performance, spanning 8 environments: operating system, database, knowledge graph, digital card game, lateral-thinking puzzles, house-holding, web shopping, and web browsing.
- 🧭AutonomyAgentic AI
AI systems that act with autonomy — perceiving their environment, planning multi-step actions, calling tools, and iterating toward a goal — as opposed to single-turn generative AI that only responds to prompts.
- 🧰CapabilitiesAgentic browser
A web browser where an AI agent is a first-class user — given a goal, the browser plans, navigates, clicks, and fills forms across pages. Arc Search, Dia, and Comet are 2026 examples.
- 🧰CapabilitiesAgentic commerce
The emerging pattern where an AI agent — not a human — researches, compares, and transacts on behalf of the user. 2026 standards from Visa, Mastercard, and Stripe are formalizing it.
- 🏗️ArchitectureAgentic loop
The core control flow of an agent: observe → reason → act → observe, repeated until the goal is met or a stop condition fires.
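The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production agent: `run_agent_loop`, the toy counter environment, and the `max_steps` budget are all hypothetical names, and the "reason" step is a stub where a real agent would call an LLM.

```python
from typing import Callable


def run_agent_loop(
    observe: Callable[[], int],
    reason: Callable[[int], str],
    act: Callable[[str], None],
    goal_met: Callable[[int], bool],
    max_steps: int = 10,
) -> int:
    """Repeat observe -> reason -> act until the goal is met or the budget runs out."""
    for step in range(max_steps):
        observation = observe()
        if goal_met(observation):
            return step  # number of actions taken before the goal was met
        action = reason(observation)  # a real agent would call an LLM here
        act(action)
    return max_steps  # stop condition: step budget exhausted


# Toy environment (hypothetical): a counter the agent must raise to 3.
state = {"counter": 0}


def increment(action: str) -> None:
    if action == "increment":
        state["counter"] += 1


steps_taken = run_agent_loop(
    observe=lambda: state["counter"],
    reason=lambda obs: "increment",
    act=increment,
    goal_met=lambda obs: obs >= 3,
)
```

The `max_steps` budget plays the role of the stop condition: without it, an agent whose goal is never satisfied would loop forever.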
- 🧰CapabilitiesAgentic RAG
A retrieval pattern where the agent decides when and what to retrieve — issuing its own search queries, refining them, and iterating — instead of a single up-front retrieval step.
- 🧰CapabilitiesAgentic search
A search pattern where an agent — not a single retrieval call — runs the query: it plans, queries multiple sources, evaluates results, refines, and returns a synthesized answer with citations.
- 🧭AutonomyAgentic workflow
A workflow where an AI agent plans, executes, and adapts a multi-step process — autonomously calling tools, reading data, and looping until the goal is met.
- 🚀DeploymentAI agent framework
A library or toolkit for building AI agents — providing primitives for tool calling, planning, memory, and orchestration so you do not rebuild the agent loop from scratch.
- 📊EvaluationAI alignment
The research and engineering practice of ensuring AI systems pursue the goals their designers intend — covering training-time techniques like RLHF and constitutional AI as well as deployment-time guardrails.
- 💼BusinessAI BDR
An AI business development representative — an autonomous sales agent focused on outbound prospecting, top-of-funnel qualification, and pipeline generation, often used interchangeably with AI SDR.
- 📊EvaluationAI bias
Systematic errors in AI outputs that disadvantage specific groups, perspectives, or topics — caused by biased training data, biased reward signals, or biased evaluation criteria.
- 🧰CapabilitiesAI citations
AI outputs that include verifiable links or references to source documents — the trust primitive that separates research-grade AI from pure generative chat.
- 🧰CapabilitiesAI code review
An agent that reviews pull requests — reads the diff, finds bugs, flags style and security issues, and posts inline comments. GitHub Copilot Code Review, CodeRabbit, Greptile, and Cursor BugBot lead the 2026 market.
- 📊EvaluationAI content moderation
The classifier and policy layer that filters input to and output from an LLM agent — blocks unsafe categories (CSAM, self-harm, malware), enforces brand voice, and flags PII.
- 🧰CapabilitiesAI data analyst
An agent that connects to data warehouses, runs SQL, builds charts, and produces narrative analyses — replacing the "Slack-to-the-analytics-team" loop for routine business questions.
- 🚀DeploymentAI drift
The phenomenon where an AI system's behavior changes over time without explicit code changes — caused by model version updates, training data shifts, or vendor-side changes.
- 💼BusinessAI employee
Marketing-grade synonym for a digital worker — an agent positioned as a hireable, role-shaped teammate. Notable 2024–2026 examples: 11x's Alice, Artisan's Ava, Devin from Cognition.
- 📊EvaluationAI evals
Systematic test suites for AI systems — input/expected-output pairs run automatically to catch regressions when models or prompts change.
- 🔌ToolingAI gateway
A middleware layer that sits between your application and LLM APIs — handles routing, fallback, caching, rate limiting, cost tracking, and observability across multiple model providers.
- 📊EvaluationAI governance
The framework of policies, controls, and review processes that ensure AI systems are deployed safely, ethically, and in compliance with regulation — covers risk management, audit trails, and stakeholder accountability.
- 🧭AutonomyAI handoff
The transition pattern where an AI agent transfers control of a task to a human or to another agent — preserving context, state, and prior actions so the receiver can continue seamlessly.
- 💼BusinessAI maturity model
A staged framework describing organizational AI evolution — typically experimenting → piloting → scaling → optimizing → transforming. Used to plan investment and measure progress.
- 🧰CapabilitiesAI meeting assistant
An agent that joins meetings (or processes recordings) to transcribe, summarize, extract action items, and follow up — Otter, Fireflies, Granola, and Read.ai are 2026 leaders.
- 🧰CapabilitiesAI Operator
OpenAI's browser-based autonomous agent that takes a task ("book me a flight to SFO next Tuesday") and completes it by driving a real web browser — clicking, typing, navigating, and confirming.
- 🧭AutonomyAI pair programming
A workflow where an engineer codes alongside an AI assistant that suggests, completes, and reviews code in real time — distinct from autonomous coding agents that ship PRs without human intervention.
- 🚀DeploymentAI pilot
A time-boxed, scope-limited deployment of an AI agent against a real workflow to measure quality, cost, and adoption before broader rollout. The standard 2026 enterprise procurement pattern.
- 🔌ToolingAI pipeline
A multi-step data processing flow that includes one or more LLM or AI calls — typically combines preprocessing, retrieval, LLM inference, post-processing, and observability into a single deployable unit.
- 💼BusinessAI readiness
An assessment of an organization's preparedness to deploy AI productively — covering data infrastructure, talent, governance, and use-case maturity.
- 🧰CapabilitiesAI research agent
An agent that takes a research question, searches multiple sources over multiple rounds, synthesizes a sourced report, and follows up with clarifications. Distinct from a search box.
- 💼BusinessAI ROI
The business return generated by an AI deployment minus its full cost — model spend, infra, integration, change management, and risk. The 2026 procurement north-star metric.
- 📊EvaluationAI safety
The research and engineering discipline focused on making AI systems behave reliably, refuse harmful requests, and fail gracefully under unexpected inputs — covering both training-time alignment and deployment-time guardrails.
- 💼BusinessAI SDR
An AI sales development representative — an autonomous agent that handles prospecting, outbound email sequencing, and lead qualification end-to-end.
- 🧰CapabilitiesAI streaming
Sending model output to the user token-by-token as it generates, instead of waiting for the full response. The default UX pattern for AI chat in 2026.
- 📊EvaluationAI watermarking
Techniques that embed a detectable signal in AI-generated text, images, audio, or video so downstream systems can identify content as machine-generated.
- 💼BusinessAI workforce
The collective fleet of AI agents and digital workers an organization runs — managed as a unit with shared governance, shared identity, shared observability, and a unified cost model.
- 📊EvaluationAnswer relevance
The RAG eval metric that scores whether the answer actually addresses the user's question. Catches the "perfectly grounded but useless" failure mode.
- 📊EvaluationARC-AGI
François Chollet's benchmark for measuring fluid intelligence — agents must induce a transformation rule from a few input/output grid examples and apply it. Designed to resist memorization.
- 🧰CapabilitiesArtifact
A UI pattern where AI-generated content (code, documents, diagrams) renders in a separate panel beside the chat, so users can edit and iterate without losing the conversation.
- 🏗️ArchitectureAttention mechanism
The neural-network primitive that lets a transformer model weigh the importance of every input token when generating each output token — the core innovation behind LLMs.
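As a rough illustration of the weighting idea, here is single-query scaled dot-product attention in plain Python. It is a simplified sketch: real transformers use learned projections, many heads, and batched tensor math, and the function names `softmax` and `attend` plus the toy vectors are assumptions made for this example.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: list[float], keys: list[list[float]], values: list[list[float]]):
    """Weigh each value by how well its key matches the query."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Toy input: the query points in the same direction as the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

Because the query matches the first key more closely, the first value receives the larger weight and dominates the output.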
- 🔌ToolingAutoGen
Microsoft's open-source framework for multi-agent conversation — agents talk to each other to solve problems collaboratively, with explicit support for code execution and human-in-the-loop.
- 🏗️ArchitectureAutoGPT
The 2023 open-source project that popularized autonomous LLM agents — wraps an LLM in a recursive plan-execute-reflect loop with persistent goals and tool use.
- 🧭AutonomyAutonomous agent
An agent that plans, executes, and finishes a multi-step task without asking for human approval between steps.
- 🏗️ArchitectureBabyAGI
A minimal 2023 reference implementation of an autonomous task-driven agent — three loops (task creation, prioritization, execution) in ~100 lines of Python.
- 🚀DeploymentBatch inference
Running model inference asynchronously over a large batch of inputs, trading higher latency for lower cost. OpenAI and Anthropic batch APIs are typically 50% cheaper than synchronous calls.
- 🔌ToolingBedrock Agents
AWS's managed agent service inside Amazon Bedrock — provides agent orchestration, tool integration via OpenAPI/Lambda, and a knowledge-base layer for RAG out of the box.
- 📊EvaluationBenchmark
A publicly-shared, standardized eval suite used to compare models and agents across a uniform task — SWE-bench, MMLU, GAIA, etc.
- 🧰CapabilitiesBrowser agent
An AI agent specialized in driving a web browser — navigating sites, filling forms, scraping data, and completing multi-step web workflows on behalf of the user.
- 🧰CapabilitiesBrowser use
An agent capability where the LLM drives a real web browser to read, click, and fill forms on live websites.
- 🚀DeploymentBYO key
A deployment pattern where you supply your own model API key to the agent: token costs are billed to you directly, and the agent vendor charges only for the software.
- 🧰CapabilitiesCanvas
OpenAI's side-panel editing surface for documents and code generated by ChatGPT — the OpenAI equivalent of Claude's Artifacts.
- 🏗️ArchitectureChain of Agents
An architecture where agents run in sequence, each refining or extending the previous agent's output. Used for long documents or multi-stage workflows.
- 🏗️ArchitectureChain of thought
A prompting technique that asks the model to lay out its reasoning step-by-step before committing to an answer — improves accuracy on multi-step tasks.
- 📊EvaluationCitation quality
An eval metric for systems that cite sources — measures whether citations resolve to real documents, point to the supporting passage, and match the cited claim.
- 🧰CapabilitiesCode execution
An agent capability for writing and running code in a sandboxed environment — usually Python — to compute, transform data, or test hypotheses.
- 🧰CapabilitiesCoding agent
An AI agent specialized in software engineering tasks — reading codebases, writing code, running tests, opening pull requests, and fixing bugs.
- 🧰CapabilitiesComputer use
An agent capability where the LLM controls a computer's mouse, keyboard, and screen directly — interpreting screenshots, clicking, typing, and navigating arbitrary desktop and browser apps.
- 🧰CapabilitiesComputer vision
The AI field focused on letting machines understand images and video — covers object detection, image classification, segmentation, OCR, scene understanding, and more.
- 🏗️ArchitectureConstitutional AI
An alignment technique developed by Anthropic where the model is trained to follow a written set of principles ("a constitution") rather than per-example human preferences — produces safer behavior without massive human-labeling effort.
- 🏗️ArchitectureContext engineering
The discipline of curating what information goes into an LLM's context window — selecting, ordering, and formatting the system prompt, examples, retrieved documents, and conversation history for maximum effectiveness.
- 🔌ToolingContext window
The maximum number of tokens a model can consider at once — covers the system prompt, conversation, tool results, and the answer being generated.
- 🧭AutonomyCopilot
An AI tool that suggests changes inline and waits for the user to accept — the human stays in the driver's seat.
- 🏗️ArchitectureCorrective RAG
A RAG variant that grades retrieved documents and triggers fallback retrieval (web search, alternative sources) when the initial retrieval scores low on relevance.
- 💼BusinessCost per task
The fully loaded cost of an AI completing one unit of work: model spend (including retries), infrastructure, and amortized integration cost, divided by the number of tasks completed. The right unit for AI ROI math.
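The arithmetic can be sketched as follows. All figures are made-up illustrative numbers, and the function name, parameters, and the choice to amortize integration cost evenly over a number of months are assumptions of this example, not a standard formula.

```python
def cost_per_task(
    attempts: int,
    cost_per_attempt: float,
    infrastructure: float,
    integration_cost: float,
    amortization_months: int,
    tasks_completed: int,
) -> float:
    """Monthly fully loaded cost divided by tasks actually completed."""
    # Counting every attempt (not just successes) folds retries into model spend.
    model_spend = attempts * cost_per_attempt
    integration_monthly = integration_cost / amortization_months
    return (model_spend + infrastructure + integration_monthly) / tasks_completed


# Hypothetical month: 5,000 attempts produced 4,000 completed tasks.
monthly_cost_per_task = cost_per_task(
    attempts=5000,
    cost_per_attempt=0.25,
    infrastructure=250.0,
    integration_cost=6000.0,
    amortization_months=12,
    tasks_completed=4000,
)
```

Dividing by completed tasks rather than attempts is what makes retries and failures show up in the number.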
- 🔌ToolingCrewAI
An open-source Python framework for role-based multi-agent systems — define agents with roles, goals, and tools, then orchestrate them into "crews" that collaborate on tasks.
- 🧰CapabilitiesDeep research
An agent capability that produces long-form, multi-source research reports by autonomously browsing the web, reading documents, and synthesizing findings — typically running for 5–30 minutes per query.
- 📊EvaluationDeflection rate
In support agents: the percentage of customer contacts the agent resolves fully without escalating to a human.
- 🚀DeploymentDense retrieval
The standard modern retrieval approach where queries and documents are encoded as dense embedding vectors and matched by similarity — distinct from sparse retrieval (BM25, keyword search).
- 💼BusinessDigital worker
A persistent agent that occupies a named role within a team — has a job description, KPIs, access to specific tools, and is managed alongside human teammates. The 2026 enterprise framing of agent deployment.
- 🏗️ArchitectureDPO
Direct Preference Optimization — a simpler alternative to RLHF that trains models directly on preference data without needing a separate reward model or reinforcement learning loop.
- 🔌ToolingDSPy
A Stanford-built framework that treats LLM prompts as compilable programs: you define what you want declaratively, and DSPy optimizes the prompts and few-shot examples automatically.
- 🚀DeploymentEdge AI
AI that runs on the device where data is generated — phone, laptop, IoT, vehicle, factory floor — rather than in a remote data center. Trades model size for latency, privacy, and offline operation.
- 🧰CapabilitiesEmbedding model
A neural network specialized for converting text (or images, audio) into fixed-length dense vectors — used for semantic search, RAG, clustering, and similarity tasks.
- 🏗️ArchitectureEmbeddings
Dense numerical vector representations of text, images, or audio — used to measure semantic similarity, power search, and ground LLM outputs in your data.
- 🏗️ArchitectureEmergent abilities
Capabilities that appear suddenly above a certain model scale — chain-of-thought reasoning, in-context learning, instruction following — and are absent or near-zero in smaller models.
- 🧰CapabilitiesEpisodic memory
The agent's memory of specific past events and sessions — "what happened when" — usually stored as timestamped summaries that can be retrieved by time, topic, or participant.
- 📊EvaluationEU AI Act
The European Union's regulatory framework for AI systems — categorizes AI by risk level (prohibited, high-risk, limited risk, minimal risk) and imposes obligations based on category. Phased into force 2024–2027.
- 📊EvaluationEval
A systematic test that measures agent performance on a fixed set of inputs — the agent equivalent of a test suite.
- 📊EvaluationFaithfulness
The RAG eval metric that scores whether the answer's claims are supported by the retrieved context — the standard RAGAS metric and a near-synonym for groundedness.
- 🏗️ArchitectureFew-shot learning
A prompting technique where the LLM sees a small number of input/output examples in the prompt before being asked to perform the same task on a new input.
- 🏗️ArchitectureFine-tuning
The process of training a pre-trained LLM on additional data to adapt it for a specific task, domain, or style — produces a specialized model derived from a general-purpose base.
- 💼BusinessFreemium
A pricing model where the agent has a useful free tier with paid plans for higher usage, more features, or commercial use.
- 🏗️ArchitectureFrontier model
The current generation of state-of-the-art LLMs — typically the largest models from OpenAI, Anthropic, Google, and a small number of others.
- 🔌ToolingFunction calling
An LLM API feature that lets the model emit a structured JSON call to a developer-defined function — the model picks the function name and arguments; the runtime executes the call.
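The division of labor (model picks, runtime executes) might look like the sketch below. The JSON shape with `name` and `arguments` keys is an assumption for illustration, since each vendor's API wraps the call differently, and `get_weather` is a hypothetical tool.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would call a weather service.
    return f"Sunny in {city}"


# Registry of developer-defined functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def execute_call(model_output: str) -> str:
    """Parse the model's structured call and run the matching function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]  # raises KeyError for unknown tools
    return func(**call["arguments"])


# Stand-in for what a model might emit (assumed format).
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = execute_call(model_output)
```

Looking the function up in an explicit registry, rather than evaluating the name, keeps the model limited to tools the developer has approved.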
- 📊EvaluationGAIA benchmark
A 466-question benchmark from Meta and Hugging Face that tests general-purpose AI assistants on real-world tasks requiring web browsing, file handling, and multi-step reasoning.
- 🏗️ArchitectureGraph of Thoughts
A reasoning structure that generalizes Tree of Thoughts to an arbitrary DAG — intermediate thoughts can be combined, refined, or referenced from multiple branches.
- 📊EvaluationGroundedness
A RAG eval metric measuring whether the generated response is supported by the retrieved context. Distinct from factual accuracy — the answer could be grounded in a wrong source.
- 📊EvaluationGuardrails (AI)
Constraints and filters layered around an LLM that prevent it from producing harmful, off-topic, or policy-violating outputs — applied at input, output, or both.
- 📊EvaluationHallucination
When an LLM generates content that sounds plausible but is factually wrong or fabricated — a citation that doesn't exist, a function that isn't in the API.
- 🧭AutonomyHierarchical agent
A multi-agent architecture where a "manager" or "planner" agent delegates sub-tasks to specialist worker agents — the most common multi-agent pattern in 2026 production systems.
- 🧭AutonomyHuman in the loop
A workflow pattern where the agent pauses for human approval at one or more checkpoints before continuing.
- 📊EvaluationHumanEval
A code-generation benchmark from OpenAI: 164 Python programming problems with unit tests, used to measure an LLM's ability to generate correct code from a natural-language description.
- 🚀DeploymentHybrid search
A retrieval technique that combines vector (semantic) search with keyword (lexical) search, fusing the scores to get higher precision than either alone. The 2026 production-grade default for RAG.
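One simple way to fuse the two score sets is a weighted sum after min-max normalization, sketched below. This is only one of several fusion methods (reciprocal rank fusion is another common choice); the function names, the `alpha` weight, and the toy scores are assumptions for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[str]:
    """Rank documents by alpha * semantic score + (1 - alpha) * lexical score."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    docs = set(v) | set(k)
    # A document missing from one retriever contributes 0 from that side.
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy scores: cosine similarities on one side, BM25-style scores on the other.
vector_scores = {"doc_a": 0.9, "doc_b": 0.8, "doc_c": 0.1}
keyword_scores = {"doc_b": 12.0, "doc_c": 9.0, "doc_d": 2.0}
ranking = hybrid_rank(vector_scores, keyword_scores)
```

Here `doc_b` wins because it scores well on both sides, even though `doc_a` has the highest semantic score alone.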
- 🧰CapabilitiesImage generation
The broader category of AI-generated images — includes text-to-image, image-to-image (editing), inpainting, outpainting, and style transfer. Powered by diffusion models or transformer-based image generators.
- 🏗️ArchitectureIn-context learning
An LLM's ability to learn a new task at inference time by reading examples in the prompt — no weight updates, just pattern-matching from context.
- 🏗️ArchitectureInference
The process of running a trained LLM to produce outputs — the production phase, distinct from training. Inference is what you pay for when you use an LLM API.
- 🚀DeploymentInference-time compute
Spending more compute at inference (longer reasoning chains, multiple samples, search) to improve quality on hard problems — the architectural bet of 2025–2026 reasoning models.
- 🏗️ArchitectureInstruction tuning
A fine-tuning technique where a pre-trained LLM is trained on instruction-response pairs so it learns to follow natural-language commands instead of just predicting next tokens.
- 📊EvaluationJailbreak (AI)
A prompting technique that bypasses an LLM's safety guardrails to make it produce content the model was trained to refuse.
- 🔌ToolingJSON mode
An LLM API setting that constrains the output to syntactically valid JSON — and, with strict mode, to a specific schema. The simplest reliable path to structured output.
- 🔌ToolingKV cache
A transformer inference optimization that stores key/value attention tensors from previous tokens so they do not need to be recomputed on every new token.
- 🔌ToolingLangChain
The original Python/TypeScript framework for building LLM applications — provides abstractions for chains, agents, tool use, memory, and retrieval. In 2026, mostly superseded by LangGraph for new projects.
- 🔌ToolingLangGraph
An open-source framework from LangChain for building stateful, multi-step agent applications as graphs — nodes are agent steps, edges define control flow, and state persists across steps.
- 🔌ToolingLlamaIndex
A Python framework focused on RAG and data-augmented LLM applications — provides indexing, retrieval, and query pipelines for connecting LLMs to your data.
- 📊EvaluationLLM as a judge
An evaluation pattern where a stronger LLM scores another LLM's outputs — replacing or supplementing human review when exact-match grading is infeasible.
- 🔌ToolingLLM gateway
A specific kind of AI gateway focused on LLM API calls — provides a unified interface to multiple LLM providers (OpenAI, Anthropic, Google, Mistral, etc.) with routing, caching, and observability.
- 📊EvaluationLLM observability
The practice of monitoring, tracing, and debugging LLM-powered systems in production — capturing prompts, completions, latency, cost, and errors across every call.
- 🔌Toolingllms.txt
A proposed convention (like robots.txt) for sites to tell LLMs which content to ingest, in what summary form, and on what terms. Adoption growing through 2025–2026.
- 🚀DeploymentLocal LLM
A large language model running entirely on hardware you control — your laptop, your server, or your data center — with no calls to external APIs.
- 🧰CapabilitiesLong-term memory
The agent's memory that survives across sessions, sometimes across months or years — usually a vector store plus a key-value store, with episodic, semantic, and procedural layers underneath.
- 🏗️ArchitectureLoRA
Low-Rank Adaptation — a parameter-efficient fine-tuning technique that updates a small number of additional weights instead of the full model, cutting compute and storage cost by 100×+ with minimal accuracy loss.
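The savings come from replacing a full weight update with the product of two thin matrices of rank r. A quick parameter count shows the scale; the layer size (4096 × 4096) and rank (8) below are illustrative assumptions, not figures from any particular model.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-r update: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical single weight matrix and adapter rank.
d_out, d_in, rank = 4096, 4096, 8
full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, rank)
reduction = full // lora
```

For this layer the adapter trains a few hundred times fewer parameters than a full update; the actual end-to-end savings depend on which layers get adapters and on the rank chosen.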
- 🔌ToolingMCP server
A process that exposes tools, resources, or prompts over the Model Context Protocol — any MCP-compliant agent can connect to it and use what it exposes.
- 🧰CapabilitiesMemory
The mechanism by which an agent remembers information across sessions — usually a vector store or structured key-value cache.
- 🏗️ArchitectureMixture of Agents
An architecture where multiple agents (often using different models) generate candidate responses, then an aggregator agent synthesizes them. Higher quality at higher cost.
- 🏗️ArchitectureMixture of experts
A model architecture where multiple specialized expert networks share the work — a routing layer activates only a few experts per input, cutting inference cost while keeping total parameter count high.
- 📊EvaluationMMLU
Massive Multitask Language Understanding — a 57-subject multiple-choice benchmark spanning STEM, humanities, social sciences, law, and ethics. The default measure of "general knowledge" for LLMs since 2020.
- 📊EvaluationModel card
A short structured document published with an AI model — declares intended uses, training data overview, performance across subgroups, known limitations, and risk factors.
- 🔌ToolingModel Context Protocol (MCP)
An open standard that lets any AI agent connect to any tool or data source through a single protocol, solving the M×N integration problem for the agent ecosystem.
- 🏗️ArchitectureModel distillation
A training technique that transfers knowledge from a large "teacher" model to a smaller "student" model by training the student to match the teacher's outputs — produces a faster, cheaper model that retains most of the teacher's capability.
- 🔌ToolingModel router
A component that selects which LLM to use for each request — based on cost, latency, capability, or content classification. Sits inside an AI gateway or as a standalone routing layer.
- 🚀DeploymentModel serving
The infrastructure layer that hosts a model and exposes inference over HTTP — covering batching, scheduling, KV-cache management, and request routing.
- 📊EvaluationMT-Bench
A multi-turn conversation benchmark where models are judged by a strong "LLM-as-judge" on 80 open-ended questions across writing, reasoning, math, coding, and roleplay.
- 🧰CapabilitiesMulti-agent
An architecture where several specialized agents collaborate on the same task — each handles a sub-goal and they coordinate through a shared workspace.
- 🏗️ArchitectureMulti-step reasoning
The ability of an LLM or agent to chain multiple inferences together to solve a problem — answer A leads to question B, which leads to question C, and so on until the final answer.
- 🧰CapabilitiesMultimodal AI
AI systems that process and reason across multiple input types — text, images, audio, video — within a single model, instead of routing each modality through separate specialized models.
- 🧰CapabilitiesNatural language understanding (NLU)
The AI subfield focused on extracting meaning from human language — intent classification, entity extraction, sentiment analysis, and semantic interpretation. In 2026, mostly subsumed by LLMs.
- 🏗️ArchitectureNeural network
A computational model loosely inspired by biological neurons — layers of weighted nodes that transform inputs to outputs. LLMs are large neural networks; so are image classifiers, recommendation systems, and most modern AI.
- 🚀DeploymentNo-code AI
AI tools that let non-engineers build agents, workflows, or applications via visual interfaces — drag-and-drop, prompts, or declarative configuration instead of writing code.
- 🧰CapabilitiesOCR (Optical Character Recognition)
Technology that extracts text from images, scanned documents, and PDFs — in 2026, OCR is often built into multimodal LLMs (Claude, GPT-4o, Gemini) rather than requiring a separate service.
- 🚀DeploymentOn-prem
A deployment where the agent runs entirely on infrastructure the customer controls — no agent code or customer data leaves the customer's network.
- 🚀DeploymentOpen source agent
An agent whose source code is publicly licensed (MIT, Apache, AGPL) — you can self-host, fork, and audit.
- 🔌ToolingOpenAI Agents SDK
OpenAI's 2025 production framework for building agents, the production-ready successor to its experimental Swarm project, with first-class handoffs, guardrails, tracing, and tool use.
- 💼BusinessOutcome-based pricing
A pricing model where the vendor charges per successful outcome — closed ticket, qualified lead, resolved bug — rather than per seat, per task, or per token. The signature 2026 agent pricing pattern.
- 🔌ToolingParallel tool calling
A model capability where the LLM returns multiple tool calls in a single response — the agent runtime executes them concurrently rather than serially, cutting latency on independent operations.
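On the runtime side, concurrent execution can be sketched with `asyncio.gather`. The two tools, the call format, and `run_tool_calls` are hypothetical; the short sleeps stand in for real network latency.

```python
import asyncio


async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network request
    return f"weather:{city}"


async def fetch_news(topic: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network request
    return f"news:{topic}"


TOOLS = {"fetch_weather": fetch_weather, "fetch_news": fetch_news}


async def run_tool_calls(calls: list[dict]) -> list[str]:
    """Start every tool call at once and wait for all of them."""
    coros = [TOOLS[c["name"]](**c["arguments"]) for c in calls]
    # gather returns results in the same order as the calls were given.
    return await asyncio.gather(*coros)


# Stand-in for multiple tool calls returned in one model response (assumed format).
calls = [
    {"name": "fetch_weather", "arguments": {"city": "Paris"}},
    {"name": "fetch_news", "arguments": {"topic": "ai"}},
]
results = asyncio.run(run_tool_calls(calls))
```

Total wall-clock time is roughly that of the slowest call rather than the sum of all calls, which is where the latency saving comes from. This only helps when the calls do not depend on each other's results.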
- 💼BusinessPer-task pricing
A pricing model where you pay per completed task — per PR generated, per ticket resolved, per email drafted — rather than per seat or per month.
- 🏗️ArchitecturePlan-and-execute
A canonical two-stage agent pattern: a planner LLM produces a structured multi-step plan, then an executor (often a cheaper model) carries out each step using tools.
- 🏗️ArchitecturePlanning
The phase where an agent decomposes a goal into a structured sequence of sub-tasks before executing any of them.
- 🚀DeploymentPrivate inference
Running LLM inference inside your security perimeter (VPC, on-prem, confidential compute) so prompts and outputs never leave your control. Mandatory for regulated industries.
- 🔌ToolingPrompt caching
A vendor-side optimization that reuses computation for shared prompt prefixes across requests. Cached tokens are billed at a steep discount compared to fresh prompt tokens, commonly 50–90% depending on the vendor and model.
- 🏗️ArchitecturePrompt engineering
The practice of designing, refining, and testing the text instructions sent to an LLM to maximize output quality — covers system prompts, few-shot examples, formatting, and meta-instructions.
- 📊EvaluationPrompt injection
An attack where malicious instructions are smuggled into an LLM's input — through user prompts, web pages, documents, or tool outputs — causing the agent to ignore its real instructions.
- 🔌ToolingPrompt templates
Parameterized prompt patterns stored as reusable, version-controlled assets — the basic abstraction for managing prompts at production scale.
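A minimal version of the idea using only the standard library is shown below. The template name `SUMMARIZE_V1` and its fields are hypothetical; production setups usually add a registry, version metadata, and tests on top.

```python
from string import Template

# A named, versioned template (hypothetical) kept as a reusable asset.
SUMMARIZE_V1 = Template(
    "Summarize the following $doc_type in $max_words words or fewer:\n\n$text"
)

# substitute() raises KeyError if a placeholder is left unfilled,
# which catches missing parameters before the prompt reaches the model.
prompt = SUMMARIZE_V1.substitute(
    doc_type="email",
    max_words=50,
    text="Meeting moved to 3pm.",
)
```

Keeping the template in one place means a wording change is a single reviewed edit rather than a hunt through string literals scattered across the codebase.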
- 🔌ToolingPrompt versioning
The practice of treating system prompts as first-class code — versioned, tested, and deployed through CI/CD instead of edited inline in source files.
- 🔌ToolingPydantic AI
A Python agent framework from the Pydantic team — type-safe agents with structured outputs, model-agnostic, and a thin API designed to feel like FastAPI for LLMs.
- 🏗️ArchitectureQuantization
A technique that reduces model weights from 16-bit or 32-bit floats to smaller representations (8-bit, 4-bit, or lower), cutting memory use and inference cost by 2–8× with minimal accuracy loss.
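The simplest form, symmetric 8-bit quantization with a single scale per tensor, can be sketched in plain Python. Real quantizers typically work per channel or per block and use more elaborate schemes; the function names and toy weights here are assumptions for illustration.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Rounding error per weight is bounded by half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte plus a shared scale instead of two or four bytes, and the reconstruction error is bounded by half the scale, which is why accuracy loss is usually small at 8 bits.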
- 🧰CapabilitiesRAG
Retrieval-augmented generation — pulling relevant documents from a knowledge base before generating, so the LLM grounds its answer in your data.
- 📊EvaluationRAGAS
An open-source RAG evaluation framework — the de facto standard in 2026 for measuring faithfulness, answer-relevance, context-precision, and context-recall.
- 🏗️ArchitectureReAct agent
An agent built on the ReAct pattern: an interleaved loop of reasoning (the model thinks out loud) and acting (the model calls a tool), repeated until the goal is met.
- 🏗️ArchitectureReasoning model
A class of LLM (o3, Claude Sonnet 4.6, Gemini 2.5 reasoning) that produces a long internal chain of thought before responding — trading latency for accuracy on hard problems.
- 📊EvaluationRed teaming
A structured testing practice where adversaries actively try to break an AI system — finding jailbreaks, hallucinations, harmful outputs, or unsafe tool calls before attackers do.
- 🏗️ArchitectureReflexion
An agent design pattern where the agent reflects on its previous attempts, generates a critique, and uses the critique to improve subsequent attempts — produces measurable accuracy gains on hard tasks.
- 🧰CapabilitiesReranker
A second-stage retrieval model that re-scores a small set of candidate results from initial retrieval — using a cross-encoder or LLM to produce more accurate final rankings.
- 🏗️ArchitectureRLHF
Reinforcement Learning from Human Feedback — a training technique where humans rate model outputs to teach the model which responses are preferred, dramatically improving instruction-following and safety.
- 🏗️ArchitectureScaling laws
Empirical power-law relationships between model size, training data, and compute that predict loss and capability — the basis for every major frontier-model training plan since 2020.
- 💼BusinessSeat-based pricing
The classic SaaS pricing model where customers pay per active user — common for copilot-style products (Cursor, GitHub Copilot, Notion AI) but eroding for autonomous agents.
- 🏗️ArchitectureSelf-consistency
A reasoning technique where the model samples multiple chain-of-thought traces for the same problem and selects the most common final answer. A simple accuracy boost on math and logic tasks, paid for with one extra inference call per sample.
- 🧰CapabilitiesSelf-correction
An agent capability where the model evaluates its own output for errors and produces a corrected version — improves accuracy on verifiable tasks by 10–30% in 2026.
- 🏗️ArchitectureSelf-RAG
A RAG variant where the model decides on the fly whether to retrieve, what to retrieve, and whether its own draft is grounded — emitting reflection tokens at each step.
- 🧰CapabilitiesSelf-reflection
An agent capability where the model generates an explicit reflection on its own reasoning or outputs — used to improve subsequent steps or detect errors before they propagate.
- 🔌ToolingSemantic cache
A cache layer that matches incoming prompts to past prompts by embedding similarity rather than exact match — serves stored responses for paraphrased queries.
- 🚀DeploymentSemantic chunking
A document-splitting technique that uses embeddings to detect semantic boundaries — produces more coherent chunks for RAG than fixed-size chunking, improving retrieval quality.
- 🧰CapabilitiesSemantic memory
The agent's store of timeless facts — "the user is a VP of Sales at Acme," "the company uses Snowflake" — distinct from events (episodic) or skills (procedural).
- 🚀DeploymentSemantic routing
A routing technique that uses embedding similarity to send each request to the right model, agent, or workflow — instead of brittle keyword rules or expensive LLM classifiers.
- 🧰CapabilitiesSemantic search
Search that ranks results by meaning rather than keyword overlap — using vector embeddings or LLM reasoning to match queries with conceptually similar content.
- 🧭AutonomySemi-autonomous agent
An agent that plans and executes most steps unsupervised but pauses for approval before anything irreversible.
- 🚀DeploymentSGLang
An LLM inference and programming framework optimized for structured generation, agent workloads, and complex prompting patterns — competitive with vLLM on throughput and faster on JSON/grammar-constrained output.
- 💼BusinessShadow AI
AI tools that employees use at work without IT or security approval. The 2026 successor to "shadow IT" — broader, faster-spreading, and harder to govern.
- 🚀DeploymentSmall language model
A capable LLM in the 1B–13B parameter range, trained to approach frontier quality on specific tasks while running on consumer hardware or at a fraction of frontier cost.
- 🏗️ArchitectureSpeculative decoding
An inference optimization where a small draft model proposes several tokens ahead and the large model verifies them in a single parallel pass, keeping the tokens it agrees with. Output quality matches the large model alone, typically at 2–4× the speed.
- 🧰CapabilitiesSpeech-to-text (STT)
AI technology that converts spoken audio into written text — also called Automatic Speech Recognition (ASR). The input half of voice AI, distinct from TTS which produces speech.
- 🚀DeploymentStreaming inference
Serving LLM outputs token-by-token as they're generated, typically over SSE or WebSocket — the default deployment pattern for any user-facing AI in 2026.
- 🔌ToolingStructured output
A model feature that constrains output to a specific JSON schema, making LLM responses safely parseable by downstream code.
- 💼BusinessSubscription pricing
A flat-rate monthly price per user — the dominant pricing model for agents aimed at individual contributors.
- 🧭AutonomySupervisor agent
In hierarchical multi-agent systems, the top-level agent that delegates work to specialist sub-agents, monitors progress, handles failures, and aggregates results.
- 🧭AutonomySwarm intelligence
A multi-agent pattern where many similar agents collaborate without a central supervisor — inspired by ant colonies and bee swarms, used for parallel exploration and consensus.
- 📊EvaluationSWE-bench
A benchmark from Princeton that tests coding agents on real GitHub issues — given the bug report and repo, the agent must produce a patch that passes the project's tests.
- 🔌ToolingSystem prompt
The initial instruction text given to an LLM that sets its persona, tools, constraints, and default behavior for the session.
- 🏗️ArchitectureTask decomposition
The agent reasoning step where a high-level goal is broken into ordered, executable sub-tasks, usually before tool calls begin. Foundational to plan-and-execute, ReAct, and tree-of-thoughts patterns.
- 🚀DeploymentTCO
Total cost of ownership — the all-in cost of running an agent including subscription, token spend, ops time, and integration work.
- 🏗️ArchitectureTemperature
The LLM sampling parameter that controls randomness: low values stay near the most likely next token, high values explore more of the distribution. The typical range is 0–2.
- 🏗️ArchitectureTest-time compute
Spending more compute at inference time — longer reasoning, more samples, search — to get higher accuracy without retraining the model.
- 🧰CapabilitiesText-to-image
AI technology that generates images from text prompts — Midjourney, DALL-E 3, Stable Diffusion 3.5, Flux, and Ideogram are the 2026 leaders.
- 🧰CapabilitiesText-to-speech (TTS)
AI technology that converts written text into natural-sounding spoken audio — the synthesis half of voice AI, distinct from STT which goes the other direction.
- 🧰CapabilitiesText-to-video
AI technology that generates video clips from text prompts — Runway Gen-4, OpenAI Sora, Google Veo, and Kling are the 2026 leaders. Output is typically 5–30 seconds.
- 🏗️ArchitectureTokenization
The process of splitting text into subword units (tokens) that LLMs consume — a word like "tokenization" might become two or three tokens depending on the model's tokenizer.
- 🧰CapabilitiesTool use
The ability of an LLM to invoke external functions — APIs, shell commands, internal services — instead of just generating text.
- 🏗️ArchitectureTop-p sampling
A sampling strategy (also called nucleus sampling) that samples from the smallest set of tokens whose cumulative probability reaches or exceeds p, trimming the long tail without a hard top-k cutoff.
- 🏗️ArchitectureTransformer
The neural network architecture that powers nearly every modern LLM. It uses self-attention to process sequences in parallel, replacing the older RNN approach for language modeling.
- 🏗️ArchitectureTree of thoughts
A reasoning pattern where the LLM explores multiple solution paths in parallel as a tree, evaluates partial paths, and backtracks — outperforming linear chain-of-thought on hard problems.
- 💼BusinessUsage-based pricing
A pricing model where the customer pays for what they actually use — typically tokens, tool calls, compute minutes, or active agent hours — with no minimum seat commitment.
- 🧰CapabilitiesVector database
A database optimized for storing and querying high-dimensional embedding vectors — enabling semantic search, RAG, and similarity-based retrieval at scale.
- 🧰CapabilitiesVector embedding
A dense numerical vector representation of text, image, or audio — produced by an embedding model and used to measure semantic similarity in high-dimensional space.
- 🧰CapabilitiesVector search
Search powered by vector similarity — finding the K nearest embedding vectors to a query vector, typically using approximate-nearest-neighbor algorithms like HNSW or IVF.
- 🔌ToolingVertex AI Agents
Google Cloud's managed agent platform inside Vertex AI — visual builder, code SDK, native Gemini models, and A2A-protocol-first orchestration.
- 🧰CapabilitiesVibe coding
A 2025–2026 term for coding by describing what you want in natural language and letting AI generate the code — popularized by tools like Lovable, Bolt, Cursor Composer, and Replit Agent.
- 🧰CapabilitiesVision
An agent capability for understanding images, screenshots, and video — letting the model reason over visual content.
- 🚀DeploymentvLLM
A high-throughput open-source LLM inference engine — pioneered PagedAttention to manage KV cache like virtual memory, dramatically improving GPU utilization for serving open models.
- 🧰CapabilitiesVoice
An agent capability for taking phone calls, holding spoken conversations, and triggering actions from voice input.
- 🧰CapabilitiesVoice agent
An AI agent that takes phone calls, holds spoken conversations in real time, and triggers actions from voice input — handling customer support, scheduling, and outbound calling.
- 🧰CapabilitiesVoice cloning
AI technology that creates a synthetic voice closely matching a target speaker, typically from 30 seconds to a few minutes of clean source audio.
- 📊EvaluationWebArena
A benchmark of realistic web-task scenarios (e-commerce, social, content management) where agents are scored on completing real multi-step user goals through a real browser.
- 🧰CapabilitiesWorking memory
Short-lived, task-scoped memory the agent uses to track the current goal, plan, and intermediate results — analogous to a human scratchpad during a single problem.
- 🏗️ArchitectureWorld model
An internal predictive representation of the environment that an agent uses to simulate the outcomes of candidate actions before acting — central to 2026 frontier-agent research.
- 🏗️ArchitectureZero-shot learning
An LLM's ability to perform a task it has never been explicitly trained on or shown examples of — relying entirely on the model's pre-training and the prompt instructions.
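
As a worked illustration of the Top-p sampling entry above, here is a minimal Python sketch of the nucleus filter. It assumes the next-token distribution is already available as a plain dictionary of probabilities. The function names `top_p_filter` and `sample_top_p` and the toy vocabulary are hypothetical, chosen only for this example, and real inference engines apply the same idea to logit tensors rather than dictionaries.

```python
import random


def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, then renormalize to sum to 1.

    `probs` is a hypothetical token -> probability mapping.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete; the long tail is dropped
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = list(nucleus.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution, for illustration only.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
nucleus = top_p_filter(probs, 0.9)
print(sorted(nucleus))  # the low-probability "zebra" is trimmed
```

Unlike a top-k cutoff, the number of surviving tokens here depends on the shape of the distribution: a confident model keeps one or two tokens, while a flat distribution keeps many.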