212 terms · 7 categories

The AI agent glossary.

Clear definitions for the terms you see constantly: autonomy, capabilities, architectures, tooling, and pricing.

🧭Autonomy 🧰Capabilities 🏗️Architecture 🔌Tooling 🚀Deployment 📊Evaluation 💼Business

📚All terms (212)

🔌Tooling
A2A protocol
Google's 2025 open Agent2Agent protocol — a standard for agents from different vendors to discover each other, exchange tasks, and stream results.
🏗️Architecture
Adaptive RAG
A RAG variant that routes queries to different retrieval strategies based on complexity — simple questions skip retrieval, hard ones get multi-hop retrieval.
🧭Autonomy
Agent
A software system powered by an LLM that perceives its environment, plans actions, and executes them — usually across multiple steps and tools.
🧰Capabilities
Agent memory
The persistent state an agent maintains across turns and sessions — covers short-term context, long-term facts, episodic events, and procedural skills. Distinct from the LLM context window.
📊Evaluation
Agent observability
Specialized observability for AI agents — tracing the agent's reasoning, tool calls, sub-agent communication, state changes, and decision points across a multi-step run.
🏗️Architecture
Agent orchestration
The control layer that coordinates multiple agents or agent steps — routing work, managing state, enforcing hand-off rules, and resolving conflicts between specialized agents.
🔌Tooling
Agent protocols
The umbrella term for the standards agents use to communicate with tools (MCP) and with each other (A2A) — the connective-tissue layer of the 2026 agent ecosystem.
📊Evaluation
Agent sandbox
An isolated execution environment — usually a container, microVM, or browser profile — where an agent can run code, browse, and act without affecting the host system or shared state.
📊Evaluation
AgentBench
A multi-environment benchmark suite for LLM-as-agent performance — covers OS, database, web shopping, knowledge graph, card game, and lateral-thinking tasks across 8 environments.
🧭Autonomy
Agentic AI
AI systems that act with autonomy — perceiving their environment, planning multi-step actions, calling tools, and iterating toward a goal — as opposed to single-turn generative AI that only responds to prompts.
🧰Capabilities
Agentic browser
A web browser where an AI agent is a first-class user — given a goal, the browser plans, navigates, clicks, and fills forms across pages. Arc Search, Dia, and Comet are 2026 examples.
🧰Capabilities
Agentic commerce
The emerging pattern where an AI agent — not a human — researches, compares, and transacts on behalf of the user. 2026 standards from Visa, Mastercard, and Stripe are formalizing it.
🏗️Architecture
Agentic loop
The core control flow of an agent: observe → reason → act → observe, repeated until the goal is met or a stop condition fires.
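The loop described above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `run_agent_loop`, its callback parameters, and the toy counter environment are all hypothetical names, and in a real agent the `reason` step would be an LLM call.

```python
from typing import Callable


def run_agent_loop(
    observe: Callable[[], int],
    reason: Callable[[int], str],
    act: Callable[[str], None],
    goal_met: Callable[[int], bool],
    max_steps: int = 10,
) -> int:
    """Run observe -> reason -> act until the goal is met or the step budget runs out."""
    for step in range(max_steps):
        observation = observe()
        if goal_met(observation):
            return step  # number of actions taken before the goal was met
        action = reason(observation)  # a real agent would call an LLM here
        act(action)
    return max_steps  # stop condition fired: step budget exhausted


# Toy environment (hypothetical): the agent must raise a counter to 3.
state = {"counter": 0}
steps = run_agent_loop(
    observe=lambda: state["counter"],
    reason=lambda obs: "increment",
    act=lambda action: state.update(counter=state["counter"] + 1),
    goal_met=lambda obs: obs >= 3,
)
```

The `max_steps` budget is the simplest possible stop condition; production loops usually add cost and time limits as well.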
🧰Capabilities
Agentic RAG
A retrieval pattern where the agent decides when and what to retrieve — issuing its own search queries, refining them, and iterating — instead of a single up-front retrieval step.
🧰Capabilities
Agentic search
A search pattern where an agent — not a single retrieval call — runs the query: it plans, queries multiple sources, evaluates results, refines, and returns a synthesized answer with citations.
🧭Autonomy
Agentic workflow
A workflow where an AI agent plans, executes, and adapts a multi-step process — autonomously calling tools, reading data, and looping until the goal is met.
🚀Deployment
AI agent framework
A library or toolkit for building AI agents — providing primitives for tool calling, planning, memory, and orchestration so you do not rebuild the agent loop from scratch.
📊Evaluation
AI alignment
The research and engineering practice of ensuring AI systems pursue the goals their designers intend — covering training-time techniques like RLHF and constitutional AI as well as deployment-time guardrails.
💼Business
AI BDR
An AI business development representative — an autonomous sales agent focused on outbound prospecting, top-of-funnel qualification, and pipeline generation, often used interchangeably with AI SDR.
📊Evaluation
AI bias
Systematic errors in AI outputs that disadvantage specific groups, perspectives, or topics — caused by biased training data, biased reward signals, or biased evaluation criteria.
🧰Capabilities
AI citations
AI outputs that include verifiable links or references to source documents — the trust primitive that separates research-grade AI from pure generative chat.
🧰Capabilities
AI code review
An agent that reviews pull requests — reads the diff, finds bugs, flags style and security issues, and posts inline comments. GitHub Copilot Code Review, CodeRabbit, Greptile, and Cursor BugBot lead the 2026 market.
📊Evaluation
AI content moderation
The classifier and policy layer that filters input to and output from an LLM agent — blocks unsafe categories (CSAM, self-harm, malware), enforces brand voice, and flags PII.
🧰Capabilities
AI data analyst
An agent that connects to data warehouses, runs SQL, builds charts, and produces narrative analyses — replacing the "Slack-to-the-analytics-team" loop for routine business questions.
🚀Deployment
AI drift
The phenomenon where an AI system's behavior changes over time without explicit code changes — caused by model version updates, training data shifts, or vendor-side changes.
💼Business
AI employee
Marketing-grade synonym for a digital worker — an agent positioned as a hireable, role-shaped teammate. Notable 2024–2026 examples: 11x's Alice, Artisan's Ava, Devin from Cognition.
📊Evaluation
AI evals
Systematic test suites for AI systems — input/expected-output pairs run automatically to catch regressions when models or prompts change.
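As a rough sketch of what such a suite looks like, the snippet below runs input/expected-output pairs against a system and collects failures. `run_evals` and `toy_system` are hypothetical names, the exact-match comparison is an assumed grading rule, and `toy_system` stands in for a real model call.

```python
from typing import Callable


def run_evals(system: Callable[[str], str], cases: list[tuple[str, str]]) -> dict:
    """Run every (input, expected) pair and report passes and failures."""
    failures = []
    for prompt, expected in cases:
        actual = system(prompt)
        # Assumed grading rule: case-insensitive exact match.
        if actual.strip().lower() != expected.strip().lower():
            failures.append({"input": prompt, "expected": expected, "actual": actual})
    return {"passed": len(cases) - len(failures), "total": len(cases), "failures": failures}


def toy_system(prompt: str) -> str:
    """Stand-in for a model or agent call (hypothetical)."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


report = run_evals(
    toy_system,
    [("2+2", "4"), ("capital of France", "paris"), ("3*3", "9")],
)
```

Re-running the same cases after a model or prompt change and diffing the failure list is what surfaces a regression.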
🔌Tooling
AI gateway
A middleware layer that sits between your application and LLM APIs — handles routing, fallback, caching, rate limiting, cost tracking, and observability across multiple model providers.
📊Evaluation
AI governance
The framework of policies, controls, and review processes that ensure AI systems are deployed safely, ethically, and in compliance with regulation — covers risk management, audit trails, and stakeholder accountability.
🧭Autonomy
AI handoff
The transition pattern where an AI agent transfers control of a task to a human or to another agent — preserving context, state, and prior actions so the receiver can continue seamlessly.
💼Business
AI maturity model
A staged framework describing organizational AI evolution, typically experimenting → piloting → scaling → optimizing → transforming. Used to plan investment and measure progress.
🧰Capabilities
AI meeting assistant
An agent that joins meetings (or processes recordings) to transcribe, summarize, extract action items, and follow up — Otter, Fireflies, Granola, and Read.ai are 2026 leaders.
🧰Capabilities
AI Operator
OpenAI's browser-based autonomous agent that takes a task ("book me a flight to SFO next Tuesday") and completes it by driving a real web browser — clicking, typing, navigating, and confirming.
🧭Autonomy
AI pair programming
A workflow where an engineer codes alongside an AI assistant that suggests, completes, and reviews code in real time — distinct from autonomous coding agents that ship PRs without human intervention.
🚀Deployment
AI pilot
A time-boxed, scope-limited deployment of an AI agent against a real workflow to measure quality, cost, and adoption before broader rollout. The standard 2026 enterprise procurement pattern.
🔌Tooling
AI pipeline
A multi-step data processing flow that includes one or more LLM or AI calls — typically combines preprocessing, retrieval, LLM inference, post-processing, and observability into a single deployable unit.
💼Business
AI readiness
An assessment of an organization's preparedness to deploy AI productively — covering data infrastructure, talent, governance, and use-case maturity.
🧰Capabilities
AI research agent
An agent that takes a research question, searches multiple sources over multiple rounds, synthesizes a sourced report, and follows up with clarifications. Distinct from a search box.
💼Business
AI ROI
The business return generated by an AI deployment minus its full cost — model spend, infra, integration, change management, and risk. The 2026 procurement north-star metric.
📊Evaluation
AI safety
The research and engineering discipline focused on making AI systems behave reliably, refuse harmful requests, and fail gracefully under unexpected inputs — covering both training-time alignment and deployment-time guardrails.
💼Business
AI SDR
An AI sales development representative — an autonomous agent that handles prospecting, outbound email sequencing, and lead qualification end-to-end.
🧰Capabilities
AI streaming
Sending model output to the user token-by-token as it generates, instead of waiting for the full response. The default UX pattern for AI chat in 2026.
📊Evaluation
AI watermarking
Techniques that embed a detectable signal in AI-generated text, images, audio, or video so downstream systems can identify content as machine-generated.
💼Business
AI workforce
The collective fleet of AI agents and digital workers an organization runs — managed as a unit with shared governance, shared identity, shared observability, and a unified cost model.
📊Evaluation
Answer relevance
The RAG eval metric that scores whether the answer actually addresses the user's question. Catches the "perfectly grounded but useless" failure mode.
📊Evaluation
ARC-AGI
François Chollet's benchmark for measuring fluid intelligence — agents must induce a transformation rule from a few input/output grid examples and apply it. Designed to resist memorization.
🧰Capabilities
Artifact
A UI pattern where AI-generated content (code, documents, diagrams) renders in a separate panel beside the chat, so users can edit and iterate without losing the conversation.
🏗️Architecture
Attention mechanism
The neural-network primitive that lets a transformer model weigh the importance of every input token when generating each output token — the core innovation behind LLMs.
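To make the weighting concrete, here is a small pure-Python sketch of scaled dot-product attention for a single query vector. The function names and the toy keys and values are illustrative assumptions; real models run this over learned projections, many heads, and batched tensors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over a list of key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy data (hypothetical): three tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

Tokens whose keys align with the query (the first and third here) receive larger weights than the one that does not.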
🔌Tooling
AutoGen
Microsoft's open-source framework for multi-agent conversation — agents talk to each other to solve problems collaboratively, with explicit support for code execution and human-in-the-loop.
🏗️Architecture
AutoGPT
The 2023 open-source project that popularized autonomous LLM agents — wraps an LLM in a recursive plan-execute-reflect loop with persistent goals and tool use.
🧭Autonomy
Autonomous agent
An agent that plans, executes, and finishes a multi-step task without asking for human approval between steps.
🏗️Architecture
BabyAGI
A minimal 2023 reference implementation of an autonomous task-driven agent — three loops (task creation, prioritization, execution) in ~100 lines of Python.
🚀Deployment
Batch inference
Running model inference asynchronously over a large batch of inputs, accepting higher latency in exchange for lower cost. OpenAI/Anthropic batch APIs are typically 50% cheaper than sync calls.
🔌Tooling
Bedrock Agents
AWS's managed agent service inside Amazon Bedrock — provides agent orchestration, tool integration via OpenAPI/Lambda, and a knowledge-base layer for RAG out of the box.
📊Evaluation
Benchmark
A publicly-shared, standardized eval suite used to compare models and agents across a uniform task — SWE-bench, MMLU, GAIA, etc.
🧰Capabilities
Browser agent
An AI agent specialized in driving a web browser — navigating sites, filling forms, scraping data, and completing multi-step web workflows on behalf of the user.
🧰Capabilities
Browser use
An agent capability where the LLM drives a real web browser to read, click, and fill forms on live websites.
🚀Deployment
BYO key
A deployment pattern where you supply your own model API key to the agent — token costs are billed to you directly, the agent vendor charges only for the software.
🧰Capabilities
Canvas
OpenAI's side-panel editing surface for documents and code generated by ChatGPT — the OpenAI equivalent of Claude's Artifacts.
🏗️Architecture
Chain of Agents
An architecture where agents run in sequence, each refining or extending the previous agent's output. Used for long documents or multi-stage workflows.
🏗️Architecture
Chain of thought
A prompting technique that asks the model to lay out its reasoning step-by-step before committing to an answer — improves accuracy on multi-step tasks.
📊Evaluation
Citation quality
An eval metric for systems that cite sources — measures whether citations resolve to real documents, point to the supporting passage, and match the cited claim.
🧰Capabilities
Code execution
An agent capability for writing and running code in a sandboxed environment — usually Python — to compute, transform data, or test hypotheses.
🧰Capabilities
Coding agent
An AI agent specialized in software engineering tasks — reading codebases, writing code, running tests, opening pull requests, and fixing bugs.
🧰Capabilities
Computer use
An agent capability where the LLM controls a computer's mouse, keyboard, and screen directly — interpreting screenshots, clicking, typing, and navigating arbitrary desktop and browser apps.
🧰Capabilities
Computer vision
The AI field focused on letting machines understand images and video — covers object detection, image classification, segmentation, OCR, scene understanding, and more.
🏗️Architecture
Constitutional AI
An alignment technique developed by Anthropic where the model is trained to follow a written set of principles ("a constitution") rather than per-example human preferences — produces safer behavior without massive human-labeling effort.
🏗️Architecture
Context engineering
The discipline of curating what information goes into an LLM's context window — selecting, ordering, and formatting the system prompt, examples, retrieved documents, and conversation history for maximum effectiveness.
🔌Tooling
Context window
The maximum number of tokens a model can consider at once — covers the system prompt, conversation, tool results, and the answer being generated.
🧭Autonomy
Copilot
An AI tool that suggests changes inline and waits for the user to accept — the human stays in the driver's seat.
🏗️Architecture
Corrective RAG
A RAG variant that grades retrieved documents and triggers fallback retrieval (web search, alternative sources) when the initial retrieval scores low on relevance.
💼Business
Cost per task
The fully loaded cost of an AI completing one unit of work: model spend (including retries), infrastructure, and amortized integration cost. The right unit for AI ROI math.
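One way to turn that definition into arithmetic is sketched below. The function name, the monthly framing, and every figure are hypothetical; the sketch assumes retries are already counted in model spend and that integration cost is amortized evenly over a fixed number of months.

```python
def cost_per_task(
    model_spend: float,
    infrastructure: float,
    integration_total: float,
    amortization_months: int,
    completed_tasks: int,
) -> float:
    """Fully loaded monthly cost divided by tasks completed in that month."""
    if completed_tasks <= 0:
        raise ValueError("completed_tasks must be positive")
    amortized_integration = integration_total / amortization_months
    return (model_spend + infrastructure + amortized_integration) / completed_tasks


# Hypothetical month: model spend already includes tokens burned on retries.
monthly = cost_per_task(
    model_spend=1200.0,
    infrastructure=300.0,
    integration_total=6000.0,
    amortization_months=12,
    completed_tasks=4000,
)
```

With these made-up figures the monthly total is 2000.0 spread over 4000 tasks, so each task costs 0.5 in whatever currency the inputs use.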
🔌Tooling
CrewAI
An open-source Python framework for role-based multi-agent systems — define agents with roles, goals, and tools, then orchestrate them into "crews" that collaborate on tasks.
🧰Capabilities
Deep research
An agent capability that produces long-form, multi-source research reports by autonomously browsing the web, reading documents, and synthesizing findings — typically running for 5–30 minutes per query.
📊Evaluation
Deflection rate
In support agents: the percentage of customer contacts the agent resolves fully without escalating to a human.
🚀Deployment
Dense retrieval
The standard modern retrieval approach where queries and documents are encoded as dense embedding vectors and matched by similarity — distinct from sparse retrieval (BM25, keyword search).
💼Business
Digital worker
A persistent agent that occupies a named role within a team — has a job description, KPIs, access to specific tools, and is managed alongside human teammates. The 2026 enterprise framing of agent deployment.
🏗️Architecture
DPO
Direct Preference Optimization — a simpler alternative to RLHF that trains models directly on preference data without needing a separate reward model or reinforcement learning loop.
🔌Tooling
DSPy
A Stanford-built framework that treats LLM prompts as compilable programs — define what you want declaratively, DSPy optimizes the prompts and few-shot examples automatically.
🚀Deployment
Edge AI
AI that runs on the device where data is generated — phone, laptop, IoT, vehicle, factory floor — rather than in a remote data center. Trades model size for latency, privacy, and offline operation.
🧰Capabilities
Embedding model
A neural network specialized for converting text (or images, audio) into fixed-length dense vectors — used for semantic search, RAG, clustering, and similarity tasks.
🏗️Architecture
Embeddings
Dense numerical vector representations of text, images, or audio — used to measure semantic similarity, power search, and ground LLM outputs in your data.
🏗️Architecture
Emergent abilities
Capabilities that appear suddenly above a certain model scale — chain-of-thought reasoning, in-context learning, instruction following — and are absent or near-zero in smaller models.
🧰Capabilities
Episodic memory
The agent's memory of specific past events and sessions — "what happened when" — usually stored as timestamped summaries that can be retrieved by time, topic, or participant.
📊Evaluation
EU AI Act
The European Union's regulatory framework for AI systems — categorizes AI by risk level (prohibited, high-risk, limited risk, minimal risk) and imposes obligations based on category. Phased into force 2024–2027.
📊Evaluation
Eval
A systematic test that measures agent performance on a fixed set of inputs — the agent equivalent of a test suite.
📊Evaluation
Faithfulness
The RAG eval metric that scores whether the answer's claims are supported by the retrieved context — the standard RAGAS metric and a near-synonym for groundedness.
🏗️Architecture
Few-shot learning
A prompting technique where the LLM sees a small number of input/output examples in the prompt before being asked to perform the same task on a new input.
🏗️Architecture
Fine-tuning
The process of training a pre-trained LLM on additional data to adapt it for a specific task, domain, or style — produces a specialized model derived from a general-purpose base.
💼Business
Freemium
A pricing model where the agent has a useful free tier with paid plans for higher usage, more features, or commercial use.
🏗️Architecture
Frontier model
The current generation of state-of-the-art LLMs — typically the largest models from OpenAI, Anthropic, Google, and a small number of others.
🔌Tooling
Function calling
An LLM API feature that lets the model emit a structured JSON call to a developer-defined function — the model picks the function name and arguments; the runtime executes the call.
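The split between "model picks, runtime executes" can be sketched as follows. The JSON shape (`name` plus `arguments`) is an assumption for illustration, since each provider has its own wire format, and `get_weather`, `TOOLS`, and `execute_call` are hypothetical names.

```python
import json


def get_weather(city: str) -> str:
    """Developer-defined function (hypothetical); a real one would call a weather API."""
    return f"Sunny in {city}"


# Registry of functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def execute_call(model_output: str) -> str:
    """Parse the model's structured call and run the matching function."""
    call = json.loads(model_output)
    func = TOOLS.get(call["name"])
    if func is None:
        raise KeyError(f"unknown function: {call['name']}")
    return func(**call["arguments"])


# What a model might emit (assumed shape, not a specific vendor's format).
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = execute_call(model_output)
```

Looking the name up in an explicit registry, rather than evaluating whatever the model returns, keeps the model limited to functions the developer has exposed.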
📊Evaluation
GAIA benchmark
A 466-question benchmark from Meta and Hugging Face that tests general-purpose AI assistants on real-world tasks requiring web browsing, file handling, and multi-step reasoning.
🏗️Architecture
Graph of Thoughts
A reasoning structure that generalizes Tree of Thoughts to an arbitrary DAG — intermediate thoughts can be combined, refined, or referenced from multiple branches.
📊Evaluation
Groundedness
A RAG eval metric measuring whether the generated response is supported by the retrieved context. Distinct from factual accuracy — the answer could be grounded in a wrong source.
📊Evaluation
Guardrails (AI)
Constraints and filters layered around an LLM that prevent it from producing harmful, off-topic, or policy-violating outputs — applied at input, output, or both.
📊Evaluation
Hallucination
When an LLM generates content that sounds plausible but is factually wrong or fabricated — a citation that doesn't exist, a function that isn't in the API.
🧭Autonomy
Hierarchical agent
A multi-agent architecture where a "manager" or "planner" agent delegates sub-tasks to specialist worker agents — the most common multi-agent pattern in 2026 production systems.
🧭Autonomy
Human in the loop
A workflow pattern where the agent pauses for human approval at one or more checkpoints before continuing.
📊Evaluation
HumanEval
A code-generation benchmark from OpenAI: 164 Python programming problems with unit tests, used to measure an LLM's ability to generate correct code from a natural-language description.
🚀Deployment
Hybrid search
A retrieval technique that combines vector (semantic) search with keyword (lexical) search, fusing the scores to get higher precision than either alone. The 2026 production-grade default for RAG.
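One common way to do the fusion is reciprocal rank fusion (RRF), which combines rank positions rather than raw scores, so the two retrievers do not need comparable score scales. The sketch below assumes each retriever has already returned a ranked list of document IDs; the IDs are hypothetical and `k=60` is a conventional default, not a requirement.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; each hit contributes 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


# Hypothetical results from the two retrievers, best hit first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that rank well in both lists (`doc_b`, then `doc_a`) rise above documents that appear in only one.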
🧰Capabilities
Image generation
The broader category of AI-generated images — includes text-to-image, image-to-image (editing), inpainting, outpainting, and style transfer. Powered by diffusion models or transformer-based image generators.
🏗️Architecture
In-context learning
An LLM's ability to learn a new task at inference time by reading examples in the prompt — no weight updates, just pattern-matching from context.
🏗️Architecture
Inference
The process of running a trained LLM to produce outputs — the production phase, distinct from training. Inference is what you pay for when you use an LLM API.
🚀Deployment
Inference-time compute
Spending more compute at inference (longer reasoning chains, multiple samples, search) to improve quality on hard problems — the architectural bet of 2025–2026 reasoning models.
🏗️Architecture
Instruction tuning
A fine-tuning technique where a pre-trained LLM is trained on instruction-response pairs so it learns to follow natural-language commands instead of just predicting next tokens.
📊Evaluation
Jailbreak (AI)
A prompting technique that bypasses an LLM's safety guardrails to make it produce content the model was trained to refuse.
🔌Tooling
JSON mode
An LLM API setting that constrains the output to syntactically valid JSON — and, with strict mode, to a specific schema. The simplest reliable path to structured output.
🔌Tooling
KV cache
A transformer inference optimization that stores key/value attention tensors from previous tokens so they do not need to be recomputed on every new token.
🔌Tooling
LangChain
The original Python/TypeScript framework for building LLM applications — provides abstractions for chains, agents, tool use, memory, and retrieval. In 2026, mostly superseded by LangGraph for new projects.
🔌Tooling
LangGraph
An open-source framework from LangChain for building stateful, multi-step agent applications as graphs — nodes are agent steps, edges define control flow, and state persists across steps.
🔌Tooling
LlamaIndex
A Python framework focused on RAG and data-augmented LLM applications — provides indexing, retrieval, and query pipelines for connecting LLMs to your data.
📊Evaluation
LLM as a judge
An evaluation pattern where a stronger LLM scores another LLM's outputs — replacing or supplementing human review when exact-match grading is infeasible.
🔌Tooling
LLM gateway
A specific kind of AI gateway focused on LLM API calls — provides a unified interface to multiple LLM providers (OpenAI, Anthropic, Google, Mistral, etc.) with routing, caching, and observability.
📊Evaluation
LLM observability
The practice of monitoring, tracing, and debugging LLM-powered systems in production — capturing prompts, completions, latency, cost, and errors across every call.
🔌Tooling
llms.txt
A proposed convention (like robots.txt) for sites to tell LLMs which content to ingest, in what summary form, and on what terms. Adoption growing through 2025–2026.
🚀Deployment
Local LLM
A large language model running entirely on hardware you control — your laptop, your server, or your data center — with no calls to external APIs.
🧰Capabilities
Long-term memory
The agent's memory that survives across sessions, sometimes across months or years — usually a vector store plus a key-value store, with episodic, semantic, and procedural layers underneath.
🏗️Architecture
LoRA
Low-Rank Adaptation: a parameter-efficient fine-tuning technique that trains a small set of added low-rank weights while the base model stays frozen, cutting the number of trainable parameters and the size of each fine-tuned checkpoint by 100×+ with minimal accuracy loss.
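The savings follow from simple counting. For a weight matrix of shape d_out × d_in, LoRA trains two low-rank factors of shapes d_out × r and r × d_in instead of the full matrix. The layer size and rank below are illustrative assumptions, not figures from any particular model.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, lora


# Hypothetical layer: a 4096 x 4096 projection adapted with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
ratio = full / lora
```

For this assumed layer the adapter trains 65,536 parameters instead of 16,777,216, a 256× reduction for that matrix.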
🔌Tooling
MCP server
A process that exposes tools, resources, or prompts over the Model Context Protocol — any MCP-compliant agent can connect to it and use what it exposes.
🧰Capabilities
Memory
The mechanism by which an agent remembers information across sessions — usually a vector store or structured key-value cache.
🏗️Architecture
Mixture of Agents
An architecture where multiple agents (often using different models) generate candidate responses, then an aggregator agent synthesizes them. Higher quality at higher cost.
🏗️Architecture
Mixture of experts
A model architecture where multiple specialized expert networks share the work — a routing layer activates only a few experts per input, cutting inference cost while keeping total parameter count high.
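A toy sketch of top-k routing is shown below. The experts are plain functions and the gate logits are supplied by hand, both assumptions for illustration; in a real model the experts are feed-forward networks and the logits come from a learned router applied to each token.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by routing weight."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy experts (hypothetical); only two run per input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]
routing = top_k_route(gate_logits, k=2)
y = moe_forward(3.0, experts, gate_logits, k=2)
```

Experts 1 and 3 tie for the highest logit, so each gets weight 0.5 and the other two are never evaluated, which is where the inference saving comes from.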
📊Evaluation
MMLU
Massive Multitask Language Understanding — a 57-subject multiple-choice benchmark spanning STEM, humanities, social sciences, law, and ethics. The default measure of "general knowledge" for LLMs since 2020.
📊Evaluation
Model card
A short structured document published with an AI model — declares intended uses, training data overview, performance across subgroups, known limitations, and risk factors.
🔌Tooling
Model Context Protocol (MCP)
An open standard that lets any AI agent connect to any tool or data source through a single protocol, solving the M×N integration problem for the agent ecosystem.
🏗️Architecture
Model distillation
A training technique that transfers knowledge from a large "teacher" model to a smaller "student" model by training the student to match the teacher's outputs — produces a faster, cheaper model that retains most of the teacher's capability.
🔌Tooling
Model router
A component that selects which LLM to use for each request — based on cost, latency, capability, or content classification. Sits inside an AI gateway or as a standalone routing layer.
🚀Deployment
Model serving
The infrastructure layer that hosts a model and exposes inference over HTTP — covering batching, scheduling, KV-cache management, and request routing.
📊Evaluation
MT-Bench
A multi-turn conversation benchmark where models are judged by a strong "LLM-as-judge" on 80 open-ended questions across writing, reasoning, math, coding, and roleplay.
🧰Capabilities
Multi-agent
An architecture where several specialized agents collaborate on the same task — each handles a sub-goal and they coordinate through a shared workspace.
🏗️Architecture
Multi-step reasoning
The ability of an LLM or agent to chain multiple inferences together to solve a problem — answer A leads to question B, which leads to question C, and so on until the final answer.
🧰Capabilities
Multimodal AI
AI systems that process and reason across multiple input types — text, images, audio, video — within a single model, instead of routing each modality through separate specialized models.
🧰Capabilities
Natural language understanding (NLU)
The AI subfield focused on extracting meaning from human language — intent classification, entity extraction, sentiment analysis, and semantic interpretation. In 2026, mostly subsumed by LLMs.
🏗️Architecture
Neural network
A computational model loosely inspired by biological neurons — layers of weighted nodes that transform inputs to outputs. LLMs are large neural networks; so are image classifiers, recommendation systems, and most modern AI.
🚀Deployment
No-code AI
AI tools that let non-engineers build agents, workflows, or applications via visual interfaces — drag-and-drop, prompts, or declarative configuration instead of writing code.
🧰Capabilities
OCR (Optical Character Recognition)
Technology that extracts text from images, scanned documents, and PDFs — in 2026, OCR is often built into multimodal LLMs (Claude, GPT-4o, Gemini) rather than requiring a separate service.
🚀Deployment
On-prem
A deployment where the agent runs entirely on infrastructure the customer controls — no agent code or customer data leaves the customer's network.
🚀Deployment
Open source agent
An agent whose source code is publicly licensed (MIT, Apache, AGPL) — you can self-host, fork, and audit.
🔌Tooling
OpenAI Agents SDK
OpenAI's 2025 production framework for building agents — successor to the older Assistants API, with first-class handoffs, guardrails, tracing, and tool use.
💼Business
Outcome-based pricing
A pricing model where the vendor charges per successful outcome — closed ticket, qualified lead, resolved bug — rather than per seat, per task, or per token. The signature 2026 agent pricing pattern.
🔌Tooling
Parallel tool calling
A model capability where the LLM returns multiple tool calls in a single response — the agent runtime executes them concurrently rather than serially, cutting latency on independent operations.
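On the runtime side, concurrent execution can be sketched with `asyncio.gather`. The tool functions, the `ASYNC_TOOLS` registry, and the shape of the `calls` list are hypothetical; the sleeps stand in for network latency.

```python
import asyncio


async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a network call
    return f"weather:{city}"


async def fetch_news(topic: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a network call
    return f"news:{topic}"


ASYNC_TOOLS = {"fetch_weather": fetch_weather, "fetch_news": fetch_news}


async def run_tool_calls(calls):
    """Start every tool call at once and wait for all of them to finish."""
    tasks = [ASYNC_TOOLS[c["name"]](**c["arguments"]) for c in calls]
    return await asyncio.gather(*tasks)  # results keep the order of `calls`


# Two independent calls as a model might return them in one response (assumed shape).
calls = [
    {"name": "fetch_weather", "arguments": {"city": "Paris"}},
    {"name": "fetch_news", "arguments": {"topic": "ai"}},
]
results = asyncio.run(run_tool_calls(calls))
```

Because both calls wait concurrently, total latency is roughly that of the slowest call instead of the sum of all calls. This only helps when the calls are independent of each other.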
💼Business
Per-task pricing
A pricing model where you pay per completed task — per PR generated, per ticket resolved, per email drafted — rather than per seat or per month.
🏗️Architecture
Plan-and-execute
A canonical two-stage agent pattern: a planner LLM produces a structured multi-step plan, then an executor (often a cheaper model) carries out each step using tools.
🏗️Architecture
Planning
The phase where an agent decomposes a goal into a structured sequence of sub-tasks before executing any of them.
🚀Deployment
Private inference
Running LLM inference inside your security perimeter (VPC, on-prem, confidential compute) so prompts and outputs never leave your control. Mandatory for regulated industries.
🔌Tooling
Prompt caching
A vendor-side optimization that reuses computation for shared prompt prefixes across requests — billed at a 75–90% discount compared to fresh prompt tokens.
🏗️Architecture
Prompt engineering
The practice of designing, refining, and testing the text instructions sent to an LLM to maximize output quality — covers system prompts, few-shot examples, formatting, and meta-instructions.
📊Evaluation
Prompt injection
An attack where malicious instructions are smuggled into an LLM's input — through user prompts, web pages, documents, or tool outputs — causing the agent to ignore its real instructions.
🔌Tooling
Prompt templates
Parameterized prompt patterns stored as reusable, version-controlled assets — the basic abstraction for managing prompts at production scale.
🔌Tooling
Prompt versioning
The practice of treating system prompts as first-class code — versioned, tested, and deployed through CI/CD instead of edited inline in source files.
🔌Tooling
Pydantic AI
A Python agent framework from the Pydantic team — type-safe agents with structured outputs, model-agnostic, and a thin API designed to feel like FastAPI for LLMs.
🏗️Architecture
Quantization
A technique that reduces model weights from 16-bit or 32-bit floats to smaller representations (8-bit, 4-bit, or lower), cutting memory use and inference cost by 2–8× with minimal accuracy loss.
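As a minimal sketch of the idea, the snippet below applies symmetric 8-bit quantization to a short list of weights using a single scale factor. This is one simple scheme chosen for illustration; production quantizers typically work per channel or per group and use more elaborate calibration. The weight values are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0:
        scale = 1.0  # all-zero input: any scale works
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]


# Hypothetical weights.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as one small integer plus a shared scale, and the round-trip error per weight is bounded by half the scale.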
🧰Capabilities
RAG
Retrieval-augmented generation — pulling relevant documents from a knowledge base before generating, so the LLM grounds its answer in your data.
📊Evaluation
RAGAS
An open-source RAG evaluation framework — the de facto standard in 2026 for measuring faithfulness, answer-relevance, context-precision, and context-recall.
🏗️Architecture
ReAct agent
An agent built on the ReAct pattern: an interleaved loop of reasoning (the model thinks out loud) and acting (the model calls a tool), repeated until the goal is met.
🏗️Architecture
Reasoning model
A class of LLM (o3, Claude Sonnet 4.6, Gemini 2.5 reasoning) that produces a long internal chain of thought before responding — trading latency for accuracy on hard problems.
📊Evaluation
Red teaming
A structured testing practice where adversaries actively try to break an AI system — finding jailbreaks, hallucinations, harmful outputs, or unsafe tool calls before attackers do.
🏗️Architecture
Reflexion
An agent design pattern where the agent reflects on its previous attempts, generates a critique, and uses the critique to improve subsequent attempts — produces measurable accuracy gains on hard tasks.
🧰Capabilities
Reranker
A second-stage retrieval model that re-scores a small set of candidate results from initial retrieval — using a cross-encoder or LLM to produce more accurate final rankings.
🏗️Architecture
RLHF
Reinforcement Learning from Human Feedback — a training technique where humans rate model outputs to teach the model which responses are preferred, dramatically improving instruction-following and safety.
🏗️Architecture
Scaling laws
Empirical power-law relationships between model size, training data, and compute that predict loss and capability — the basis for every major frontier-model training plan since 2020.
💼Business
Seat-based pricing
The classic SaaS pricing model where customers pay per active user — common for copilot-style products (Cursor, GitHub Copilot, Notion AI) but eroding for autonomous agents.
🏗️Architecture
Self-consistency
A reasoning technique where the model samples multiple chain-of-thought traces for the same problem and selects the most common final answer — cheap accuracy boost on math and logic tasks.
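The voting step can be sketched in a few lines. This is a minimal example, assuming a hypothetical `sample_fn` that runs one chain-of-thought sample and returns only the final answer as a string; here it is stubbed with canned answers instead of a real model call.

```python
from collections import Counter


def self_consistent_answer(sample_fn, question, n_samples=5):
    """Sample several answers and return the most common one with its vote share."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples


# Stand-in for a sampled LLM call (hypothetical): five canned final answers.
canned = iter(["42", "41", "42", "42", "40"])
answer, agreement = self_consistent_answer(lambda q: next(canned), "What is 6 * 7?")
```

The vote share is a cheap confidence signal: low agreement across samples suggests the question may need more samples or a stronger model.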
🧰Capabilities
Self-correction
An agent capability where the model evaluates its own output for errors and produces a corrected version — improves accuracy on verifiable tasks by 10–30% in 2026.
🏗️Architecture
Self-RAG
A RAG variant where the model decides on the fly whether to retrieve, what to retrieve, and whether its own draft is grounded — emitting reflection tokens at each step.
🧰Capabilities
Self-reflection
An agent capability where the model generates an explicit reflection on its own reasoning or outputs — used to improve subsequent steps or detect errors before they propagate.
🔌Tooling
Semantic cache
A cache layer that matches incoming prompts to past prompts by embedding similarity rather than exact match — serves stored responses for paraphrased queries.
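A minimal sketch of the lookup logic follows. The `SemanticCache` class, the 0.9 threshold, and the word-count `toy_embed` function are all assumptions made for illustration; a production cache would use a real embedding model and a vector index rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # function: text -> vector
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (embedding, response)

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def get(self, prompt):
        query = self.embed(prompt)
        scored = [(cosine(query, emb), resp) for emb, resp in self.entries]
        if not scored:
            return None
        score, response = max(scored, key=lambda s: s[0])
        return response if score >= self.threshold else None


# Toy embedding (hypothetical): word counts over a tiny fixed vocabulary.
VOCAB = ["how", "do", "i", "reset", "my", "password", "please", "weather", "today"]


def toy_embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]


cache = SemanticCache(toy_embed)
cache.put("How do I reset my password", "Use the 'Forgot password' link.")
hit = cache.get("please how do I reset my password")  # paraphrase, similar vector
miss = cache.get("weather today")                      # unrelated, no overlap
```

The threshold is the main tuning knob: set it too low and the cache returns answers to questions that only look similar, too high and paraphrases miss.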
🚀Deployment
Semantic chunking
A document-splitting technique that uses embeddings to detect semantic boundaries — produces more coherent chunks for RAG than fixed-size chunking, improving retrieval quality.
🧰Capabilities
Semantic memory
The agent's store of timeless facts — "the user is a VP of Sales at Acme," "the company uses Snowflake" — distinct from events (episodic) or skills (procedural).
🚀Deployment
Semantic routing
A routing technique that uses embedding similarity to send each request to the right model, agent, or workflow — instead of brittle keyword rules or expensive LLM classifiers.
🧰Capabilities
Semantic search
Search that ranks results by meaning rather than keyword overlap — using vector embeddings or LLM reasoning to match queries with conceptually similar content.
🧭Autonomy
Semi-autonomous agent
An agent that plans and executes most steps unsupervised but pauses for approval before anything irreversible.
🚀Deployment
SGLang
An LLM inference and programming framework optimized for structured generation, agent workloads, and complex prompting patterns — competitive with vLLM on throughput and faster on JSON/grammar-constrained output.
💼Business
Shadow AI
AI tools that employees use at work without IT or security approval. The 2026 successor to "shadow IT" — broader, faster-spreading, and harder to govern.
🚀Deployment
Small language model
A capable LLM in the 1B–13B parameter range, trained to approach frontier quality on specific tasks while running on consumer hardware or at a fraction of frontier cost.
🏗️Architecture
Speculative decoding
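An inference optimization where a small draft model proposes several tokens ahead and the large model verifies them in a single parallel pass, keeping the tokens it agrees with. The output matches what the large model would have produced on its own, typically 2–4× faster.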
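The propose-then-verify round can be sketched with toy models. This is a simplified greedy variant under stated assumptions: `target_next` and `draft_next` are hypothetical functions that return the next token for a prefix, and the verification loop stands in for what a real system does in one batched forward pass. Sampling-based implementations use a probabilistic acceptance rule instead of exact token matching.

```python
def speculative_step(target_next, draft_next, tokens, k=4):
    """One round of greedy speculative decoding; returns the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens, one after another.
    proposal = []
    for _ in range(k):
        proposal.append(draft_next(tokens + proposal))

    # 2. The target model checks every proposed position.
    accepted = []
    for tok in proposal:
        expected = target_next(tokens + accepted)
        if tok != expected:
            # First disagreement: keep the target's token and stop this round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token was accepted, so the target adds one bonus token.
    accepted.append(target_next(tokens + accepted))
    return accepted


# Toy "models" (hypothetical): each replays a fixed sequence by position.
TARGET = ["the", "cat", "sat", "on", "the", "mat", "."]
DRAFT = ["the", "cat", "sat", "in", "the", "mat", "."]

new_tokens = speculative_step(
    target_next=lambda toks: TARGET[len(toks)],
    draft_next=lambda toks: DRAFT[len(toks)],
    tokens=[],
    k=4,
)
```

Here the draft is wrong at the fourth position, yet the round still yields four tokens that are identical to what the target model alone would have generated, for the cost of one target verification pass.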
🧰Capabilities
Speech-to-text (STT)
AI technology that converts spoken audio into written text — also called Automatic Speech Recognition (ASR). The input half of voice AI, distinct from TTS which produces speech.
🚀Deployment
Streaming inference
Serving LLM outputs token-by-token as they're generated, typically over SSE or WebSocket — the default deployment pattern for any user-facing AI in 2026.
🔌Tooling
Structured output
A model feature that constrains output to a specific JSON schema, making LLM responses safely parseable by downstream code.
💼Business
Subscription pricing
A flat-rate monthly price per user — the dominant pricing model for agents aimed at individual contributors.
🧭Autonomy
Supervisor agent
In hierarchical multi-agent systems, the top-level agent that delegates work to specialist sub-agents, monitors progress, handles failures, and aggregates results.
🧭Autonomy
Swarm intelligence
A multi-agent pattern where many similar agents collaborate without a central supervisor — inspired by ant colonies and bee swarms, used for parallel exploration and consensus.
📊Evaluation
SWE-bench
A benchmark from Princeton that tests coding agents on real GitHub issues — given the bug report and repo, the agent must produce a patch that passes the project's tests.
🔌Tooling
System prompt
The initial instruction text given to an LLM that sets its persona, tools, constraints, and default behavior for the session.
🏗️Architecture
Task decomposition
The agent reasoning step where a high-level goal is broken into ordered, executable sub-tasks before any tool call is made. Foundational to plan-and-execute, ReAct, and tree-of-thoughts patterns.
🚀Deployment
TCO
Total cost of ownership — the all-in cost of running an agent including subscription, token spend, ops time, and integration work.
🏗️Architecture
Temperature
The LLM sampling parameter that controls randomness: low values stay near the most-likely next token, high values explore the distribution. The typical range is 0–2.
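To make the effect concrete, the sketch below divides logits by the temperature before applying softmax. The function name and the example logits are illustrative assumptions; a temperature of exactly 0 is usually handled as plain argmax by inference libraries and is rejected here.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy argmax")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, temperature=0.5)  # sharper distribution
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter distribution
```

With these logits, the most likely token gets a larger share of the probability mass at temperature 0.5 than at 2.0, which is why low temperatures give more repeatable output.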
🏗️Architecture
Test-time compute
Spending more compute at inference time — longer reasoning, more samples, search — to get higher accuracy without retraining the model.
🧰Capabilities
Text-to-image
AI technology that generates images from text prompts — Midjourney, DALL-E 3, Stable Diffusion 3.5, Flux, and Ideogram are the 2026 leaders.
🧰Capabilities
Text-to-speech (TTS)
AI technology that converts written text into natural-sounding spoken audio — the synthesis half of voice AI, distinct from STT which goes the other direction.
🧰Capabilities
Text-to-video
AI technology that generates video clips from text prompts — Runway Gen-4, OpenAI Sora, Google Veo, and Kling are the 2026 leaders. Output is typically 5–30 seconds.
🏗️Architecture
Tokenization
The process of splitting text into subword units (tokens) that LLMs consume — a word like "tokenization" might become two or three tokens depending on the model's tokenizer.
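A toy longest-match tokenizer shows how one word can split into several subword tokens. This is only a sketch: the four-entry vocabulary is made up, and production tokenizers (byte-pair encoding and similar schemes) learn tens of thousands of entries from data and use different splitting rules.

```python
def greedy_tokenize(word, vocab):
    """Split a word by repeatedly taking the longest vocabulary entry that matches."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


vocab = {"token", "ization", "iz", "ation"}  # hypothetical toy vocabulary
pieces = greedy_tokenize("tokenization", vocab)
```

With this vocabulary the word splits into `token` and `ization`; a different vocabulary would produce a different split, which is why token counts vary between models.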
🧰Capabilities
Tool use
The ability of an LLM to invoke external functions — APIs, shell commands, internal services — instead of just generating text.
🏗️Architecture
Top-p sampling
A sampling strategy that picks from the smallest set of tokens whose cumulative probability exceeds p. Trims the long tail without a hard top-k cutoff.
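The filtering step can be sketched as follows. The function name and the example distribution are assumptions for illustration; this version stops as soon as the running total reaches p, a boundary detail on which implementations differ slightly.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest high-probability set whose total reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


next_token_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
nucleus = top_p_filter(next_token_probs, p=0.9)
```

Here the unlikely `zebra` is dropped and the remaining three tokens are renormalized; the sampler then draws from `nucleus`, for example with `random.choices`. Unlike top-k, the number of surviving tokens adapts to how peaked the distribution is.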
🏗️Architecture
Transformer
The neural network architecture that powers every modern LLM — uses self-attention to process sequences in parallel, replacing the older RNN approach for language modeling.
🏗️Architecture
Tree of thoughts
A reasoning pattern where the LLM explores multiple solution paths in parallel as a tree, evaluates partial paths, and backtracks — outperforming linear chain-of-thought on hard problems.
💼Business
Usage-based pricing
A pricing model where the customer pays for what they actually use — typically tokens, tool calls, compute minutes, or active agent hours — with no minimum seat commitment.
🧰Capabilities
Vector database
A database optimized for storing and querying high-dimensional embedding vectors — enabling semantic search, RAG, and similarity-based retrieval at scale.
🧰Capabilities
Vector embedding
A dense numerical vector representation of text, image, or audio — produced by an embedding model and used to measure semantic similarity in high-dimensional space.
🧰Capabilities
Vector search
Search powered by vector similarity — finding the K nearest embedding vectors to a query vector, typically using approximate-nearest-neighbor algorithms like HNSW or IVF.
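For small collections, the K nearest vectors can be found exactly by scoring every item. The sketch below does that with cosine similarity; the two-dimensional vectors are made up, and at scale an approximate index such as HNSW or IVF would replace the linear scan.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, vectors, k=2):
    """Exact k-nearest-neighbor search: score every vector, keep the top k keys."""
    scored = ((cosine(query, vec), key) for key, vec in vectors.items())
    return [key for _, key in heapq.nlargest(k, scored)]


# Hypothetical 2-D embeddings; real ones have hundreds or thousands of dimensions.
vectors = {"cat": [1.0, 0.1], "kitten": [0.9, 0.2], "car": [0.1, 1.0]}
nearest = knn([1.0, 0.0], vectors, k=2)
```

The brute-force scan costs one similarity computation per stored vector, which is the cost that approximate-nearest-neighbor indexes are designed to avoid.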
🔌Tooling
Vertex AI Agents
Google Cloud's managed agent platform inside Vertex AI — visual builder, code SDK, native Gemini models, and A2A-protocol-first orchestration.
🧰Capabilities
Vibe coding
A 2025–2026 term for coding by describing what you want in natural language and letting AI generate the code — popularized by tools like Lovable, Bolt, Cursor Composer, and Replit Agent.
🧰Capabilities
Vision
An agent capability for understanding images, screenshots, and video — letting the model reason over visual content.
🚀Deployment
vLLM
A high-throughput open-source LLM inference engine — pioneered PagedAttention to manage KV cache like virtual memory, dramatically improving GPU utilization for serving open models.
🧰Capabilities
Voice
An agent capability for taking phone calls, holding spoken conversations, and triggering actions from voice input.
🧰Capabilities
Voice agent
An AI agent that takes phone calls, holds spoken conversations in real time, and triggers actions from voice input — handling customer support, scheduling, and outbound calling.
🧰Capabilities
Voice cloning
AI technology that creates a synthetic voice indistinguishable from a target speaker — typically trained on 30 seconds to a few minutes of clean source audio.
📊Evaluation
WebArena
A benchmark of realistic web-task scenarios (e-commerce, social, content management) where agents are scored on completing real multi-step user goals through a real browser.
🧰Capabilities
Working memory
Short-lived, task-scoped memory the agent uses to track the current goal, plan, and intermediate results — analogous to a human scratchpad during a single problem.
🏗️Architecture
World model
An internal predictive representation of the environment that an agent uses to simulate the outcomes of candidate actions before acting — central to 2026 frontier-agent research.
🏗️Architecture
Zero-shot learning
An LLM's ability to perform a task it has never been explicitly trained on or shown examples of — relying entirely on the model's pre-training and the prompt instructions.