Tooling terms

Protocols and primitives agents use to talk to the outside world.

🔌Tooling
A2A protocol
Google's 2025 open Agent2Agent protocol — a standard for agents from different vendors to discover each other, exchange tasks, and stream results.
🔌Tooling
Agent protocols
The umbrella term for the standards agents use to communicate with tools (MCP) and with each other (A2A) — the connective-tissue layer of the 2026 agent ecosystem.
🔌Tooling
AI gateway
A middleware layer that sits between your application and LLM APIs — handles routing, fallback, caching, rate limiting, cost tracking, and observability across multiple model providers.
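As a rough illustration of the fallback part of that description, the sketch below tries providers in order and returns the first successful response. The provider functions (`flaky_primary`, `stable_backup`) and the helper `call_with_fallback` are hypothetical stand-ins, not any real gateway's API.

```python
from typing import Callable

# Hypothetical provider callables: each takes a prompt and returns text,
# or raises an exception when the provider is unavailable.
Provider = Callable[[str], str]


def call_with_fallback(prompt: str, providers: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    # Simulates an outage at the preferred provider.
    raise TimeoutError("primary timed out")


def stable_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, answer = call_with_fallback(
    "hello", [("primary", flaky_primary), ("backup", stable_backup)]
)
```

A production gateway would typically layer caching, rate limiting, and cost tracking around the same call path.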
🔌Tooling
AI pipeline
A multi-step data processing flow that includes one or more LLM or AI calls — typically combines preprocessing, retrieval, LLM inference, post-processing, and observability into a single deployable unit.
🔌Tooling
AutoGen
Microsoft's open-source framework for multi-agent conversation — agents talk to each other to solve problems collaboratively, with explicit support for code execution and human-in-the-loop.
🔌Tooling
Bedrock Agents
AWS's managed agent service inside Amazon Bedrock — provides agent orchestration, tool integration via OpenAPI/Lambda, and a knowledge-base layer for RAG out of the box.
🔌Tooling
Context window
The maximum number of tokens a model can consider at once — covers the system prompt, conversation, tool results, and the answer being generated.
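Because the answer shares the window with everything sent in, the room left for output is a simple subtraction. The sketch below uses made-up token counts and a hypothetical 8,000-token window purely for illustration.

```python
def remaining_output_tokens(
    window: int, system_tokens: int, conversation_tokens: int, tool_result_tokens: int
) -> int:
    """Tokens still available for the generated answer (never negative)."""
    used = system_tokens + conversation_tokens + tool_result_tokens
    return max(0, window - used)


# Illustrative numbers only; real limits depend on the model.
budget = remaining_output_tokens(
    window=8_000, system_tokens=500, conversation_tokens=3_000, tool_result_tokens=1_500
)
```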
🔌Tooling
CrewAI
An open-source Python framework for role-based multi-agent systems — define agents with roles, goals, and tools, then orchestrate them into "crews" that collaborate on tasks.
🔌Tooling
DSPy
A Stanford-built framework that treats LLM prompts as compilable programs: you define what you want declaratively, and DSPy optimizes the prompts and few-shot examples automatically.
🔌Tooling
Function calling
An LLM API feature that lets the model emit a structured JSON call to a developer-defined function — the model picks the function name and arguments; the runtime executes the call.
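The division of labor can be sketched as follows, assuming the model's output arrives as a JSON string with `name` and `arguments` keys. That shape is illustrative (vendors differ in the exact format), and `get_weather` is a hypothetical developer-defined function.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical developer-defined function; a real one would call a weather API."""
    return f"Sunny in {city}"


# Registry mapping the function names advertised to the model to real callables.
TOOLS = {"get_weather": get_weather}


def execute_call(model_output: str) -> str:
    """Parse the model's structured call and run the matching function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]        # the model picked the name
    return func(**call["arguments"])  # the runtime executes the call


# Stand-in for what a model might emit.
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = execute_call(model_output)
```

The model never runs code itself; it only proposes the call, and the runtime decides whether and how to execute it.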
🔌Tooling
JSON mode
An LLM API setting that constrains the output to syntactically valid JSON and, where the vendor also offers a strict or schema-constrained mode, to a specific schema. The simplest reliable path to structured output.
🔌Tooling
KV cache
A transformer inference optimization that stores key/value attention tensors from previous tokens so they do not need to be recomputed on every new token.
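A toy, pure-Python sketch of the idea follows. It uses a single attention head, made-up projections (`project_kv`), and the raw token vector as the query, so it is a simplification rather than a real transformer layer. It compares decoding with and without a cache and counts how often the key/value projection runs.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project_kv(vec, counter):
    """Stand-in for the key/value projections; counter[0] tracks how often it runs."""
    counter[0] += 1
    return [2.0 * x for x in vec], [x + 1.0 for x in vec]


def decode_without_cache(tokens):
    counter, outputs = [0], []
    for t in range(1, len(tokens) + 1):
        # Recompute keys and values for every earlier token at each step.
        kv = [project_kv(tok, counter) for tok in tokens[:t]]
        outputs.append(attend(tokens[t - 1], [k for k, _ in kv], [v for _, v in kv]))
    return outputs, counter[0]


def decode_with_cache(tokens):
    counter, keys, values, outputs = [0], [], [], []
    for tok in tokens:
        key, value = project_kv(tok, counter)  # only the new token is projected
        keys.append(key)
        values.append(value)
        outputs.append(attend(tok, keys, values))
    return outputs, counter[0]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
plain_out, plain_calls = decode_without_cache(tokens)
cached_out, cached_calls = decode_with_cache(tokens)
```

Both paths produce the same attention outputs; the cached path simply performs one projection per token instead of re-projecting the whole prefix at every step, at the cost of holding the keys and values in memory.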
🔌Tooling
LangChain
The original Python/TypeScript framework for building LLM applications, with abstractions for chains, agents, tool use, memory, and retrieval. In 2026, mostly superseded by LangGraph for new agent projects.
🔌Tooling
LangGraph
An open-source framework from LangChain for building stateful, multi-step agent applications as graphs — nodes are agent steps, edges define control flow, and state persists across steps.
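The graph idea can be sketched in plain Python. This is not the LangGraph API; the node functions, the `next_node` routing helper, and the state keys are all hypothetical and only illustrate nodes as steps, edges as control flow, and state carried between steps.

```python
def draft(state: dict) -> dict:
    return {**state, "draft": f"Draft about {state['topic']}", "revisions": 0}


def review(state: dict) -> dict:
    # Approve once the draft has been revised at least one time.
    return {**state, "approved": state["revisions"] >= 1}


def revise(state: dict) -> dict:
    return {**state, "revisions": state["revisions"] + 1}


NODES = {"draft": draft, "review": review, "revise": revise}


def next_node(current: str, state: dict):
    """Edges: draft -> review; review -> end or revise; revise -> review."""
    if current == "draft":
        return "review"
    if current == "review":
        return None if state["approved"] else "revise"
    return "review"


def run(state: dict, start: str = "draft") -> dict:
    current = start
    while current is not None:
        state = NODES[current](state)        # run the step
        current = next_node(current, state)  # follow the edge
    return state


final_state = run({"topic": "MCP"})
```

The loop from `review` back to `revise` is the kind of cycle that a linear chain cannot express but a graph can.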
🔌Tooling
LlamaIndex
A Python framework focused on RAG and data-augmented LLM applications — provides indexing, retrieval, and query pipelines for connecting LLMs to your data.
🔌Tooling
LLM gateway
A specific kind of AI gateway focused on LLM API calls — provides a unified interface to multiple LLM providers (OpenAI, Anthropic, Google, Mistral, etc.) with routing, caching, and observability.
🔌Tooling
llms.txt
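A proposed convention (modeled loosely on robots.txt) in which a site publishes a Markdown file at /llms.txt that gives LLMs a concise overview of the site and links to LLM-friendly versions of its content. Unlike robots.txt, it is a guide to content rather than an access-control mechanism. Adoption has been growing through 2025–2026.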
🔌Tooling
MCP server
A process that exposes tools, resources, or prompts over the Model Context Protocol — any MCP-compliant agent can connect to it and use what it exposes.
🔌Tooling
Model Context Protocol (MCP)
An open standard, introduced by Anthropic in late 2024, that lets any AI agent connect to any tool or data source through a single protocol. It reduces the M×N integration problem (M agents times N tools, each pair needing custom glue) to roughly M + N protocol implementations.
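The M×N integration problem is easiest to see with numbers. The sketch below compares pairwise custom integrations with one shared-protocol adapter per participant; the counts are made up for illustration.

```python
def custom_integrations(agents: int, tools: int) -> int:
    """Every agent needs bespoke glue code for every tool."""
    return agents * tools


def protocol_adapters(agents: int, tools: int) -> int:
    """Each agent and each tool implements the shared protocol once."""
    return agents + tools


# Illustrative ecosystem: 5 agent hosts and 20 tools.
pairwise = custom_integrations(5, 20)
shared = protocol_adapters(5, 20)
```

With these numbers the pairwise approach needs 100 integrations while the shared protocol needs 25 adapters, and the gap widens as either side grows.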
🔌Tooling
Model router
A component that selects which LLM to use for each request, based on cost, latency, capability, or content classification. Sits inside an AI gateway or runs as a standalone routing layer.
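A minimal rule-based sketch of such a selector is shown below. The model names (`small-model`, `large-model`, `long-context-model`) and the thresholds are hypothetical; real routers often use a classifier or live cost and latency data instead of fixed rules.

```python
def route(prompt: str, needs_tools: bool = False, long_prompt_words: int = 200) -> str:
    """Pick a model tier from simple request features."""
    if needs_tools:
        return "large-model"         # capability: tool use needs the stronger model
    if len(prompt.split()) > long_prompt_words:
        return "long-context-model"  # content: long inputs need a bigger window
    return "small-model"             # cost and latency: default to the cheapest tier


choice_simple = route("What is 2 + 2?")
choice_tools = route("Book a flight to Oslo", needs_tools=True)
choice_long = route("word " * 500)
```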
🔌Tooling
OpenAI Agents SDK
OpenAI's 2025 production framework for building agents, the production-ready successor to its experimental Swarm project, with first-class handoffs, guardrails, tracing, and tool use.
🔌Tooling
Parallel tool calling
A model capability where the LLM returns multiple tool calls in a single response — the agent runtime executes them concurrently rather than serially, cutting latency on independent operations.
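On the runtime side, concurrent execution might look like the sketch below, which uses `asyncio.gather` from the standard library. The tool functions and the shape of the `tool_calls` list are hypothetical stand-ins for what a model response would contain.

```python
import asyncio


async def get_weather(city: str) -> str:
    await asyncio.sleep(0.05)  # simulated network latency
    return f"weather for {city}"


async def get_news(topic: str) -> str:
    await asyncio.sleep(0.05)
    return f"news about {topic}"


TOOLS = {"get_weather": get_weather, "get_news": get_news}


async def run_tool_calls(calls: list[dict]) -> list[str]:
    """Start every requested call, then wait for all of them together."""
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*tasks)  # results come back in request order


# Stand-in for multiple tool calls returned in a single model response.
tool_calls = [
    {"name": "get_weather", "arguments": {"city": "Oslo"}},
    {"name": "get_news", "arguments": {"topic": "energy"}},
]
results = asyncio.run(run_tool_calls(tool_calls))
```

Because the two simulated calls overlap, total wall time is close to the slowest call rather than the sum of both. This only helps when the calls are independent of each other.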
🔌Tooling
Prompt caching
A vendor-side optimization that reuses computation for shared prompt prefixes across requests. Cached prefix tokens are billed at a steep discount compared to fresh prompt tokens, up to roughly 90% depending on the vendor and model.
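The billing effect can be worked through with a small calculation. The price, token counts, and the 90% discount below are assumptions chosen for illustration, and the sketch ignores any surcharge a vendor may apply when the prefix is first written to the cache.

```python
def prompt_cost(
    prefix_tokens: int,
    suffix_tokens: int,
    price_per_million: float,
    cached_discount: float,
    cache_hit: bool,
) -> float:
    """Input cost in dollars; only the shared prefix is eligible for the discount."""
    full_rate = price_per_million / 1_000_000
    prefix_rate = full_rate * (1 - cached_discount) if cache_hit else full_rate
    return prefix_tokens * prefix_rate + suffix_tokens * full_rate


# Assumed: 10,000-token shared prefix, 500 fresh tokens, $3 per million input tokens.
cold = prompt_cost(10_000, 500, 3.0, cached_discount=0.9, cache_hit=False)
warm = prompt_cost(10_000, 500, 3.0, cached_discount=0.9, cache_hit=True)
```

Under these assumptions the cold request costs about $0.0315 and the warm one about $0.0045, which is why stable content such as system prompts and tool definitions is usually placed at the start of the prompt.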
🔌Tooling
Prompt templates
Parameterized prompt patterns stored as reusable, version-controlled assets — the basic abstraction for managing prompts at production scale.
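A minimal sketch using `string.Template` from the Python standard library is shown below. The template text, the variable names, and the `_V2` naming convention are hypothetical.

```python
from string import Template

# A named, versioned template kept alongside code rather than built inline.
SUMMARIZE_V2 = Template(
    "Summarize the following $doc_type in $max_words words or fewer:\n\n$text"
)

prompt = SUMMARIZE_V2.substitute(
    doc_type="email",
    max_words=50,
    text="Hi team, the launch moved to Friday.",
)
```

`substitute` raises `KeyError` when a parameter is missing, which catches incomplete prompts before they reach the model.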
🔌Tooling
Prompt versioning
The practice of treating system prompts as first-class code — versioned, tested, and deployed through CI/CD instead of edited inline in source files.
🔌Tooling
Pydantic AI
A Python agent framework from the Pydantic team that offers type-safe agents with structured outputs, model-agnostic provider support, and a thin API designed to feel like FastAPI for LLMs.
🔌Tooling
Semantic cache
A cache layer that matches incoming prompts to past prompts by embedding similarity rather than exact match — serves stored responses for paraphrased queries.
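The lookup logic can be sketched as below. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the 0.8 threshold is an arbitrary assumption; only the similarity-based matching is the point.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real cache would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the stored response for the most similar prompt, or None on a miss."""
        query = embed(prompt)
        best = max(self.entries, key=lambda entry: cosine(query, entry[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital city of france")
miss = cache.get("how do I bake bread")
```

The threshold is the main tuning knob: set it too low and the cache returns answers to questions that only look similar; set it too high and it behaves like an exact-match cache.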
🔌Tooling
Structured output
A model feature that constrains output to a specific JSON schema, making LLM responses safely parseable by downstream code.
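Downstream code can still verify the shape before trusting it. The sketch below checks a response against a hypothetical minimal schema (a mapping from field name to Python type); it is a simplified stand-in for full JSON Schema validation, and the field names are made up.

```python
import json

# Hypothetical minimal schema: required field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def parse_structured(raw: str) -> dict:
    """Parse a model response and check required fields and their types."""
    data = json.loads(raw)
    for field, expected in SCHEMA.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} is missing or not {expected.__name__}")
    return data


person = parse_structured('{"name": "Ada", "age": 36}')
```

When the model enforces the schema itself, this check should never fail, but it is a cheap guard for code paths that also accept unconstrained output.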
🔌Tooling
System prompt
The initial instruction text given to an LLM that sets its persona, tools, constraints, and default behavior for the session.
🔌Tooling
Vertex AI Agents
Google Cloud's managed agent platform inside Vertex AI — visual builder, code SDK, native Gemini models, and A2A-protocol-first orchestration.