Tooling terms
Protocols and primitives agents use to talk to the outside world.
- A2A protocol
Google's 2025 open Agent2Agent protocol — a standard for agents from different vendors to discover each other, exchange tasks, and stream results.
- Agent protocols
The umbrella term for the standards agents use to communicate with tools (MCP) and with each other (A2A) — the connective-tissue layer of the 2026 agent ecosystem.
- AI gateway
A middleware layer that sits between your application and LLM APIs — handles routing, fallback, caching, rate limiting, cost tracking, and observability across multiple model providers.
- AI pipeline
A multi-step data processing flow that includes one or more LLM or AI calls — typically combines preprocessing, retrieval, LLM inference, post-processing, and observability into a single deployable unit.
- AutoGen
Microsoft's open-source framework for multi-agent conversation — agents talk to each other to solve problems collaboratively, with explicit support for code execution and human-in-the-loop.
- Bedrock Agents
AWS's managed agent service inside Amazon Bedrock — provides agent orchestration, tool integration via OpenAPI/Lambda, and a knowledge-base layer for RAG out of the box.
- Context window
The maximum number of tokens a model can consider at once — covers the system prompt, conversation, tool results, and the answer being generated.
- CrewAI
An open-source Python framework for role-based multi-agent systems — define agents with roles, goals, and tools, then orchestrate them into "crews" that collaborate on tasks.
- DSPy
A Stanford-built framework that treats LLM prompts as compilable programs: you declare what you want, and DSPy optimizes the prompts and few-shot examples automatically.
- Function calling
An LLM API feature that lets the model emit a structured JSON call to a developer-defined function — the model picks the function name and arguments; the runtime executes the call.
- JSON mode
An LLM API setting that constrains the output to syntactically valid JSON — and, with strict mode, to a specific schema. The simplest reliable path to structured output.
- KV cache
A transformer inference optimization that stores key/value attention tensors from previous tokens so they do not need to be recomputed on every new token.
- LangChain
The original Python/TypeScript framework for building LLM applications — provides abstractions for chains, agents, tool use, memory, and retrieval. In 2026, mostly superseded by LangGraph for new projects.
- LangGraph
An open-source framework from LangChain for building stateful, multi-step agent applications as graphs — nodes are agent steps, edges define control flow, and state persists across steps.
- LlamaIndex
A Python framework focused on RAG and data-augmented LLM applications — provides indexing, retrieval, and query pipelines for connecting LLMs to your data.
- LLM gateway
A specific kind of AI gateway focused on LLM API calls — provides a unified interface to multiple LLM providers (OpenAI, Anthropic, Google, Mistral, etc.) with routing, caching, and observability.
- llms.txt
A proposed convention (like robots.txt) for sites to tell LLMs which content to ingest, in what summary form, and on what terms. Adoption growing through 2025–2026.
- MCP server
A process that exposes tools, resources, or prompts over the Model Context Protocol — any MCP-compliant agent can connect to it and use what it exposes.
- Model Context Protocol (MCP)
An open standard that lets any AI agent connect to any tool or data source through a single protocol. It addresses the M×N integration problem for the agent ecosystem: instead of a custom integration for every pairing of M agents and N tools, each side implements the protocol once.
- Model router
A component that selects which LLM to use for each request, based on cost, latency, capability, or content classification. Runs inside an AI gateway or as a standalone routing layer.
- OpenAI Agents SDK
OpenAI's 2025 production framework for building agents: the successor to the experimental Swarm library, with first-class handoffs, guardrails, tracing, and tool use.
- Parallel tool calling
A model capability where the LLM returns multiple tool calls in a single response — the agent runtime executes them concurrently rather than serially, cutting latency on independent operations.
- Prompt caching
A vendor-side optimization that reuses computation for shared prompt prefixes across requests. Cached prefix tokens are typically billed at a steep discount, commonly cited as 75–90% off fresh prompt tokens, though the exact rate varies by vendor and model.
- Prompt templates
Parameterized prompt patterns stored as reusable, version-controlled assets — the basic abstraction for managing prompts at production scale.
- Prompt versioning
The practice of treating system prompts as first-class code — versioned, tested, and deployed through CI/CD instead of edited inline in source files.
- Pydantic AI
A Python agent framework from the Pydantic team: type-safe agents, structured outputs, model-agnostic provider support, and a thin API designed to feel like FastAPI for LLMs.
- Semantic cache
A cache layer that matches incoming prompts to past prompts by embedding similarity rather than exact match — serves stored responses for paraphrased queries.
- Structured output
A model feature that constrains output to a specific JSON schema, making LLM responses safely parseable by downstream code.
- System prompt
The initial instruction text given to an LLM that sets its persona, tools, constraints, and default behavior for the session.
- Vertex AI Agents
Google Cloud's managed agent platform inside Vertex AI — visual builder, code SDK, native Gemini models, and A2A-protocol-first orchestration.
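
The function calling and parallel tool calling entries above describe a split of responsibilities: the model picks a function name and arguments, and the runtime executes the call. A minimal sketch of the runtime side might look like the following. The tool functions, the `TOOLS` registry, and the shape of `model_tool_calls` are all hypothetical stand-ins, not any vendor's actual response format.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical developer-defined functions the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    return a + b

# Registry mapping the names the model may emit to real callables.
TOOLS = {"get_weather": get_weather, "add": add}

def run_tool_call(call: dict) -> dict:
    """Execute one model-emitted call and return its result keyed by call id."""
    func = TOOLS[call["name"]]            # the model chose the function name...
    args = json.loads(call["arguments"])  # ...and the JSON-encoded arguments
    return {"id": call["id"], "result": func(**args)}

def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run independent calls concurrently, keeping the order the model used."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        return list(pool.map(run_tool_call, calls))

# Hand-written stand-in for the tool calls a single model response might carry.
model_tool_calls = [
    {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Oslo"}'},
    {"id": "call_2", "name": "add", "arguments": '{"a": 2, "b": 3}'},
]

results = run_tool_calls(model_tool_calls)
print(results)
```

In a real agent loop the results would be sent back to the model as tool messages. A production runtime would also validate the arguments against a schema and handle unknown function names before executing anything.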
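
The model router entry above can be illustrated with a rule-based sketch that picks the cheapest model able to serve a request. The model names, prices, context limits, and the four-characters-per-token heuristic are all assumptions made for illustration. Real routers often use a tokenizer and a learned or classifier-based policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical price in USD
    max_context_tokens: int    # hypothetical context window size
    supports_code: bool        # hypothetical capability flag

MODELS = [
    ModelOption("small-fast", 0.10, 16_000, False),
    ModelOption("mid-general", 0.50, 128_000, True),
    ModelOption("large-reasoning", 3.00, 200_000, True),
]

def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)

def route(prompt: str, needs_code: bool = False) -> ModelOption:
    """Pick the cheapest model that fits the prompt and the required capability."""
    tokens = estimate_tokens(prompt)
    candidates = [
        m for m in MODELS
        if m.max_context_tokens >= tokens and (m.supports_code or not needs_code)
    ]
    if not candidates:
        raise ValueError("no configured model can serve this request")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route("Summarize this sentence.").name)
print(route("Write a quicksort in Rust.", needs_code=True).name)
```

The same `route` function could sit inside a gateway's request path or run as its own service. The selection rule is the only part that would change as the policy grows more sophisticated.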
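
The semantic cache entry above hinges on matching by similarity rather than exact text. The sketch below shows that lookup logic under a loud assumption: `embed` is a toy bag-of-words stand-in so the example runs without dependencies, whereas a real cache would call an embedding model and usually a vector index. The `SemanticCache` class and its 0.8 threshold are likewise hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real cache would call an embedding model."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the stored response for the most similar past prompt, or None."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache(threshold=0.8)
cache.put("What is the capital of France?", "Paris")

hit = cache.get("Tell me what is the capital of France")  # paraphrase
miss = cache.get("How do I bake bread?")                  # unrelated prompt
print(hit, miss)
```

The threshold is the main design trade-off: set it too low and the cache serves answers to questions that only look similar, set it too high and it degrades into an exact-match cache.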