Architecture terms
The reasoning patterns and loops that make an agent work.
- Adaptive RAG
A RAG variant that routes queries to different retrieval strategies based on complexity: simple questions skip retrieval, hard ones get multi-hop retrieval.
- Agent orchestration
The control layer that coordinates multiple agents or agent steps: routing work, managing state, enforcing hand-off rules, and resolving conflicts between specialized agents.
- Agentic loop
The core control flow of an agent: observe → reason → act → observe, repeated until the goal is met or a stop condition fires.
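A minimal sketch of the observe, reason, act cycle described above. The function and parameter names (`run_agent`, `observe`, `decide`, `act`, `goal_met`, `max_steps`) are hypothetical, and the "environment" is a toy counter rather than a real tool or model call.

```python
def run_agent(observe, decide, act, goal_met, max_steps=10):
    """Run observe -> reason -> act until the goal is met or max_steps is hit."""
    history = []
    observation = observe()
    for _ in range(max_steps):
        if goal_met(observation):
            return history, "goal_met"
        action = decide(observation, history)  # "reason": pick the next action
        act(action)                            # "act": change the environment
        history.append((observation, action))
        observation = observe()                # observe the result of the action
    # Stop condition fired; report whether the last action happened to reach the goal.
    return history, "goal_met" if goal_met(observation) else "max_steps"


# Toy environment (an assumption for illustration): count up to 3.
state = {"counter": 0}
history, reason = run_agent(
    observe=lambda: state["counter"],
    decide=lambda obs, hist: "increment",
    act=lambda action: state.update(counter=state["counter"] + 1),
    goal_met=lambda obs: obs >= 3,
)
```

In a real agent, `decide` would typically wrap an LLM call and `act` would dispatch to tools; the step cap is one simple form of stop condition.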
- Attention mechanism
The neural-network primitive that lets a transformer model weigh the importance of every input token when generating each output token. It is the core innovation behind LLMs.
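As an illustration of "weighing the importance of every input token", here is a small pure-Python sketch of scaled dot-product attention for a single query vector. The helper names (`softmax`, `attend`) and the toy vectors are assumptions; real models use batched tensor operations, multiple heads, and learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over a list of keys/values."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Toy example: the query matches the first key more closely than the second.
weights, output = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

The first token receives the larger weight, so the output leans toward the first value vector.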
- AutoGPT
The 2023 open-source project that popularized autonomous LLM agents. It wraps an LLM in a recursive plan-execute-reflect loop with persistent goals and tool use.
- BabyAGI
A minimal 2023 reference implementation of an autonomous task-driven agent: three loops (task creation, prioritization, execution) in ~100 lines of Python.
- Chain of Agents
An architecture where agents run in sequence, each refining or extending the previous agent's output. Used for long documents or multi-stage workflows.
- Chain of thought
A prompting technique that asks the model to lay out its reasoning step by step before committing to an answer, which improves accuracy on multi-step tasks.
- Constitutional AI
An alignment technique developed by Anthropic where the model is trained to follow a written set of principles ("a constitution") rather than per-example human preferences, producing safer behavior without massive human-labeling effort.
- Context engineering
The discipline of curating what information goes into an LLM's context window: selecting, ordering, and formatting the system prompt, examples, retrieved documents, and conversation history for maximum effectiveness.
- Corrective RAG
A RAG variant that grades retrieved documents and triggers fallback retrieval (web search, alternative sources) when the initial retrieval scores low on relevance.
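The grade-then-fall-back flow can be sketched as follows. Everything here is a simplifying assumption: `overlap_score` is a crude word-overlap grader standing in for a learned relevance grader, and `primary` and `fallback` are hypothetical retriever callables.

```python
def overlap_score(query, doc):
    """Toy relevance grade: fraction of query words that appear in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def corrective_retrieve(query, primary, fallback, threshold=0.5):
    """Keep primary results that grade well; otherwise use the fallback source."""
    docs = primary(query)
    relevant = [doc for doc in docs if overlap_score(query, doc) >= threshold]
    if relevant:
        return relevant, "primary"
    return fallback(query), "fallback"


# Hypothetical retrievers for illustration only.
def local_index(query):
    return ["cats sleep most of the day"]


def web_search(query):
    return ["web result for: " + query]


hit = corrective_retrieve("cats sleep", local_index, web_search)
miss = corrective_retrieve("python list sort", local_index, web_search)
```

The first query is served from the primary index; the second grades poorly against it and triggers the fallback.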
- DPO
Direct Preference Optimization: a simpler alternative to RLHF that trains models directly on preference data without needing a separate reward model or reinforcement learning loop.
- Embeddings
Dense numerical vector representations of text, images, or audio, used to measure semantic similarity, power search, and ground LLM outputs in your data.
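Semantic similarity between embeddings is commonly measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors as stand-ins; real embeddings come from a model and have hundreds or thousands of dimensions, and the names `toy_embeddings` and `nearest` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration, not output from any real model.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    target = toy_embeddings[word]
    others = [w for w in toy_embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(target, toy_embeddings[w]))
```

With these vectors, `nearest("cat")` picks `"kitten"` because their directions nearly coincide, while `"invoice"` points elsewhere.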
- Emergent abilities
Capabilities that appear suddenly above a certain model scale (chain-of-thought reasoning, in-context learning, instruction following) and are absent or near-zero in smaller models.
- Few-shot learning
A prompting technique where the LLM sees a small number of input/output examples in the prompt before being asked to perform the same task on a new input.
- Fine-tuning
The process of training a pre-trained LLM on additional data to adapt it for a specific task, domain, or style, producing a specialized model derived from a general-purpose base.
- Frontier model
The current generation of state-of-the-art LLMs, typically the largest models from OpenAI, Anthropic, Google, and a small number of others.
- Graph of Thoughts
A reasoning structure that generalizes Tree of Thoughts to an arbitrary DAG: intermediate thoughts can be combined, refined, or referenced from multiple branches.
- In-context learning
An LLM's ability to learn a new task at inference time by reading examples in the prompt, with no weight updates, just pattern-matching from context.
- Inference
The process of running a trained LLM to produce outputs: the production phase, distinct from training. Inference is what you pay for when you use an LLM API.
- Instruction tuning
A fine-tuning technique where a pre-trained LLM is trained on instruction-response pairs so it learns to follow natural-language commands instead of just predicting next tokens.
- LoRA
Low-Rank Adaptation: a parameter-efficient fine-tuning technique that freezes the original weights and trains small low-rank matrices added alongside them, cutting the number of trainable parameters and the per-task checkpoint size by 100× or more with minimal accuracy loss.
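The savings follow from simple arithmetic: a frozen weight matrix of shape d × k gets a trainable update factored as B (d × r) times A (r × k), with r much smaller than d and k. The sketch below counts parameters and applies the update in pure Python. The layer size (4096 × 4096), rank 8, and function names are illustrative assumptions.

```python
def lora_param_counts(d, k, r):
    """Return (full fine-tuning params, LoRA trainable params) for one matrix."""
    return d * k, r * (d + k)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    r = len(A)                      # A has r rows (shape r x k)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


full, lora = lora_param_counts(d=4096, k=4096, r=8)
ratio = full / lora  # 16,777,216 / 65,536

# Tiny example: with B at zero (the usual starting point), output equals W x.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [1.0, 2.0]
start = lora_forward(W, A, [[0.0], [0.0]], x)
trained = lora_forward(W, A, [[1.0], [2.0]], x)
```

For this assumed layer the trainable-parameter reduction is 256×; the reduction in total compute is smaller, because the frozen weights still take part in every forward pass.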
- Mixture of Agents
An architecture where multiple agents (often using different models) generate candidate responses, then an aggregator agent synthesizes them. Higher quality at higher cost.
- Mixture of experts
A model architecture where multiple specialized expert networks share the work: a routing layer activates only a few experts per input, cutting inference cost while keeping total parameter count high.
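A rough sketch of top-k routing, under heavy simplification: the "experts" are scalar functions, and the gate logits are passed in directly rather than computed by a learned router from the input. All names here (`route_top_k`, `moe_forward`, `experts`) are hypothetical.

```python
import math


def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax their logits into weights."""
    top = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by routing weight."""
    weights = route_top_k(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy experts; only two are evaluated per input.
experts = [
    lambda x: 1.0 * x,
    lambda x: 2.0 * x,
    lambda x: 3.0 * x,
    lambda x: 4.0 * x,
]
routing = route_top_k([0.1, 2.0, 0.3, 2.0], k=2)
result = moe_forward(1.0, experts, [0.1, 2.0, 0.3, 2.0], k=2)
```

Experts 1 and 3 tie for the highest logit, so each gets weight 0.5 and the other two experts are never called, which is where the inference savings come from.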
- Model distillation
A training technique that transfers knowledge from a large "teacher" model to a smaller "student" model by training the student to match the teacher's outputs, producing a faster, cheaper model that retains most of the teacher's capability.
- Multi-step reasoning
The ability of an LLM or agent to chain multiple inferences together to solve a problem: answer A leads to question B, which leads to question C, and so on until the final answer.
- Neural network
A computational model loosely inspired by biological neurons: layers of weighted nodes that transform inputs to outputs. LLMs are large neural networks; so are image classifiers, recommendation systems, and most modern AI.
- Plan-and-execute
A canonical two-stage agent pattern: a planner LLM produces a structured multi-step plan, then an executor (often a cheaper model) carries out each step using tools.
- Planning
The phase where an agent decomposes a goal into a structured sequence of sub-tasks before executing any of them.
- Prompt engineering
The practice of designing, refining, and testing the text instructions sent to an LLM to maximize output quality. It covers system prompts, few-shot examples, formatting, and meta-instructions.
- Quantization
A technique that reduces model weights from 16-bit or 32-bit floats to smaller representations (8-bit, 4-bit, or lower), cutting memory use and inference cost by 2–8× with minimal accuracy loss.
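One simple scheme is symmetric "absmax" quantization to 8-bit integers, sketched here in pure Python. This is one illustrative method among many and the function names are hypothetical. Storing each weight as an 8-bit integer instead of a 32-bit float is a 4× size reduction, plus one scale value per tensor.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.01, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is at most about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The accuracy loss comes from that rounding error; production schemes typically reduce it further with per-channel or per-group scales.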
- ReAct agent
An agent built on the ReAct pattern: an interleaved loop of reasoning (the model thinks out loud) and acting (the model calls a tool), repeated until the goal is met.
- Reasoning model
A class of LLM (o3, Claude Sonnet 4.6, Gemini 2.5 reasoning) that produces a long internal chain of thought before responding, trading latency for accuracy on hard problems.
- Reflexion
An agent design pattern where the agent reflects on its previous attempts, generates a critique, and uses the critique to improve subsequent attempts, producing measurable accuracy gains on hard tasks.
- RLHF
Reinforcement Learning from Human Feedback: a training technique where humans rate model outputs to teach the model which responses are preferred, dramatically improving instruction-following and safety.
- Scaling laws
Empirical power-law relationships between model size, training data, and compute that predict loss and capability, the basis for every major frontier-model training plan since 2020.
- Self-consistency
A reasoning technique where the model samples multiple chain-of-thought traces for the same problem and selects the most common final answer: a cheap accuracy boost on math and logic tasks.
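The selection step is a majority vote over final answers. In this sketch, `sample_answer` is a hypothetical callable standing in for "run one chain-of-thought sample at nonzero temperature and extract its final answer"; here it is stubbed with canned values.

```python
from collections import Counter


def self_consistent_answer(sample_answer, question, n_samples=5):
    """Sample several answers and return the most common one with its vote share."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Stub sampler (an assumption for illustration): replays fixed final answers.
canned = iter(["42", "41", "42", "42", "40"])


def fake_sampler(question):
    return next(canned)


answer, agreement = self_consistent_answer(fake_sampler, "What is 6 * 7?")
```

The vote share doubles as a rough confidence signal: low agreement suggests the question deserves more samples or a different approach.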
- Self-RAG
A RAG variant where the model decides on the fly whether to retrieve, what to retrieve, and whether its own draft is grounded, emitting reflection tokens at each step.
- Speculative decoding
An inference optimization where a small draft model proposes multiple tokens at once and the large model verifies them in parallel: same output, 2–4× faster.
- Task decomposition
The agent reasoning step where a high-level goal is broken into ordered, executable sub-tasks before any tool call is made. Foundational to plan-and-execute, ReAct, and tree-of-thoughts patterns.
- Temperature
The LLM sampling parameter that controls randomness: low values stay near the most likely next token, high values explore the distribution. The typical range is 0–2.
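Mechanically, temperature divides the model's logits before the softmax. The sketch below shows the effect on a made-up three-token distribution; the function name and logit values are assumptions. It rejects a temperature of 0 because of the division, a case many implementations handle separately as greedy decoding.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)   # concentrates on the top token
neutral = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)    # spreads probability out
```

The top token's probability falls as temperature rises, which is what "exploring the distribution" means in practice.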
- Test-time compute
Spending more compute at inference time (longer reasoning, more samples, search) to get higher accuracy without retraining the model.
- Tokenization
The process of splitting text into subword units (tokens) that LLMs consume. A word like "tokenization" might become two or three tokens depending on the model's tokenizer.
- Top-p sampling
A sampling strategy that picks from the smallest set of tokens whose cumulative probability exceeds p. Trims the long tail without a hard top-k cutoff.
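A sketch of the filtering step, using a made-up four-token distribution. The function name `top_p_filter` is hypothetical, and this version stops as soon as the running total reaches p (using `>=`); implementations differ slightly on that boundary.

```python
def top_p_filter(probs, p):
    """Keep the most likely tokens until their total mass reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Made-up next-token distribution for illustration.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
nucleus = top_p_filter(probs, p=0.9)   # drops only the long-tail token
narrow = top_p_filter(probs, p=0.5)    # keeps just the top token
```

Sampling then draws from the renormalized set. Unlike top-k, the number of surviving tokens adapts to how peaked the distribution is.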
- Transformer
The neural network architecture that powers nearly every modern LLM. It uses self-attention to process sequences in parallel, replacing the older RNN approach for language modeling.
- Tree of thoughts
A reasoning pattern where the LLM explores multiple solution paths in parallel as a tree, evaluates partial paths, and backtracks, outperforming linear chain-of-thought on hard problems.
- World model
An internal predictive representation of the environment that an agent uses to simulate the outcomes of candidate actions before acting. It is central to 2026 frontier-agent research.
- Zero-shot learning
An LLM's ability to perform a task it has never been explicitly trained on or shown examples of, relying entirely on the model's pre-training and the prompt instructions.