Fine-tuning
The process of training a pre-trained LLM on additional data to adapt it for a specific task, domain, or style. It produces a specialized model derived from a general-purpose base.
Fine-tuning takes a foundation model (Llama, Qwen, Mistral, or a frontier vendor model) and trains it further on your data. The result is a model that performs better on your specific task — domain-specific jargon, custom formats, brand voice, or specialized reasoning patterns.
In 2026, most teams do not fine-tune. RAG, prompt engineering, and frontier-model APIs cover 90% of use cases. Fine-tuning makes sense when (a) you have many labeled examples (1K+), (b) prompt-only approaches plateau, (c) you need a smaller model to reduce inference cost, or (d) you need behavior changes that apply to every response.
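The four criteria can be read as a quick checklist. The sketch below is one hypothetical way to encode them. The `ProjectSignals` class, its field names, and the choice to report each criterion separately rather than require all four are illustrative assumptions, not a standard rule.

```python
from dataclasses import dataclass


@dataclass
class ProjectSignals:
    """Hypothetical inputs; field names are illustrative, not a standard schema."""
    labeled_examples: int
    prompting_plateaued: bool
    needs_smaller_model: bool
    needs_global_behavior_change: bool


def fine_tuning_reasons(signals: ProjectSignals, min_examples: int = 1000) -> list[str]:
    """Return which of criteria (a) to (d) this project meets."""
    reasons = []
    if signals.labeled_examples >= min_examples:
        reasons.append("a: enough labeled examples")
    if signals.prompting_plateaued:
        reasons.append("b: prompt-only approaches plateaued")
    if signals.needs_smaller_model:
        reasons.append("c: smaller model needed for inference cost")
    if signals.needs_global_behavior_change:
        reasons.append("d: behavior change across every response")
    return reasons


signals = ProjectSignals(
    labeled_examples=3000,
    prompting_plateaued=True,
    needs_smaller_model=False,
    needs_global_behavior_change=False,
)
reasons = fine_tuning_reasons(signals)
print(reasons if reasons else "stay with prompting, RAG, or a frontier API")
```

An empty list suggests staying with prompting, RAG, or a frontier-model API. How many criteria should be met before committing to a training run is a judgment call this sketch does not settle.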
Parameter-efficient methods like [LoRA](/glossary/lora) make fine-tuning practical without retraining all weights. For most production stacks, LoRA on Llama or Qwen is the cost-effective path.
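To see why LoRA avoids retraining all weights, it helps to count parameters. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two low-rank factors of shapes `d_out x rank` and `rank x d_in` instead. The sketch below compares the two counts for a single matrix. The 4096 dimension and rank 8 are illustrative assumptions, not values taken from any particular model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Weights updated when the whole matrix is trained."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d = 4096   # illustrative hidden size (assumption)
rank = 8   # illustrative LoRA rank (assumption)

full = full_params(d, d)
lora = lora_trainable_params(d, d, rank)
ratio = lora / full

print(f"{lora:,} of {full:,} weights trainable ({ratio:.2%})")
# prints: 65,536 of 16,777,216 weights trainable (0.39%)
```

The same ratio applies to every matrix LoRA is attached to, which is where the savings in optimizer memory and checkpoint size come from.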
Frequently asked
When should I fine-tune vs use RAG?
Use RAG when the knowledge changes often or is too large to fit in context. Fine-tune when you want to change behavior, style, or domain reasoning patterns. Most production stacks use both: a fine-tuned model plus RAG-retrieved context.
How much data do I need to fine-tune?
For style or format adaptation, 500–2K examples are usually enough. For domain capability changes, expect to need 5K–50K. Below 500 examples, prompt engineering with in-context examples usually beats fine-tuning.
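These thresholds can be written as a small lookup. The function below is a hypothetical sketch: the name `data_guidance` and the goal labels `"style"` and `"domain"` are illustrative. Treating 500 to 5K examples for a domain goal as "collect more data" is an assumption, since the guidance above does not cover that range.

```python
def data_guidance(num_examples: int, goal: str) -> str:
    """Map a dataset size and goal ('style' or 'domain') to a rough recommendation."""
    if goal not in ("style", "domain"):
        raise ValueError("goal must be 'style' or 'domain'")
    if num_examples < 500:
        # Below 500 examples, prompting with examples usually wins.
        return "use prompt engineering with examples"
    if goal == "style":
        # 500 to 2K examples is the stated range for style or format adaptation.
        return "fine-tune"
    if num_examples >= 5000:
        # 5K to 50K examples is the stated range for domain capability changes.
        return "fine-tune"
    # Assumption: between 500 and 5K examples for a domain goal, gather more data first.
    return "collect more data"


for n, goal in [(300, "style"), (1200, "style"), (1200, "domain"), (20000, "domain")]:
    print(n, goal, "->", data_guidance(n, goal))
```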