Embedding model
A neural network specialized for converting text (or images, audio) into fixed-length dense vectors — used for semantic search, RAG, clustering, and similarity tasks.
Embedding models are the unsung workhorses of every RAG stack. While the generation LLM gets the spotlight, the embedding model determines retrieval quality, and retrieval quality caps the quality of the whole system's output: the generator cannot ground an answer in a passage that was never retrieved.
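To make the retrieval step concrete, here is a minimal sketch of similarity search over embeddings. It assumes the vectors have already been produced by some embedding model; the 3-dimensional vectors, document ids, and the helper names cosine_similarity and top_k are hypothetical and chosen only for illustration (real models emit hundreds or thousands of dimensions, and production systems typically use a vector index rather than a full sort).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical toy embeddings; in practice these come from the embedding model.
DOC_VECS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.1, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.0]

print(top_k(QUERY_VEC, DOC_VECS, k=2))  # ['refund-policy', 'shipping-times']
```

Whatever the generator does afterwards, it only sees the documents this ranking returns, which is why the embedding model sets the ceiling.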
The 2026 production options: OpenAI text-embedding-3 family (small at 1536 dims, large at 3072), Cohere Embed v3 (multilingual strength), Voyage AI (RAG-optimized), and open-source BGE / E5 families (self-hostable, competitive quality). Pick by language coverage, retrieval task type, latency, and cost — not by claimed benchmark wins.
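Dimension count feeds directly into the cost side of that choice. As a rough sketch, the arithmetic below estimates the raw storage for a corpus at 1536 versus 3072 dimensions. It assumes uncompressed float32 values (4 bytes each) and ignores index overhead, metadata, and any quantization; the function name index_size_gib and the one-million-vector corpus are hypothetical.

```python
def index_size_gib(num_vectors, dims, bytes_per_value=4):
    """Raw vector storage in GiB, assuming uncompressed fixed-width values."""
    return num_vectors * dims * bytes_per_value / 2**30

NUM_VECTORS = 1_000_000  # hypothetical corpus size

small = index_size_gib(NUM_VECTORS, 1536)
large = index_size_gib(NUM_VECTORS, 3072)

print(f"1536 dims: {small:.2f} GiB")  # 1536 dims: 5.72 GiB
print(f"3072 dims: {large:.2f} GiB")  # 3072 dims: 11.44 GiB
```

Doubling the dimensions doubles raw storage and, roughly, the work per similarity comparison, so the larger model has to earn that cost on your own retrieval metrics.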
A common upgrade path that pays back fast: switch from a generic embedding model to a domain-tuned one. Voyage AI, Cohere, and the open BGE family all offer task-specific variants (code embeddings, legal embeddings, finance embeddings) that can beat generic embeddings on their own domain, with gains on the order of 10–20% in retrieval recall. Treat that range as a rough guide and confirm it on your own queries before migrating.
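One way to check whether a domain-tuned model actually pays off is to compare recall@k on a small labeled set of your own queries. The sketch below assumes you already have, for each query, the ranked document ids returned by each model and the set of ids judged relevant; the query ids, document ids, rankings, and helper names are all hypothetical illustration data, not measured results.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_recall_at_k(rankings, labels, k):
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(rankings[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)

# Hypothetical relevance labels: query id -> set of relevant document ids.
LABELS = {"q1": {"d1", "d4"}, "q2": {"d2"}}

# Hypothetical ranked results from two models for the same queries.
GENERIC_RANKINGS = {"q1": ["d1", "d3", "d5", "d4"], "q2": ["d7", "d8", "d2"]}
DOMAIN_RANKINGS = {"q1": ["d1", "d4", "d3"], "q2": ["d2", "d7", "d8"]}

print(mean_recall_at_k(GENERIC_RANKINGS, LABELS, k=2))  # 0.25
print(mean_recall_at_k(DOMAIN_RANKINGS, LABELS, k=2))   # 1.0
```

The toy numbers are exaggerated on purpose. With a few hundred real labeled queries, the same two functions give a like-for-like comparison that does not depend on vendor benchmarks.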
Frequently asked
OpenAI vs Cohere vs Voyage vs BGE: which embedding model?
Choose OpenAI for English-only workloads where simple integration matters most, Cohere for multilingual production stacks, and Voyage for RAG-focused retrieval (often the strongest choice on domain-specific corpora). Choose BGE or E5 when you need to self-host and want quality competitive with the hosted options.
Should I fine-tune an embedding model?
Rarely worth it. Switching to a stronger off-the-shelf model usually beats fine-tuning a weaker one. Consider fine-tuning only when (a) you have 10K+ labeled query-document pairs and (b) off-the-shelf options have plateaued.