Embeddings Explained: How Text Becomes Geometry

What embedding vectors represent, why semantic similarity works, and how this underpins every RAG system and search product.

An embedding is a point in high-dimensional space. That's the whole idea. Every word, sentence, document, or image your model processes gets mapped to a vector of floats — and the geometry of that space encodes meaning.

This is not a metaphor. With a well-trained model, two semantically similar sentences sit measurably closer together in embedding space than two dissimilar ones, as measured by cosine similarity or dot product. Embedding-based RAG systems, semantic search, and many recommendation engines depend on this property.

[Video: 3Blue1Brown — But what is a neural network? (visual foundation for embeddings and representations)]

What is an embedding vector?

An embedding is the vector an encoder model outputs when you pass it some text. The sentence-transformers model all-MiniLM-L6-v2 produces 384-dimensional vectors. OpenAI's text-embedding-3-large produces 3072 dimensions by default. Many production RAG systems use models in the 768 to 1536 dimension range.

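Two vectors are "similar" if the angle between them is small, measured by cosine similarity: cos(θ) = (A·B) / (|A||B|). A score of 1 means identical direction, 0 means orthogonal (unrelated), and -1 means opposite direction. In practice, text embeddings rarely score near -1, and a negative score does not reliably indicate opposite meaning.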

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

a = model.encode("How do I reset my password?")
b = model.encode("I forgot my login credentials")
c = model.encode("The weather is nice today")

def cosine_sim(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

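# Exact scores depend on the model and its version; compare them relative to each other.
print(cosine_sim(a, b))  # expect the higher score: both sentences are about account access
print(cosine_sim(a, c))  # expect a much lower score: the topics are unrelated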
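
The text above mentions both cosine similarity and dot product. A minimal sketch of how the two relate, using hypothetical 3-dimensional vectors chosen only so the arithmetic is easy to check: once vectors are scaled to unit length, the plain dot product gives the same number as cosine similarity.

```python
import numpy as np

def cosine_sim(x, y):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def normalize(x):
    # Scale a vector to unit length so that its norm is 1.
    return x / np.linalg.norm(x)

# Hypothetical toy "embeddings" (not produced by any real model).
u = np.array([3.0, 4.0, 0.0])
v = np.array([4.0, 3.0, 0.0])
w = np.array([0.0, 0.0, 5.0])

cos_uv = cosine_sim(u, v)                            # 24 / (5 * 5) = 0.96
dot_uv = float(np.dot(normalize(u), normalize(v)))   # same value after normalization
cos_uw = cosine_sim(u, w)                            # orthogonal vectors give 0.0

print(cos_uv, dot_uv, cos_uw)
```

This is why many vector stores ask you to normalize embeddings up front: the cheaper dot product then ranks results the same way cosine similarity would.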

How embeddings are trained

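Embedding models are typically trained with contrastive learning. Each training example pairs an anchor text with a positive (a text that should land close to it) and one or more negatives (texts that should land far away). The model learns to pull positive pairs together in vector space and push negatives apart. A common objective is the InfoNCE loss, which treats the task as classification: pick the true positive out of all the candidates.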
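
To make the objective concrete, here is a rough NumPy sketch of InfoNCE with in-batch negatives. It is an illustration under stated assumptions, not any particular model's training code: the function name `info_nce_loss`, the temperature of 0.05, and the toy 2-dimensional vectors are all hypothetical choices.

```python
import numpy as np

def info_nce_loss(queries, passages, temperature=0.05):
    """InfoNCE with in-batch negatives.

    Row i of `passages` is the positive for row i of `queries`;
    every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot products below are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, sharpened by the temperature.
    logits = (q @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy where the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))

# Toy vectors for illustration only.
aligned_q = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned_p = np.array([[0.9, 0.1], [0.1, 0.9]])
shuffled_p = aligned_p[::-1]  # each query is now paired with the wrong passage

loss_aligned = info_nce_loss(aligned_q, aligned_p)
loss_shuffled = info_nce_loss(aligned_q, shuffled_p)
print(loss_aligned, loss_shuffled)
```

The loss is close to zero when each query already sits next to its own passage and large when the pairs are mismatched, which is the gradient signal that pulls positives together and pushes negatives apart.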

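OpenAI has not published full training details for its text-embedding-3 models; its earlier embedding work describes contrastive pre-training on large volumes of naturally occurring text pairs. The original Sentence-BERT models fine-tuned BERT-style encoders on NLI and semantic textual similarity datasets, while newer sentence-transformers models such as all-MiniLM-L6-v2 were fine-tuned on more than a billion sentence pairs.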

Embedding models vs. LLMs

| Property | Embedding model | LLM |
|---|---|---|
| Output | Fixed-size vector | Variable-length text |
| Use case | Similarity, retrieval, clustering | Generation, reasoning |
| Cost | Very cheap (~$0.0001/1K tokens) | 10–100× more expensive |
| Latency | 5–20 ms | 200 ms–10 s |
| Examples | text-embedding-3, GTE, BGE | GPT-4, Claude, Gemini |

Why this matters for RAG

In a RAG pipeline, your documents are pre-embedded and stored in a vector database. At query time, you embed the user's question and find the nearest document chunks. The quality of your embedding model directly determines retrieval quality — and retrieval quality is the single biggest determinant of RAG answer quality.
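
The query-time step can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration, assuming the chunk embeddings are already computed: the 3-dimensional vectors, the chunk texts, and the `top_k` helper are hypothetical stand-ins, and a real system would use an embedding model plus a vector database rather than a NumPy array.

```python
import numpy as np

def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity of every chunk to the query
    order = np.argsort(-scores)[:k]     # indices sorted by score, highest first
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical pre-embedded document chunks (the "index").
chunks = [
    "Reset your password from the login page.",
    "Our office is closed on public holidays.",
    "Account recovery requires a verified email.",
]
chunk_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
    [0.7, 0.6, 0.1],
])

# Stands in for the embedded user question.
query_vec = np.array([1.0, 0.2, 0.0])

results = top_k(query_vec, chunk_vecs, k=2)
for idx, score in results:
    print(f"{score:.3f}  {chunks[idx]}")
```

The retrieved chunks are what gets passed to the LLM as context, so anything the search ranks too low never reaches the answer.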

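Use the same embedding model for indexing and querying. If you index with text-embedding-3-small and query with text-embedding-3-large, the vectors do not even share a default dimensionality (1536 vs. 3072). Even at matching dimensions, the two models learn different vector spaces, so similarity scores between them are meaningless.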
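
One way to enforce this rule is to store the model name alongside the vectors and check it on every query. The sketch below is a hypothetical in-memory `VectorIndex`, not the API of any real vector database; it only shows the shape of the guard.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class VectorIndex:
    """Toy in-memory index that remembers which model produced its vectors."""
    model_name: str
    vectors: np.ndarray  # shape: (num_chunks, dim)

    def search(self, query_vec, query_model_name, k=1):
        # Refuse to compare vectors that came from different models.
        if query_model_name != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, "
                f"query embedded with {query_model_name!r}"
            )
        # A dimension mismatch is a second sign that something is wrong.
        if query_vec.shape[0] != self.vectors.shape[1]:
            raise ValueError("query and index dimensions differ")
        scores = self.vectors @ query_vec
        return np.argsort(-scores)[:k].tolist()

# Toy index: three unit vectors standing in for real embeddings.
index = VectorIndex("text-embedding-3-small", np.eye(3))
hits = index.search(np.array([0.0, 1.0, 0.0]), "text-embedding-3-small")

try:
    index.search(np.array([0.0, 1.0, 0.0]), "text-embedding-3-large")
    mismatch_caught = False
except ValueError:
    mismatch_caught = True

print(hits, mismatch_caught)
```

Failing loudly here is deliberate: a mismatch that happens to share dimensions would otherwise return plausible-looking but meaningless results.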

The limits of embeddings — and when they fail

Embeddings collapse nuance. "I love this product" and "I don't love this product" score as highly similar in most embedding spaces because they share a topic and nearly all of their wording. Negation is semantically critical but often barely moves the vector. Similarly, rare technical terms (specific CVE IDs, drug names, internal product codes) often embed poorly because the model saw little or no training signal to anchor them.

Don't use embeddings as your only retrieval layer for technical documentation with precise identifiers. Hybrid search (vector + keyword) handles both semantic meaning and exact matching. Vector-only retrieval can miss queries like 'CVE-2024-38473' or 'SKU-A84721' unless you've fine-tuned the embedding model on your corpus.
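
One common way to combine the two result lists is reciprocal rank fusion (RRF). This is a sketch of that one option, not the only way to build hybrid search. The documents, the hard-coded vector ranking, and the substring-based `keyword_rank` (a stand-in for a real keyword engine such as BM25) are all hypothetical, and `k=60` is just the conventional default for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def keyword_rank(query, docs):
    """Return IDs of docs containing the exact query string (case-insensitive)."""
    return [doc_id for doc_id, text in docs.items() if query.lower() in text.lower()]

docs = {
    "advisory": "Patch notes for CVE-2024-38473 in the HTTP server.",
    "overview": "General guidance on web server security hardening.",
    "faq": "How to report a vulnerability to the security team.",
}

# Hypothetical vector-search order: topically close, but the exact ID ranks last.
vector_ranking = ["overview", "faq", "advisory"]
keyword_ranking = keyword_rank("CVE-2024-38473", docs)

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Because RRF works on ranks rather than raw scores, it avoids having to calibrate cosine similarities against keyword scores, which live on different scales.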
