Read this as: what does distance or direction in embedding space buy you?
- Failure trap: treating the famous analogy arithmetic as exact proof instead of an intuition for learned directions.
- Decision rule: use neighborhoods for similarity, directions for coarse concepts, and measure retrieval on real queries.
Embeddings place items as vectors
An embedding model maps tokens, chunks, or documents to dense numeric vectors.
- Each vector is a point in high-dimensional space.
- The coordinates are learned features.
- Humans inspect projections, not the full space.
Similar meanings form neighborhoods
Items used in similar contexts tend to land near each other, so clusters become a rough map of meaning.
- Nearby does not mean identical.
- Clusters depend on the training signal.
- Projection can distort the full geometry.
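To make the neighborhood idea concrete, here is a minimal sketch using hand-written toy vectors. The table `EMBEDDINGS`, the helpers `nearest` and `project`, and every number in it are assumptions made for illustration; a real embedding model would produce vectors with hundreds of dimensions. The last lines show how a projection that drops an axis can make two distinct points look identical.

```python
import math

# Hypothetical 3-d toy embeddings; the values are invented for illustration.
EMBEDDINGS = {
    "cat":    [0.9, 0.8, 0.1],
    "kitten": [0.8, 0.9, 0.2],
    "car":    [0.1, 0.2, 0.9],
    "truck":  [0.2, 0.1, 0.8],
}

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(word, table, k=1):
    """Return the k items closest to `word`, excluding the word itself."""
    others = [(euclidean(table[word], vec), name)
              for name, vec in table.items() if name != word]
    return [name for _, name in sorted(others)[:k]]

def project(vec, axes):
    """Keep only the chosen coordinates, as a crude stand-in for a 2-d plot."""
    return [vec[i] for i in axes]

print(nearest("cat", EMBEDDINGS))   # neighbors of "cat" in the full space

# Two points that differ only on the axis the projection throws away.
a = [0.9, 0.8, 0.1]
b = [0.9, 0.8, 0.9]
print(euclidean(a, b))                                    # clearly apart in 3-d
print(euclidean(project(a, (0, 1)), project(b, (0, 1))))  # identical in the 2-d view
```

On these toy values "cat" lands next to "kitten" and "car" next to "truck", while the projected view hides the separation between `a` and `b` entirely.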
Directions can encode concepts
Differences between vectors can point along learned concept directions such as tense, gender, or domain.
- Directions are statistical regularities.
- They are useful but not universal laws.
- The idea helps explain vector operations.
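One way to picture a learned direction is to average the offsets between paired examples and then score other vectors against that average. The sketch below does this for tense with invented toy vectors; `VECTORS`, `concept_direction`, and the choice of word pairs are all hypothetical, and real directions are far noisier than this.

```python
# Hypothetical 3-d toy vectors; axis 1 happens to carry most of the "past tense" signal.
VECTORS = {
    "walk":   [0.5, 0.1, 0.3],
    "walked": [0.5, 0.9, 0.3],
    "jump":   [0.2, 0.2, 0.7],
    "jumped": [0.3, 0.8, 0.6],
    "run":    [0.7, 0.1, 0.5],
    "ran":    [0.6, 0.7, 0.5],
}

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def concept_direction(pairs, table):
    """Average the (second - first) offsets over example pairs."""
    offsets = [subtract(table[second], table[first]) for first, second in pairs]
    return mean_vector(offsets)

# Estimate the direction from two pairs, then score a pair that was held out.
tense = concept_direction([("walk", "walked"), ("jump", "jumped")], VECTORS)
print(dot(VECTORS["run"], tense), dot(VECTORS["ran"], tense))
```

On these values the held-out past-tense form "ran" scores higher along the estimated direction than "run". That is the kind of statistical regularity the bullets describe, not a guarantee.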
Analogy arithmetic is illustrative
The classic "king minus man plus woman" example illustrates the intuition, but real systems should not depend on analogy arithmetic being exact.
- Offsets can reveal structure.
- They also fail outside clean examples.
- Treat them as a mental model, not a contract.
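The arithmetic can be sketched on a tiny hand-built vocabulary. Everything here is an assumption for illustration: the table `TOY`, its two axes, and the helper `analogy` are hypothetical, and the values were chosen so that the offset lands near "queen" without hitting it exactly. The sketch also skips the three input words when searching, a common convention in analogy evaluations that flatters the result.

```python
import math

# Hypothetical 2-d toy vectors (axis 0 ~ royalty, axis 1 ~ gender), with a
# little hand-added noise so the arithmetic is close but not exact.
TOY = {
    "king":     [0.90, 0.15],
    "queen":    [0.85, 0.90],
    "man":      [0.10, 0.10],
    "woman":    [0.15, 0.95],
    "princess": [0.60, 0.90],
    "apple":    [0.50, 0.00],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def analogy(a, b, c, table):
    """Return the word nearest to vec(a) - vec(b) + vec(c), skipping the inputs."""
    target = [x - y + z for x, y, z in zip(table[a], table[b], table[c])]
    candidates = [word for word in table if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, table[word]))

print(analogy("king", "man", "woman", TOY))
```

On this toy table the nearest remaining word is "queen", but the computed target vector is `[0.95, 1.0]` and not the stored "queen" vector. Even the clean case is a nearest-neighbor lookup, not an equality.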
Cosine similarity ranks neighbors
Retrieval often ranks chunks by cosine similarity, which compares the direction of two vectors and ignores their raw length.
- The query becomes a vector too.
- Nearest chunks become candidates.
- High similarity still needs relevance checks.
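A minimal sketch of that ranking step, assuming the query and chunks have already been embedded. The chunk ids, the vectors in `CHUNKS`, and the helper `rank_chunks` are hypothetical. Two chunks share a direction but differ tenfold in length, to show that cosine similarity scores them the same.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunk_vecs, top_k=2):
    """Score every chunk against the query and return the top_k (id, score) pairs."""
    scored = [(cosine(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:top_k]]

# Hypothetical pre-computed vectors; a real system would get these from a model.
CHUNKS = {
    "refund-policy":      [0.9, 0.1, 0.0],
    "refund-policy-long": [9.0, 1.0, 0.0],   # same direction, ten times the length
    "shipping-times":     [0.1, 0.9, 0.2],
}
QUERY = [0.8, 0.2, 0.0]

for chunk_id, score in rank_chunks(QUERY, CHUNKS):
    print(chunk_id, round(score, 3))
```

The returned chunks are only candidates. A high score says the vectors point the same way, not that the chunk answers the question, so a relevance check still belongs after this step.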
Vector space powers semantic retrieval
RAG uses embedding neighborhoods to fetch text that may answer a query even when the words differ.
- Embeddings help match paraphrases.
- BM25 still helps with exact strings.
- The retrieved text grounds generation.
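One common way to use both signals is to merge the two ranked lists. The sketch below uses reciprocal rank fusion, a technique swapped in here for illustration and not something the notes above prescribe. The document ids and both input rankings are hypothetical, standing in for the output of an embedding search and a BM25 search.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each appearance adds 1 / (k + rank) to the id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for one query from two separate retrievers.
embedding_ranking = ["faq-paraphrase", "error-code-page", "pricing"]
bm25_ranking = ["error-code-page", "changelog", "faq-paraphrase"]

fused = reciprocal_rank_fusion([embedding_ranking, bm25_ranking])
print(fused)
```

Documents that both retrievers rank well rise to the top, so a page matched by exact string and by meaning outranks one found by a single method. The constant `k=60` is a conventional default, and whether fusion helps at all should be measured on real queries.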