Vector Space: Meaning as Direction

See embeddings as points, neighborhoods, directions, analogy-style offsets, and retrieval by cosine similarity.

Read this as: What does distance or direction in embedding space buy you?
Failure Trap
Treating the famous analogy arithmetic as exact proof instead of an intuition for learned directions.
Decision Rule
Use neighborhoods for similarity, directions for coarse concepts, and measure retrieval on real queries.
[Diagram: six panels. Points (word or doc as a dense high-dimensional vector), Clusters (nearby points, similar use, rough meaning), Direction (vector offset as a learned concept axis), Analogy (king - man + woman lands near queen, not exactly on it), Cosine (query vector ranks chunks, nearest first), Retrieval (semantic match returns top-k chunks to ground the LLM in RAG).]

Embeddings place items as vectors

An embedding model maps tokens, chunks, or documents to dense numeric vectors.

  • Each vector is a point in high-dimensional space.
  • The coordinates are learned features.
  • Humans inspect projections, not the full space.
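
To make "inspect projections" concrete, here is a minimal sketch that projects vectors to 2-D with PCA. The vectors are random stand-ins, not output from a real embedding model, and the names items, vectors, and project_2d are made up for illustration.

```python
import numpy as np

# Random stand-ins for embeddings: 5 items, 8 dimensions (illustrative only).
rng = np.random.default_rng(0)
items = ["cat", "dog", "car", "truck", "banana"]
vectors = rng.normal(size=(len(items), 8))

def project_2d(x):
    """Project the rows of x onto their top two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

points_2d = project_2d(vectors)
print(vectors.shape, points_2d.shape)
```

The 2-D points are what a plot would show. Everything in the remaining dimensions is discarded, so a plot like this is a view of the space, not the space itself.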

Similar meanings form neighborhoods

Items used in similar contexts tend to land near each other, so clusters become a rough map of meaning.

  • Nearby does not mean identical.
  • Clusters depend on the training signal.
  • Projection can distort the full geometry.
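
A neighborhood lookup can be sketched in a few lines. The 3-D vectors below are hand-made so that animals and vehicles sit apart. They are assumptions for illustration, not real embeddings, and neighbors is a hypothetical helper.

```python
import numpy as np

# Hand-made 3-d vectors for illustration; real embeddings have far more dimensions.
vocab = {
    "cat":    np.array([0.9, 0.8, 0.1]),
    "dog":    np.array([0.8, 0.9, 0.2]),
    "kitten": np.array([0.95, 0.7, 0.05]),
    "car":    np.array([0.1, 0.2, 0.9]),
    "truck":  np.array([0.2, 0.1, 0.95]),
}

def neighbors(word, k=2):
    """Return the k words closest to `word` by Euclidean distance."""
    target = vocab[word]
    dists = {
        w: float(np.linalg.norm(v - target))
        for w, v in vocab.items()
        if w != word
    }
    return sorted(dists, key=dists.get)[:k]

print(neighbors("cat"))       # ['kitten', 'dog']
print(neighbors("car", k=1))  # ['truck']
```

Here "kitten" is nearest to "cat" but is not the same thing: the neighborhood says "used in similar contexts", nothing stronger.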

Directions can encode concepts

Differences between vectors can point along learned concept directions such as tense, gender, or domain.

  • Directions are statistical regularities.
  • They are useful but not universal laws.
  • The idea explains why arithmetic on vectors can carry meaning.
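
One way to estimate a concept direction is to average the offsets between known pairs, then score other items by projecting onto that direction. This sketch assumes toy vectors in which the first coordinate loosely tracks past tense by construction. Learned embeddings are much noisier.

```python
import numpy as np

# Toy vectors (assumed values): coordinate 0 loosely tracks "past tense".
vecs = {
    "walk":   np.array([0.1, 0.7, 0.3]),
    "walked": np.array([0.9, 0.7, 0.3]),
    "jump":   np.array([0.2, 0.1, 0.8]),
    "jumped": np.array([1.0, 0.2, 0.8]),
    "sing":   np.array([0.1, 0.4, 0.5]),
    "sang":   np.array([0.8, 0.5, 0.4]),
}

# Estimate the direction from two example pairs: mean offset, unit length.
pairs = [("walk", "walked"), ("jump", "jumped")]
offsets = np.array([vecs[past] - vecs[base] for base, past in pairs])
direction = offsets.mean(axis=0)
direction /= np.linalg.norm(direction)

def score(word):
    """Projection of the word vector onto the estimated tense direction."""
    return float(vecs[word] @ direction)

# "sing"/"sang" were not used to estimate the direction.
print(score("sing") < score("sang"))  # True
```

The held-out pair separates along the direction here because the toy data was built that way. On real embeddings this is a statistical tendency that should be checked, not assumed.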

Analogy arithmetic is illustrative

The classic king minus man plus woman example shows the intuition, but real systems should not rely on exact analogy math.

  • Offsets can reveal structure.
  • They also fail outside clean examples.
  • Treat them as a mental model, not a contract.
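
The sketch below runs the analogy on toy vectors (assumed values, not learned ones). It also shows a common caveat: unless the input words are excluded from the candidates, the nearest vector to the result can simply be one of the inputs.

```python
import numpy as np

# Toy vectors chosen for illustration; not taken from any trained model.
vecs = {
    "king":   np.array([0.9, 0.5, 0.1]),
    "queen":  np.array([0.8, 0.2, 0.3]),
    "man":    np.array([0.1, 0.5, 0.2]),
    "woman":  np.array([0.1, 0.3, 0.2]),
    "prince": np.array([0.7, 0.9, 0.6]),
    "apple":  np.array([0.2, 0.5, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, exclude_inputs=True):
    """Return the word whose vector is nearest to vec(a) - vec(b) + vec(c)."""
    target = vecs[a] - vecs[b] + vecs[c]
    candidates = [w for w in vecs if not (exclude_inputs and w in (a, b, c))]
    return max(candidates, key=lambda w: cosine(vecs[w], target))

print(analogy("king", "man", "woman"))                        # queen
print(analogy("king", "man", "woman", exclude_inputs=False))  # king
```

The result lands near "queen", not on it, and the answer depends on which candidates are allowed. That is why the arithmetic is an intuition, not a guarantee.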

Cosine similarity ranks neighbors

Retrieval often ranks chunks by cosine similarity, which compares the direction of two vectors and ignores their raw length.

  • The query becomes a vector too.
  • Nearest chunks become candidates.
  • High similarity still needs relevance checks.
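
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal top-k ranking sketch follows. The chunk and query vectors are made-up values and cosine_top_k is a hypothetical helper name.

```python
import numpy as np

def cosine_top_k(query, chunks, k=2):
    """Return (index, similarity) for the k chunk vectors best aligned with query."""
    q = query / np.linalg.norm(query)
    c = chunks / np.linalg.norm(chunks, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per chunk
    order = np.argsort(-sims)[:k]     # highest similarity first
    return [(int(i), float(sims[i])) for i in order]

# Made-up chunk vectors; chunk 1 is much longer than the others.
chunk_vecs = np.array([
    [1.0, 0.0, 0.0],
    [10.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
])
query_vec = np.array([1.0, 0.2, 0.0])

top = cosine_top_k(query_vec, chunk_vecs)
print([i for i, _ in top])  # [1, 0]
```

Scaling the query by any positive factor leaves the ranking unchanged, because only direction matters. The returned chunks are candidates: a high score says the vectors align, not that the chunk answers the question.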

Vector space powers semantic retrieval

RAG uses embedding neighborhoods to fetch text that may answer a query even when the words differ.

  • Embeddings handle paraphrase.
  • BM25 still helps with exact-string matches.
  • The retrieved text grounds generation.
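
To illustrate the exact-string case, here is a minimal Okapi BM25 scorer written from the standard formula. It assumes whitespace tokenization and common default parameters (k1=1.5, b=0.75). The documents and the error code E4012 are invented for the example, and a production system would normally use an established search library.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with a minimal Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of docs in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Invented documents; only the first contains the exact identifier.
docs = [
    "restart the service when error E4012 appears in the log",
    "the service may fail to start after an upgrade",
    "check the log for warnings before restarting",
]
scores = bm25_scores("E4012", docs)
print(scores.index(max(scores)))  # 0
```

A rare identifier like this gives a sharp lexical signal: only the document containing it scores above zero. An embedding may or may not place such a token usefully, which is one reason to pair lexical scoring with embedding search.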