Vector Space: Meaning as Direction

Read this as: what does distance or direction in embedding space buy you?

Failure Trap: Treating the famous analogy arithmetic as exact proof instead of an intuition for learned directions.
Decision Rule: Use neighborhoods for similarity, directions for coarse concepts, and measure retrieval on real queries.
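"Measure retrieval on real queries" can start as small as recall@k over a handful of labeled queries. The sketch below assumes you have already collected real user queries and marked which documents are relevant; the function name `recall_at_k`, the query strings, and the document IDs are hypothetical.

```python
def recall_at_k(retrieved: dict[str, list[str]],
                relevant: dict[str, set[str]],
                k: int = 5) -> float:
    """Mean fraction of each query's relevant docs that appear in its top-k results."""
    per_query = []
    for query, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labeled relevant docs
        top_k = set(retrieved.get(query, [])[:k])
        per_query.append(len(top_k & gold) / len(gold))
    return sum(per_query) / len(per_query) if per_query else 0.0

# Hypothetical labels gathered from real user queries.
relevant = {
    "how do I rotate api keys": {"doc_keys"},
    "ERR_4012 on login": {"doc_err_codes", "doc_login"},
}
# Hypothetical ranked output from the retriever under test.
retrieved = {
    "how do I rotate api keys": ["doc_keys", "doc_auth", "doc_faq"],
    "ERR_4012 on login": ["doc_login", "doc_faq", "doc_auth"],
}
print(recall_at_k(retrieved, relevant, k=3))  # 0.75
```

The second query misses `doc_err_codes`, which is exactly the kind of exact-string miss a purely semantic retriever can make.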


Embeddings place items as vectors

An embedding model maps tokens, chunks, or documents to dense numeric vectors.

Each vector is a point in high-dimensional space.
The coordinates are learned features.
Humans inspect projections, not the full space.
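To see why people look at projections, here is a minimal sketch that flattens high-dimensional vectors to 2-D with PCA (via NumPy's SVD) and reports how much variance the 2-D view keeps. The random 384-dimensional vectors are a hypothetical stand-in for real model output; PCA is one common choice of projection, not the only one.

```python
import numpy as np

def project_2d(vectors: np.ndarray) -> tuple[np.ndarray, float]:
    """Project row vectors onto their top two principal components (PCA via SVD).

    Also returns the fraction of variance the 2-D view keeps; the rest is
    geometry the plot cannot show.
    """
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    kept = float((s[:2] ** 2).sum() / (s ** 2).sum())
    return centered @ vt[:2].T, kept

rng = np.random.default_rng(0)
# Hypothetical stand-in for model output: 50 items, 384 dimensions each.
embeddings = rng.normal(size=(50, 384))
points, kept = project_2d(embeddings)
print(embeddings.shape, "->", points.shape)  # (50, 384) -> (50, 2)
print(f"variance kept in 2-D: {kept:.1%}")
```

For unstructured random data the kept fraction is small, a reminder that a 2-D scatter plot can hide most of the geometry.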

Similar meanings form neighborhoods

Items used in similar contexts tend to land near each other, so clusters become a rough map of meaning.

Nearby does not mean identical.
Clusters depend on the training signal.
Projection can distort the full geometry.
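A neighborhood is just "the closest other points". This sketch uses hand-picked 3-D toy vectors (hypothetical; real embeddings are learned and much wider) and Euclidean distance to list each item's nearest neighbors.

```python
import numpy as np

# Hypothetical 3-D toy vectors chosen by hand; real embeddings are learned
# and have hundreds of dimensions.
vocab = {
    "cat":   np.array([0.9, 0.8, 0.1]),
    "dog":   np.array([0.8, 0.9, 0.2]),
    "car":   np.array([0.1, 0.2, 0.9]),
    "truck": np.array([0.2, 0.1, 0.8]),
}

def nearest(word: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k closest other items by Euclidean distance."""
    target = vocab[word]
    dists = [(other, float(np.linalg.norm(vec - target)))
             for other, vec in vocab.items() if other != word]
    return sorted(dists, key=lambda pair: pair[1])[:k]

print(nearest("cat"))  # "dog" comes first
print(nearest("car"))  # "truck" comes first
```

The animals and the vehicles form two small clusters, but "dog" is still not at distance zero from "cat": nearby does not mean identical.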

Directions can encode concepts

Differences between vectors can point along learned concept directions such as tense, gender, or domain.

Directions are statistical regularities.
They are useful but not universal laws.
The idea helps explain why vector arithmetic sometimes works.
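One simple way to estimate a concept direction is to average the offsets between example pairs and normalize the result, then score other words by projecting onto it. The toy vectors and the "past tense" reading of the third coordinate are hypothetical; on real embeddings the direction is noisier and only statistically reliable.

```python
import numpy as np

# Hypothetical toy vectors; the third coordinate loosely plays "past tense".
vec = {
    "walk":   np.array([1.0, 0.0, 0.1]),
    "walked": np.array([1.0, 0.1, 0.9]),
    "run":    np.array([0.0, 1.0, 0.0]),
    "ran":    np.array([0.1, 1.0, 0.8]),
    "jump":   np.array([0.5, 0.5, 0.2]),
    "jumped": np.array([0.5, 0.6, 0.9]),
}

def concept_direction(pairs: list[tuple[str, str]]) -> np.ndarray:
    """Average the (after - before) offsets and normalize to unit length."""
    offsets = np.array([vec[after] - vec[before] for before, after in pairs])
    mean = offsets.mean(axis=0)
    return mean / np.linalg.norm(mean)

tense = concept_direction([("walk", "walked"), ("run", "ran")])

# Score a held-out pair by projecting both words onto the direction.
for word in ("jump", "jumped"):
    print(word, round(float(vec[word] @ tense), 3))
```

"jumped" was not used to build the direction, yet it projects higher than "jump", which is what a useful concept direction should do.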

Analogy arithmetic is illustrative

The classic example, king - man + woman ≈ queen, shows the intuition, but real systems should not rely on exact analogy arithmetic.

Offsets can reveal structure.
They also fail outside clean examples.
Treat them as a mental model, not a contract.
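Here is the analogy computed on hand-built toy vectors (hypothetical, with coordinates loosely meaning royalty, maleness, femaleness). Note the step that skips the three input words: common analogy evaluations exclude them, and on real embeddings the top hit without that step is often one of the inputs. That exclusion is part of why the clean demo looks cleaner than the raw geometry.

```python
import numpy as np

# Hypothetical 3-D toy vectors: [royalty, maleness, femaleness].
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.3, 0.3]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a: str, b: str, c: str) -> str:
    """Answer 'a - b + c ≈ ?' by cosine similarity, skipping the three input words."""
    target = vec[a] - vec[b] + vec[c]
    candidates = [w for w in vec if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vec[w], target))

print(analogy("king", "man", "woman"))  # queen
```

The toy space was built so this works; real vocabularies contain many near-misses, which is why the result is an illustration rather than a guarantee.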

Cosine similarity ranks neighbors

Retrieval often ranks chunks by cosine similarity, which compares vector direction and ignores raw length.

The query becomes a vector too.
Nearest chunks become candidates.
High similarity still needs relevance checks.
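A minimal sketch of cosine ranking: normalize the query and every chunk to unit length, take dot products, and keep the top k. The chunk vectors are hypothetical; row 2 is row 0 scaled by 10 to show that length does not change the score.

```python
import numpy as np

def cosine_rank(query: np.ndarray, chunks: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (chunk index, cosine score) for the k chunks most similar to the query."""
    q = query / np.linalg.norm(query)
    c = chunks / np.linalg.norm(chunks, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity = dot product of unit vectors
    top = np.argsort(-scores, kind="stable")[:k]  # stable keeps ties in input order
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical chunk embeddings; row 2 is row 0 scaled by 10.
chunks = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [10.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

for idx, score in cosine_rank(query, chunks, k=3):
    print(idx, round(score, 3))
```

Chunks 0 and 2 tie despite a tenfold length difference. The scores only order candidates; whether the top chunk actually answers the query is a separate check.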

Vector space powers semantic retrieval

RAG uses embedding neighborhoods to fetch text that may answer a query even when the words differ.

Embeddings handle paraphrase.
BM25 still helps with exact strings such as IDs and error codes.
The retrieved text grounds generation.
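One common way to combine BM25 and embedding results is reciprocal rank fusion (RRF), which adds 1 / (k + rank) for each list a document appears in; k = 60 is the constant from the original RRF paper. RRF is a swapped-in technique here, not something the section prescribes, and the query and document IDs below are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the query "reset after ERR_4012".
bm25_hits = ["doc_err_codes", "doc_reset_guide", "doc_changelog"]       # exact-string match
vector_hits = ["doc_reset_guide", "doc_password_faq", "doc_err_codes"]  # paraphrase match

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents found by both retrievers rise to the top, while each retriever can still contribute hits the other missed.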