Summary of how Word2Vec operates:
1. **Build the Vocabulary:** It scans the text to create a **Vocabulary** of all unique words, whose size we call $V$. Conceptually, every word is represented as a sparse $V$-dimensional "one-hot" vector: all zeros except a single `1` at the word's unique index.
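2. **Choose the Embedding Size:** It fixes a dense vector size **$N$** (commonly 100–300 dimensions) and initializes a weight matrix (**Matrix $W$**, sized $V \times N$) with small random values, one row for every word in the vocabulary. A second matrix, $W'$ (sized $N \times V$), serves as the output layer that turns an $N$-dimensional hidden vector back into a score for every vocabulary word.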
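3. **Slide and Predict:** A sliding window moves across the text. At each position, a shallow network with a single linear hidden layer (no activation function, so not a true MLP) takes the center word's index and looks up its row of Matrix $W$, which is equivalent to multiplying its one-hot vector by $W$. From that row it predicts the surrounding context words (Skip-gram). CBOW works the other way around: it averages the rows of the context words and predicts the center word.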
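4. **Update Weights:** The network compares its predicted probabilities with the words actually observed in the text and computes a cross-entropy loss. In practice this loss is approximated with negative sampling or hierarchical softmax, which avoids a full softmax over all $V$ words. Backpropagation with stochastic gradient descent then adjusts the weights in Matrix $W$ and in the output layer. Because words that appear in similar contexts receive similar updates, their vectors end up close together (for example, by cosine similarity) in the $N$-dimensional space.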
5. **Extract the Result:** Once training finishes, the output (prediction) layer is discarded. Matrix $W$ is kept as the final lookup table, in which every word in the **Vocabulary** maps to a dense $N$-dimensional vector that encodes the contexts it appears in.
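Steps 1–3 rely on one fact: multiplying a one-hot vector by $W$ simply selects one row of $W$. This is why implementations treat $W$ as a lookup table instead of computing the full product. Below is a minimal pure-Python sketch that checks this on a toy six-word corpus with $N = 3$. The helper names `build_vocab`, `one_hot`, and `matvec_row` are hypothetical and chosen only for illustration.

```python
import random

def build_vocab(tokens):
    """Step 1: map each unique word to an integer index, in order of first appearance."""
    vocab = {}
    for word in tokens:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def one_hot(index, size):
    """A size-dimensional vector of zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def matvec_row(x, W):
    """Compute the row-vector product x @ W, where x has length V and W is V x N."""
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)      # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
V, N = len(vocab), 3             # N = 3 only to keep the toy example readable
rng = random.Random(0)
W = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]   # step 2

x = one_hot(vocab["cat"], V)
# The full product touches all V rows, but the result is just the "cat" row of W.
assert matvec_row(x, W) == W[vocab["cat"]]
```

The full product costs $O(V \cdot N)$ operations, while the row lookup costs $O(N)$. The gap matters once $V$ reaches hundreds of thousands of words.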
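The sliding window produces different training examples for the two architectures. The sketch below is a simplification that assumes a fixed window size and plain string tokens. `training_examples` is a hypothetical helper, not part of any library.

```python
def training_examples(tokens, window=2):
    """Slide a fixed-size window over `tokens`; return skip-gram and CBOW examples."""
    skipgram, cbow = [], []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        # Skip-gram: one (input=center, target=context word) pair per neighbour.
        skipgram.extend((center, c) for c in context)
        # CBOW: the whole context is the input, the center word is the target.
        cbow.append((context, center))
    return skipgram, cbow

sg, cb = training_examples("the cat sat on the mat".split(), window=1)
print(sg[:3])   # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(cb[1])    # (['the', 'sat'], 'cat')
```

Skip-gram turns each position into several small (center, context) pairs. CBOW bundles the whole context into a single example per position.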
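To make the training loop concrete, here is a toy skip-gram trainer in pure Python. It is a sketch that makes several simplifying assumptions:

- It uses a full softmax over the vocabulary, which is feasible only for a tiny $V$. The word2vec authors instead use hierarchical softmax or negative sampling.
- It trains with plain SGD at a fixed learning rate.
- Its names (`train_skipgram`, `W_out`, `cosine`) are hypothetical.

`W_out` plays the role of the output layer that is thrown away after training.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_skipgram(tokens, N=10, window=2, lr=0.05, epochs=200, seed=0):
    """Toy skip-gram: full-softmax output layer, plain SGD (a sketch, not word2vec itself)."""
    vocab = {}
    for w in tokens:
        vocab.setdefault(w, len(vocab))           # step 1: vocabulary
    V = len(vocab)
    rng = random.Random(seed)
    # Step 2: W (V x N) holds the embeddings we keep at the end.
    W = [[rng.uniform(-0.5, 0.5) / N for _ in range(N)] for _ in range(V)]
    # W_out (N x V) is the prediction layer, discarded after training.
    W_out = [[rng.uniform(-0.5, 0.5) / N for _ in range(V)] for _ in range(N)]

    # Step 3: (center, context) index pairs from a fixed-size sliding window.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((vocab[center], vocab[tokens[j]]))

    losses = []
    for _ in range(epochs):
        rng.shuffle(pairs)
        total = 0.0
        for c, o in pairs:
            h = W[c]                               # linear hidden layer = row lookup
            scores = [sum(h[k] * W_out[k][j] for k in range(N)) for j in range(V)]
            p = softmax(scores)
            total -= math.log(p[o])                # cross-entropy for the true context word
            # Step 4: gradient of the loss w.r.t. the scores is p - one_hot(o).
            err = p[:]
            err[o] -= 1.0
            # Gradient w.r.t. h, computed before W_out is modified.
            grad_h = [sum(W_out[k][j] * err[j] for j in range(V)) for k in range(N)]
            for k in range(N):
                for j in range(V):
                    W_out[k][j] -= lr * h[k] * err[j]
            for k in range(N):
                W[c][k] -= lr * grad_h[k]          # only the center word's row changes
        losses.append(total / len(pairs))
    # Step 5: drop W_out and return W as the lookup table.
    return vocab, W, losses

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab, W, losses = train_skipgram(tokens)
print(round(losses[0], 3), round(losses[-1], 3))   # average loss should drop
print(cosine(W[vocab["quick"]], W[vocab["brown"]]))
```

On a corpus this small the similarities are noisy. The point is the mechanics:

- The hidden layer is a row lookup.
- Each step updates only the center word's row of `W` plus the output weights.
- Only `W` survives as the embedding table.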