Sunday, 24 May 2026 (B.E. 2569)

Word2vec

A summary of how Word2vec operates:

1. **Build the Vocabulary:** It scans the text to create a **Vocabulary** of all unique words; let $V$ be the number of words in it. Each word starts out as a sparse $V$-dimensional "one-hot" vector: all zeros except for a single `1` at the word's unique index.

2. **Set the Embedding Size:** It chooses a dense vector size **$N$** (usually 100–300 dimensions) and creates a weight matrix (**Matrix $W$**, sized $V \times N$), typically initialized with small random values, with one row for every word in the vocabulary.

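3. **Slide and Predict:** A sliding window moves across the text. Using a shallow neural network (a single linear hidden layer with no activation function, rather than a deep MLP), it looks up a target word's row in Matrix $W$ and uses that vector to predict the surrounding context words (Skip-gram), or it combines the context words' vectors to predict the target word (CBOW).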

4. **Update Weights:** The network compares its predicted probabilities to the words that actually appear in the window, calculates the error, and uses backpropagation with gradient descent to adjust the weights in Matrix $W$. Over many updates, words used in similar contexts are pulled closer together in the $N$-dimensional space.

5. **Extract the Result:** Once training finishes, the neural network's prediction layers are thrown away. Matrix $W$ is kept as the final lookup table, where every word in the **Vocabulary** maps to a meaningful, dense $N$-dimensional vector.
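
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the Skip-gram variant, not a reference implementation: it uses a plain full softmax over the vocabulary, whereas practical implementations usually rely on cheaper approximations such as negative sampling or hierarchical softmax. The function names (`build_vocab`, `skipgram_pairs`, `train_skipgram`), the output matrix name `W_out`, the toy sentence, and all hyperparameter values are assumptions made for this example.

```python
import numpy as np


def build_vocab(tokens):
    """Step 1: map each unique word to an integer index."""
    return {word: i for i, word in enumerate(sorted(set(tokens)))}


def skipgram_pairs(ids, window):
    """Step 3: yield (center, context) index pairs from a sliding window."""
    for pos, center in enumerate(ids):
        lo = max(0, pos - window)
        hi = min(len(ids), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:
                yield center, ids[ctx_pos]


def train_skipgram(tokens, dim=8, window=2, lr=0.05, epochs=50, seed=0):
    """Train toy Skip-gram embeddings with a full softmax (illustrative only)."""
    rng = np.random.default_rng(seed)
    vocab = build_vocab(tokens)
    ids = [vocab[t] for t in tokens]
    V = len(vocab)

    # Step 2: W is the V x N embedding matrix; W_out is the prediction layer.
    W = rng.normal(scale=0.1, size=(V, dim))
    W_out = rng.normal(scale=0.1, size=(dim, V))

    losses = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for center, context in skipgram_pairs(ids, window):
            # Multiplying a one-hot vector by W is just a row lookup.
            h = W[center].copy()
            scores = h @ W_out
            scores -= scores.max()  # for numerical stability
            probs = np.exp(scores) / np.exp(scores).sum()

            # Step 4: cross-entropy error and its gradients.
            total += -np.log(probs[context])
            count += 1
            grad_scores = probs.copy()
            grad_scores[context] -= 1.0
            grad_h = W_out @ grad_scores  # computed before W_out changes

            W_out -= lr * np.outer(h, grad_scores)
            W[center] -= lr * grad_h
        losses.append(total / count)

    # Step 5: only W is returned; W_out is discarded.
    return vocab, W, losses


tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab, W, losses = train_skipgram(tokens)
cat_vector = W[vocab["cat"]]  # dense N-dimensional vector for "cat"
```

On a corpus this small the vectors carry little meaning; the point is only to show where each step lives in the code. Note that the one-hot vector never needs to be materialized, since indexing `W[center]` gives the same result as the matrix product.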