Sunday, October 20, 2024

Regularization

Regularization is a technique used in statistical modeling and machine learning to prevent overfitting, which occurs when a model learns to perform very well on training data but fails to generalize to unseen data. Many regularization techniques add a penalty term to the loss function to constrain the model's complexity; others, such as dropout or early stopping, constrain the training process itself. Here are some common regularization techniques:

1. L1 Regularization (Lasso)

  • Description: Adds the absolute value of the coefficients as a penalty term to the loss function.
  • Effect: Encourages sparsity in the model by driving some coefficients to zero, effectively selecting a simpler model that uses fewer features.
  • Loss Function: L = L_0 + \lambda \sum |w_i|
    • L_0: original loss (e.g., mean squared error)
    • w_i: coefficients
    • \lambda: regularization parameter controlling the strength of the penalty.
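
To make the penalty concrete, here is a minimal sketch of L1 regularization using scikit-learn's Lasso on synthetic data; the dataset and the alpha value (which plays the role of λ) are arbitrary choices for illustration.

```python
# Minimal sketch: L1 regularization with scikit-learn's Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                            # 200 samples, 10 features
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only 2 informative features
y = X @ true_w + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1)   # alpha = regularization strength (lambda)
model.fit(X, y)
print(model.coef_)         # many coefficients are driven exactly to zero
```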

2. L2 Regularization (Ridge)

  • Description: Adds the square of the coefficients as a penalty term to the loss function.
  • Effect: Tends to reduce the size of coefficients but does not set any to zero. It shrinks the weights more evenly across all features, making the model more stable.
  • Loss Function: L = L_0 + \lambda \sum w_i^2
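
A minimal sketch of ridge regression via its closed-form solution, which mirrors the loss above; the synthetic data and the value of lambda are illustrative assumptions.

```python
# Minimal sketch: ridge (L2) regression via its closed-form solution,
# w = (X^T X + lambda * I)^{-1} X^T y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

lam = 1.0                                        # regularization strength (lambda)
n_features = X.shape[1]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)        # unregularized least squares
print(w_ols)    # larger coefficients
print(w_ridge)  # same coefficients shrunk toward zero, but none exactly zero
```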

3. Elastic Net Regularization

  • Description: Combines both L1 and L2 regularization. It can select features (like L1) while also encouraging smaller weights (like L2).
  • Effect: Useful when there are multiple features correlated with each other.
  • Loss Function: L = L_0 + \lambda_1 \sum |w_i| + \lambda_2 \sum w_i^2
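
A minimal sketch using scikit-learn's ElasticNet on deliberately correlated features; note that scikit-learn reparameterizes λ₁ and λ₂ into a single alpha plus an l1_ratio, and the values below are arbitrary.

```python
# Minimal sketch: elastic net with scikit-learn.
# Higher l1_ratio -> behaves more like Lasso; lower -> more like Ridge.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# 5 nearly identical (highly correlated) features
X = np.hstack([base + 0.01 * rng.normal(size=(200, 1)) for _ in range(5)])
y = 2.0 * base.ravel() + rng.normal(scale=0.1, size=200)

model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_)   # the weight tends to be spread across the correlated features
```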

4. Dropout

  • Description: A regularization technique specifically used in neural networks where randomly selected neurons are ignored (dropped out) during training.
  • Effect: Prevents co-adaptation of neurons, helping the network to generalize better by forcing it to learn robust features that are useful independently of others.
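
A minimal sketch of dropout in a small PyTorch network; the layer sizes and the dropout probability are arbitrary illustrative choices.

```python
# Minimal sketch: dropout in a small PyTorch network.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden unit is zeroed with probability 0.5 during training
    nn.Linear(64, 1),
)

x = torch.randn(8, 20)
model.train()            # dropout active: random units are dropped on each forward pass
out_train = model(x)
model.eval()             # dropout disabled at inference time
out_eval = model(x)
```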

5. Early Stopping

  • Description: Involves monitoring the model's performance on a validation set during training and stopping the training process when performance starts to degrade (indicating overfitting).
  • Effect: Prevents the model from learning noise in the training data.
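
A minimal sketch of early stopping with a patience counter; train_one_epoch and evaluate are hypothetical stand-ins for a real training step and a real validation-set evaluation.

```python
# Minimal sketch: early stopping with a "patience" counter.
import math

def train_one_epoch(epoch: int) -> None:
    pass  # placeholder: update model parameters on the training set

def evaluate(epoch: int) -> float:
    # toy validation loss that improves, then starts to rise (overfitting)
    return (epoch - 10) ** 2 / 100 + 1.0

best_val_loss = math.inf
patience, patience_left = 3, 3

for epoch in range(100):
    train_one_epoch(epoch)
    val_loss = evaluate(epoch)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_left = patience          # still improving: reset the counter
        # (in practice, also save a checkpoint of the best model here)
    else:
        patience_left -= 1                # no improvement this epoch
        if patience_left == 0:
            print(f"stopping early at epoch {epoch}")
            break
```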

6. Data Augmentation

  • Description: Increasing the amount of training data by applying transformations (e.g., rotation, scaling, flipping) to existing data.
  • Effect: Helps the model generalize better by exposing it to various forms of data.
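
A minimal sketch of image augmentation using torchvision transforms; the specific transforms and their parameters are arbitrary illustrative choices.

```python
# Minimal sketch: image data augmentation with torchvision transforms.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # random left-right flip
    transforms.RandomRotation(degrees=15),                      # random small rotation
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),   # random scale + crop
    transforms.ToTensor(),
])
# Passed as the `transform` argument of a dataset, so each epoch sees a slightly
# different version of every training image.
```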

7. Weight Regularization

  • Description: Adds explicit constraints on the weights (e.g., clamping them to lie within a certain range, as sketched below).
  • Effect: Helps in controlling model complexity and prevents overfitting.
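
A minimal sketch of a weight constraint in PyTorch, clamping the weights to a fixed range after each optimizer step; the model, data, and range are illustrative assumptions.

```python
# Minimal sketch: clamping weights to a fixed range after each update.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-0.5, 0.5)   # constrain every weight to lie in [-0.5, 0.5]
```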

8. Batch Normalization

  • Description: Normalizes the output of a layer to stabilize learning, effectively acting as a form of regularization.
  • Effect: Reduces internal covariate shift and can lead to faster training.
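
A minimal sketch of batch normalization inserted between layers in PyTorch; the layer sizes are arbitrary illustrative choices.

```python
# Minimal sketch: batch normalization between layers in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # normalizes each feature over the current mini-batch
    nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.randn(32, 20)   # batch of 32 samples
model.train()             # uses batch statistics and updates running estimates
out_train = model(x)
model.eval()              # uses the stored running mean/variance at inference
out_eval = model(x)
```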

--ChatGPT