Sunday, October 20, 2024

Regularization

Regularization is a technique used in statistical modeling and machine learning to prevent overfitting, which occurs when a model learns to perform very well on training data but fails to generalize to unseen data. Many regularization techniques add a penalty term to the loss function to constrain the model's complexity; others, such as dropout or early stopping, constrain the training process itself. Here are some common regularization techniques:

1. L1 Regularization (Lasso)

  • Description: Adds the absolute value of the coefficients as a penalty term to the loss function.
  • Effect: Encourages sparsity in the model by driving some coefficients to zero, effectively selecting a simpler model that uses fewer features.
  • Loss Function: L = L_0 + \lambda \sum |w_i|
    • L_0: original loss (e.g., mean squared error)
    • w_i: coefficients
    • \lambda: regularization parameter controlling the strength of the penalty.
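
To make the penalty concrete, here is a minimal sketch of L1 regularization using scikit-learn's Lasso on synthetic data; the dataset and the alpha value (which plays the role of λ) are arbitrary choices for illustration.

```python
# Minimal sketch: L1 regularization with scikit-learn's Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                            # 200 samples, 10 features
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only 2 informative features
y = X @ true_w + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1)   # alpha = regularization strength (lambda)
model.fit(X, y)
print(model.coef_)         # many coefficients are driven exactly to zero
```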

2. L2 Regularization (Ridge)

  • Description: Adds the square of the coefficients as a penalty term to the loss function.
  • Effect: Tends to reduce the size of coefficients but does not set any to zero. It shrinks the weights more evenly across all features, making the model more stable.
  • Loss Function: L = L_0 + \lambda \sum w_i^2
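
A minimal sketch of ridge regression via its closed-form solution, which mirrors the loss above; the synthetic data and the value of lambda are illustrative assumptions.

```python
# Minimal sketch: ridge (L2) regression via its closed-form solution,
# w = (X^T X + lambda * I)^{-1} X^T y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

lam = 1.0                                        # regularization strength (lambda)
n_features = X.shape[1]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)        # unregularized least squares
print(w_ols)    # larger coefficients
print(w_ridge)  # same coefficients shrunk toward zero, but none exactly zero
```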

3. Elastic Net Regularization

  • Description: Combines both L1 and L2 regularization. It can select features (like L1) while also encouraging smaller weights (like L2).
  • Effect: Useful when there are multiple features correlated with each other.
  • Loss Function: L = L_0 + \lambda_1 \sum |w_i| + \lambda_2 \sum w_i^2
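
A minimal sketch using scikit-learn's ElasticNet on deliberately correlated features; note that scikit-learn reparameterizes λ₁ and λ₂ into a single alpha plus an l1_ratio, and the values below are arbitrary.

```python
# Minimal sketch: elastic net with scikit-learn.
# Higher l1_ratio -> behaves more like Lasso; lower -> more like Ridge.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# 5 nearly identical (highly correlated) features
X = np.hstack([base + 0.01 * rng.normal(size=(200, 1)) for _ in range(5)])
y = 2.0 * base.ravel() + rng.normal(scale=0.1, size=200)

model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_)   # the weight tends to be spread across the correlated features
```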

4. Dropout

  • Description: A regularization technique specifically used in neural networks where randomly selected neurons are ignored (dropped out) during training.
  • Effect: Prevents co-adaptation of neurons, helping the network to generalize better by forcing it to learn robust features that are useful independently of others.
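
A minimal sketch of dropout in a small PyTorch network; the layer sizes and the dropout probability are arbitrary illustrative choices.

```python
# Minimal sketch: dropout in a small PyTorch network.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden unit is zeroed with probability 0.5 during training
    nn.Linear(64, 1),
)

x = torch.randn(8, 20)
model.train()            # dropout active: random units are dropped on each forward pass
out_train = model(x)
model.eval()             # dropout disabled at inference time
out_eval = model(x)
```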

5. Early Stopping

  • Description: Involves monitoring the model's performance on a validation set during training and stopping the training process when performance starts to degrade (indicating overfitting).
  • Effect: Prevents the model from learning noise in the training data.
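
A minimal sketch of early stopping with a patience counter; train_one_epoch and evaluate are hypothetical stand-ins for a real training step and a real validation-set evaluation.

```python
# Minimal sketch: early stopping with a "patience" counter.
import math

def train_one_epoch(epoch: int) -> None:
    pass  # placeholder: update model parameters on the training set

def evaluate(epoch: int) -> float:
    # toy validation loss that improves, then starts to rise (overfitting)
    return (epoch - 10) ** 2 / 100 + 1.0

best_val_loss = math.inf
patience, patience_left = 3, 3

for epoch in range(100):
    train_one_epoch(epoch)
    val_loss = evaluate(epoch)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_left = patience          # still improving: reset the counter
        # (in practice, also save a checkpoint of the best model here)
    else:
        patience_left -= 1                # no improvement this epoch
        if patience_left == 0:
            print(f"stopping early at epoch {epoch}")
            break
```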

6. Data Augmentation

  • Description: Increasing the amount of training data by applying transformations (e.g., rotation, scaling, flipping) to existing data.
  • Effect: Helps the model generalize better by exposing it to various forms of data.
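
A minimal sketch of image augmentation using torchvision transforms; the specific transforms and their parameters are arbitrary illustrative choices.

```python
# Minimal sketch: image data augmentation with torchvision transforms.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # random left-right flip
    transforms.RandomRotation(degrees=15),                      # random small rotation
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),   # random scale + crop
    transforms.ToTensor(),
])
# Passed as the `transform` argument of a dataset, so each epoch sees a slightly
# different version of every training image.
```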

7. Weight Regularization

  • Description: Adds explicit constraints on the weights (e.g., clamping them to lie within a certain range, as sketched below).
  • Effect: Helps in controlling model complexity and prevents overfitting.
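
A minimal sketch of a weight constraint in PyTorch, clamping the weights to a fixed range after each optimizer step; the model, data, and range are illustrative assumptions.

```python
# Minimal sketch: clamping weights to a fixed range after each update.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-0.5, 0.5)   # constrain every weight to lie in [-0.5, 0.5]
```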

8. Batch Normalization

  • Description: Normalizes the output of a layer to stabilize learning, effectively acting as a form of regularization.
  • Effect: Reduces internal covariate shift and can lead to faster training.
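
A minimal sketch of batch normalization inserted between layers in PyTorch; the layer sizes are arbitrary illustrative choices.

```python
# Minimal sketch: batch normalization between layers in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # normalizes each feature over the current mini-batch
    nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.randn(32, 20)   # batch of 32 samples
model.train()             # uses batch statistics and updates running estimates
out_train = model(x)
model.eval()              # uses the stored running mean/variance at inference
out_eval = model(x)
```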

--ChatGPT