Thursday, February 22, 2024

Ensemble learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than any of the constituent learning algorithms could achieve alone. For an example with CNNs, see: https://towardsdatascience.com/ensembling-convnets-using-keras-237d429157eb

[Figures: ensemble architecture; decision functions]

The key components of ensemble learning include:

  1. Base Learners (Base Models): These are the individual models that make up the ensemble. They can be of any type: decision trees, neural networks, support vector machines, or any other machine learning algorithm.

  2. Ensemble Methods: These are the techniques used to combine the predictions of the base learners. Some common ensemble methods include:

    • Voting: Combining predictions by majority voting (for classification) or averaging (for regression).
    • Bagging (Bootstrap Aggregating): Training multiple base learners on different subsets of the training data, usually sampled with replacement, and then combining their predictions.
    • Boosting: Building a sequence of base learners where each subsequent learner focuses on the examples that previous learners found difficult, giving higher weight to misclassified instances.
    • Stacking: Training a meta-model (or blender) on the predictions of multiple base learners to make the final prediction.
  3. Diversity: Ensuring that the base learners are diverse, meaning they make different types of errors on the data. This diversity is crucial for the ensemble to outperform individual models. It can be achieved through using different algorithms, different subsets of the data, or different hyperparameters.

  4. Aggregation Strategy: This determines how the predictions of the base learners are combined to produce the final output. Common aggregation strategies include averaging, weighted averaging, or selecting the most frequent prediction.

    Majority Voting: For classification tasks, each base learner's prediction is considered as a "vote," and the final prediction is determined by the majority of votes. This is particularly effective when the base learners have similar performance.
    Weighted Voting: Each base learner's prediction is weighted based on its confidence or performance, and the final prediction is a weighted sum or average of these predictions.
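
A minimal sketch of both voting schemes using scikit-learn's VotingClassifier; the dataset, base learners, and weights below are illustrative assumptions, not prescribed by the text.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    # Majority (hard) voting: each base learner casts one vote, the majority class wins.
    hard_vote = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("tree", DecisionTreeClassifier(random_state=0)),
                    ("svm", SVC(random_state=0))],
        voting="hard",
    ).fit(X, y)

    # Weighted (soft) voting: predicted class probabilities are averaged with per-learner weights.
    soft_vote = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("tree", DecisionTreeClassifier(random_state=0)),
                    ("svm", SVC(probability=True, random_state=0))],  # soft voting needs predict_proba
        voting="soft",
        weights=[2, 1, 1],  # illustrative weights, e.g. proportional to validation accuracy
    ).fit(X, y)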

    Averaging:

    Simple Average: The predictions of all base learners are averaged to produce the final prediction. This is commonly used in regression tasks.
    Weighted Average: Similar to weighted voting, but the weights are assigned based on the performance or confidence of each base learner.
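
A quick NumPy-only sketch of the two averaging rules; the three prediction vectors stand in for the outputs of three already-trained regressors and are made-up numbers.

    import numpy as np

    preds = np.array([
        [2.9, 4.1, 5.0],   # regressor A
        [3.1, 3.9, 5.4],   # regressor B
        [3.0, 4.0, 4.8],   # regressor C
    ])

    simple_avg = preds.mean(axis=0)                      # plain mean per example

    weights = np.array([0.5, 0.3, 0.2])                  # e.g. derived from validation error
    weighted_avg = np.average(preds, axis=0, weights=weights)

    print(simple_avg)      # approx. [3.0, 4.0, 5.07]
    print(weighted_avg)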

    Stacking (Meta-Learning):

Base learners' predictions are used as features to train a higher-level model (meta-model or blender). The meta-model learns how to best combine the predictions of base learners to make the final prediction. This approach can capture more complex relationships between the base learners' predictions.
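
A minimal stacking sketch with scikit-learn's StackingClassifier, assuming scikit-learn is available; the choice of base learners and meta-model here is arbitrary.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)

    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                    ("svm", SVC(probability=True, random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000),  # the meta-model / blender
        cv=5,  # base-learner predictions fed to the meta-model come from 5-fold cross-validation
    ).fit(X, y)

Using out-of-fold predictions (the cv argument) keeps the meta-model from simply memorizing base learners that have already seen the training labels.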

    Bagging (Bootstrap Aggregating):

Base learners are trained on different subsets of the training data, typically sampled with replacement. The final prediction is often the average (for regression) or majority vote (for classification) of the predictions of all base learners. Random Forest is a popular example of a bagging ensemble method using decision trees as base learners.
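
A bagging sketch along these lines, again assuming scikit-learn: BaggingClassifier draws bootstrap samples and trains one tree per sample, while RandomForestClassifier adds random feature subsetting on top.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    bagging = BaggingClassifier(
        DecisionTreeClassifier(),  # base learner, passed positionally for compatibility across versions
        n_estimators=50,
        max_samples=1.0,           # each bootstrap sample is as large as the training set
        bootstrap=True,            # sample with replacement
        random_state=0,
    ).fit(X, y)

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)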

    Boosting:

Base learners are trained sequentially, with each subsequent learner focusing on the examples that previous learners found difficult. The final prediction is a weighted sum of the predictions of all base learners. Gradient Boosting Machines (GBMs), AdaBoost, and XGBoost are examples of boosting algorithms.
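
The same idea sketched with two scikit-learn boosting implementations; the hyperparameter values are illustrative only.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    # AdaBoost: later learners give more weight to examples the earlier ones misclassified.
    ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0).fit(X, y)

    # Gradient boosting: each new tree is fit to the residual errors of the ensemble so far.
    gbm = GradientBoostingClassifier(
        n_estimators=200,
        learning_rate=0.05,   # smaller steps with more trees is a common trade-off
        max_depth=3,
        random_state=0,
    ).fit(X, y)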

    Rank Aggregation:

In tasks such as recommender systems or search engines, where the goal is to rank items, rank aggregation methods are used to combine the rankings produced by different algorithms into a single ranking that best represents the preferences of the users.
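
One simple rank-aggregation scheme is a Borda count, sketched below in plain Python; the item names and rankings are made up, and real systems often use more sophisticated methods.

    from collections import defaultdict

    # Each ranking lists item ids from best to worst (e.g. from different recommenders).
    rankings = [
        ["item_a", "item_b", "item_c", "item_d"],
        ["item_b", "item_a", "item_d", "item_c"],
        ["item_a", "item_c", "item_b", "item_d"],
    ]

    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position   # best item gets n points, worst gets 1

    aggregated = sorted(scores, key=scores.get, reverse=True)
    print(aggregated)   # combined ranking, best first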

  5. Evaluation Metric: The metric used to evaluate the performance of the ensemble. Depending on the task (classification, regression, etc.), metrics such as accuracy, precision, recall, F1-score, or mean squared error (MSE) can be used.

  6. Hyperparameters: Ensemble methods often have hyperparameters that need to be tuned for optimal performance. These may include the number of base learners, learning rates (for boosting algorithms), maximum tree depth (for decision tree-based methods), and so on.
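
A tuning sketch for a few of those hyperparameters, using scikit-learn's GridSearchCV over a gradient-boosting model; the grid values are illustrative rather than recommended settings.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, random_state=0)

    param_grid = {
        "n_estimators": [100, 300],     # number of base learners
        "learning_rate": [0.05, 0.1],   # shrinkage applied to each boosting step
        "max_depth": [2, 3],            # depth of the individual trees
    }

    search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)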