- Comparison test
- t-test: tests whether the difference between the means of two groups is statistically significant www.sthda.com/english/wiki/t-test-formula ; example of a Paired t-test: https://www.jmp.com/en_nl/statistics-knowledge-portal/t-test/paired-t-test.html (see the sketches after this list)
- ANOVA: tests whether the differences among the means of 3 or more groups are statistically significant http://www.sthda.com/english/wiki/wiki.php?title=one-way-anova-test-in-r#what-is-one-way-anova-test and it is also used for feature selection https://towardsdatascience.com/anova-for-feature-selection-in-machine-learning-d9305e228476
- Friedman test: used when there is no assumption that the data are normally distributed within each group, unlike ANOVA; in other words, the Friedman test is suitable for non-normal data (see the Friedman/Nemenyi sketch after this list).
- Nemenyi test is used in conjunction with non-parametric statistical tests, such as the Friedman test, to determine which specific groups (e.g., treatments or methods) are significantly different from each other. (A non-parametric statistical test is a type of statistical test that does not require the data to follow specific assumptions about their underlying distribution, such as normality or homogeneity of variance. These tests are often used when parametric test assumptions are violated, the data are ordinal or ranked, or the sample size is small.)
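A minimal Python sketch of the t-test and ANOVA items above (independent t-test, paired t-test, one-way ANOVA, and the ANOVA F-test used as a feature-selection filter). The group arrays and the iris dataset are placeholder illustration data, not results from the linked articles.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Two groups: independent-samples t-test (difference between two means).
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"independent t-test: t={t_stat:.3f}, p={p_value:.4f}")

# Paired t-test: the same subjects measured before and after a treatment.
before = rng.normal(loc=5.0, scale=1.0, size=30)
after = before + rng.normal(loc=0.3, scale=0.5, size=30)
t_stat, p_value = stats.ttest_rel(before, after)
print(f"paired t-test:      t={t_stat:.3f}, p={p_value:.4f}")

# Three or more groups: one-way ANOVA.
group_c = rng.normal(loc=6.0, scale=1.0, size=30)
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"one-way ANOVA:      F={f_stat:.3f}, p={p_value:.4f}")

# ANOVA F-test as a feature-selection filter: keep the k features whose
# means differ most strongly across the target classes.
X, y = load_iris(return_X_y=True)
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)
print("selected feature matrix shape:", X_selected.shape)
```

In each case a small p-value (e.g. below 0.05) indicates the observed mean differences are unlikely under the null hypothesis of equal means.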
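Similarly, a sketch of the Friedman test followed by the Nemenyi post-hoc comparison, e.g. for comparing several methods across the same set of datasets. The Friedman test is in scipy.stats; for the Nemenyi test this assumes the third-party scikit-posthocs package is installed (`pip install scikit-posthocs`), and the score matrix is made up.

```python
import numpy as np
from scipy import stats
import scikit_posthocs as sp  # assumed third-party package (scikit-posthocs)

# Rows = datasets (blocks), columns = methods being compared.
# Illustrative accuracy scores only, not real results.
scores = np.array([
    [0.85, 0.80, 0.78],
    [0.90, 0.88, 0.84],
    [0.70, 0.72, 0.65],
    [0.95, 0.91, 0.89],
    [0.88, 0.83, 0.80],
])

# Friedman test: do the methods differ overall (based on within-row ranks)?
stat, p_value = stats.friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
print(f"Friedman test: chi2={stat:.3f}, p={p_value:.4f}")

# Nemenyi post-hoc test: pairwise p-values showing which methods differ.
print(sp.posthoc_nemenyi_friedman(scores))
```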
- Correlation test
- Pearson's product-moment correlation, which computes Pearson's product-moment correlation coefficient, a measure of the linear relationship between two continuous variables (see the sketch after this list)
- Chi-square (χ²) test: used to test for association (correlation) between categorical variables
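A short sketch of both tests above using scipy.stats; the continuous variables and the contingency table are made-up illustration data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pearson's product-moment correlation between two continuous variables.
x = rng.normal(size=100)
y = 0.7 * x + rng.normal(scale=0.5, size=100)
r, p_value = stats.pearsonr(x, y)
print(f"Pearson r={r:.3f}, p={p_value:.4f}")

# Chi-square test of independence between two categorical variables,
# starting from a contingency table of observed counts.
contingency = np.array([
    [30, 10],   # e.g. group A: outcome yes / outcome no
    [20, 25],   # e.g. group B: outcome yes / outcome no
])
chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
print(f"chi-square={chi2:.3f}, dof={dof}, p={p_value:.4f}")
```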
Using both traditional feature-selection methods like t-tests and XAI approaches provides a more robust framework:
T-tests/filter methods can quickly remove features that have no statistically significant relationship to the target, simplifying the initial dataset before model training. The t-test measures the statistical relationship between a single feature and the target variable, independent of any machine learning model.
XAI methods are then used after training to fine-tune the feature set by identifying features that, while perhaps statistically significant in isolation, are redundant or not actually used by the complex ML model. XAI methods (such as SHAP or permutation importance) measure a feature's contribution to a specific model's predictions and performance, thereby accounting for complex feature interactions.
Studies often show that XAI methods recover important features missed by simple statistics, and vice versa, underscoring the value of implementing both.
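As a rough sketch of this two-stage idea, the snippet below first applies a univariate statistical filter (the ANOVA F-test, the multi-group analogue of the t-test) and then checks, after training, which of the surviving features the model actually relies on via permutation importance (SHAP would play the same role). The synthetic dataset, the choice of k, and the random forest are placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data: 20 features, only 5 of which carry signal.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: model-agnostic filter. Keep the 10 features with the strongest
# univariate statistical link to the target (ANOVA F-test / t-test style).
selector = SelectKBest(score_func=f_classif, k=10).fit(X_train, y_train)
kept = selector.get_support(indices=True)
print("features kept by the filter:", kept)

# Stage 2: model-specific check. Train on the filtered features, then use
# permutation importance to see which of them the model actually relies on.
model = RandomForestClassifier(random_state=0)
model.fit(X_train[:, kept], y_train)
result = permutation_importance(model, X_test[:, kept], y_test,
                                n_repeats=10, random_state=0)
for idx, imp in zip(kept, result.importances_mean):
    print(f"feature {idx}: permutation importance = {imp:.4f}")
```

Features that pass the statistical filter but receive near-zero permutation importance are candidates for removal; features the filter discards can be re-checked the same way, which is the "vice versa" case above.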