Thursday, April 28, 2016

F-measure (F1 score, F score), precision, recall

tp = true positive: the system says "yes" (positive) and in reality it is "yes" (true)
fp = false positive: the system says "yes" but in reality it is "no"
fn = false negative: the system says "no" but in reality it is "yes"

precision = tp / (tp + fp) ; higher is better, meaning the system makes few false alarms (exactness)
recall = tp / (tp + fn) ; higher is better, meaning the system misses few true cases (completeness)
F-measure (%) = 100 x 2 x precision x recall / (precision + recall)
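
As a quick illustration, here is a minimal Python sketch that computes these metrics from hypothetical confusion counts (the numbers are assumed for illustration only):

```python
# Hypothetical confusion counts (assumed values for illustration).
tp, fp, fn = 90, 10, 30

precision = tp / (tp + fp)                          # 90 / 100 = 0.90
recall = tp / (tp + fn)                             # 90 / 120 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # ~0.818

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```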

Accuracy vs. F-measure: http://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/

The F1-score is defined for binary classification. To compute F1 for multi-class classification, three averaging schemes are commonly used:
1. Macro-Averaged F1-Score
Definition: Compute the F1-score independently for each class and take the average.
Use Case: Gives equal weight to all classes, regardless of their size.
Formula:

Macro-F1 = (F1_1 + F1_2 + ... + F1_N) / N

where F1_i is the F1-score of class i and N is the number of classes.

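A minimal sketch of macro-averaging in Python, assuming hypothetical per-class F1 values for a three-class problem:

```python
# Hypothetical per-class F1-scores for a 3-class problem (assumed values).
f1_per_class = [0.80, 0.60, 0.40]

# Macro-F1 is the plain average: every class counts equally.
macro_f1 = sum(f1_per_class) / len(f1_per_class)
print(macro_f1)  # 0.60
```
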
2. Micro-Averaged F1-Score
Definition: Aggregate the contributions of all classes to compute the F1-score using the total true positives, false positives, and false negatives.
Use Case: Gives equal weight to all samples, favoring performance on larger classes.
Formula:

Micro-Precision = sum(tp_i) / (sum(tp_i) + sum(fp_i))
Micro-Recall = sum(tp_i) / (sum(tp_i) + sum(fn_i))
Micro-F1 = 2 x Micro-Precision x Micro-Recall / (Micro-Precision + Micro-Recall)

where the sums run over all classes i.

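A minimal sketch of micro-averaging, assuming hypothetical (tp, fp, fn) counts per class; the counts are pooled before precision and recall are computed:

```python
# Hypothetical (tp, fp, fn) counts per class (assumed values).
counts = [(50, 10, 5), (30, 5, 10), (10, 20, 20)]

tp = sum(c[0] for c in counts)  # pooled true positives = 90
fp = sum(c[1] for c in counts)  # pooled false positives = 35
fn = sum(c[2] for c in counts)  # pooled false negatives = 35

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
print(micro_f1)  # 0.72
```

Note that in single-label multi-class classification every false positive for one class is a false negative for another, so the pooled fp and fn totals are equal and micro-F1 coincides with accuracy.
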
3. Weighted-Averaged F1-Score
Definition: Compute the F1-score for each class and take their average, weighting each class by its number of samples.
Use Case: Accounts for class imbalance by assigning more weight to larger classes.
Formula:

Weighted-F1 = sum(n_i x F1_i) / sum(n_i)

where n_i is the number of true samples (the support) of class i.

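A minimal sketch of weighted averaging, assuming hypothetical per-class F1 values and class supports:

```python
# Hypothetical per-class F1-scores and supports (assumed values).
f1_per_class = [0.80, 0.60, 0.40]
support = [100, 50, 10]  # number of true samples per class

weighted_f1 = sum(f * n for f, n in zip(f1_per_class, support)) / sum(support)
print(weighted_f1)  # (80 + 30 + 4) / 160 = 0.7125
```
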
---
Choosing the Right Approach

Use Macro-Averaging if you want to treat all classes equally, even if some are small.

Use Micro-Averaging if you want to focus on overall performance, regardless of class distribution.

Use Weighted-Averaging if you want a balance that considers both overall performance and class imbalance.

Many libraries, like scikit-learn, provide built-in support for computing these metrics.
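
For example, scikit-learn's f1_score exposes all three schemes through its average parameter (the toy labels below are made up):

```python
from sklearn.metrics import f1_score

# Toy ground-truth and predicted labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

print(f1_score(y_true, y_pred, average='macro'))
print(f1_score(y_true, y_pred, average='micro'))
print(f1_score(y_true, y_pred, average='weighted'))
```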