Sunday, January 31, 2021

How AUC-ROC measures classification performance

The ROC is a curve: the closer it comes to a right angle hugging the Y axis above the X axis, the better. The X axis is the FPR (= FP/(FP+TN)) and the Y axis is recall, i.e. the TPR (= TP/(TP+FN)).
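A minimal Python sketch of the two formulas, using made-up confusion-matrix counts (TP, FN, TN, FP below are toy values, not results from a real model):

    # Toy confusion-matrix counts (made-up values for illustration).
    TP, FN = 80, 20   # positives: correctly caught vs. missed
    TN, FP = 90, 10   # negatives: correctly rejected vs. false alarms

    tpr = TP / (TP + FN)  # recall / sensitivity: the ROC curve's Y axis
    fpr = FP / (FP + TN)  # false positive rate: the ROC curve's X axis

    print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # TPR = 0.80, FPR = 0.10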

AUC is the area under the ROC curve: the larger it is, the better the model separates the classes. An AUC of 1.0 means the ranking is perfect, while an AUC of 0.7 means the model ranks a randomly chosen positive example above a randomly chosen negative one 70% of the time.
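A minimal sketch of tracing the curve and computing the area with scikit-learn's roc_curve and roc_auc_score; the labels and scores below are toy values, not real model output:

    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 0, 0, 1, 1, 1, 1]                   # ground-truth labels
    y_score = [0.1, 0.3, 0.6, 0.2, 0.8, 0.7, 0.9, 0.4]   # predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
    auc = roc_auc_score(y_true, y_score)               # area under that curve

    print(f"AUC = {auc:.3f}")  # 0.938 for these toy scores

Each point on the curve comes from sweeping the decision threshold over y_score, which is exactly why AUC summarizes all thresholds at once.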

https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5

https://youtu.be/4jRBRDbJemM

You may think of F1 as a measure of precision and recall at a particular threshold value. When you have a data imbalance between positive and negative samples, you should prefer the F1 score, because ROC AUC averages over all possible thresholds!
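For contrast, a small sketch of how F1 is tied to one threshold (the 0.5 cut-off and the toy scores are assumptions for illustration, reusing the values from the AUC sketch above):

    from sklearn.metrics import f1_score

    y_true  = [0, 0, 0, 0, 1, 1, 1, 1]
    y_score = [0.1, 0.3, 0.6, 0.2, 0.8, 0.7, 0.9, 0.4]

    y_pred = [1 if s >= 0.5 else 0 for s in y_score]      # hard labels at threshold 0.5
    print(f"F1 @ 0.5 = {f1_score(y_true, y_pred):.3f}")   # 0.750 here; moving the threshold changes F1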

https://stackoverflow.com/questions/44172162/f1-score-vs-roc-auc

Let’s say you have a dataset with two classes where the first class is 99% of your data and the second is 1%. Your classifier predicts every observation as falling into the first class. Your TPR (the Y axis of the ROC curve) will be very high, since it predicts that class well and that class represents most of your data. Thus your ROC curve will have a high AUC.

However, your classifier hasn’t actually done a good job predicting the other class, and the recall for that class will be 0. With imbalanced classes, it’s easy to get a high AUC without actually making useful predictions, so looking at precision/recall helps you analyze how well you’re predicting each class.
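A small sketch of that scenario (the 99/1 split and the always-majority classifier are made up for illustration):

    from sklearn.metrics import classification_report

    y_true = [0] * 99 + [1]   # 99% class 0, 1% class 1
    y_pred = [0] * 100        # every observation predicted as class 0

    # Accuracy comes out at 99%, but recall for class 1 is 0.0.
    print(classification_report(y_true, y_pred, zero_division=0))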

You have to use these metrics together, and it might also be useful to bring in the F1 score, which combines precision and recall for both of the classes.
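Continuing the toy example above, per-class and macro-averaged F1 via scikit-learn's f1_score make the failure on the minority class explicit:

    from sklearn.metrics import f1_score

    y_true = [0] * 99 + [1]
    y_pred = [0] * 100

    print(f1_score(y_true, y_pred, average=None, zero_division=0))     # per class: [~0.995, 0.0]
    print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.497, both classes weighted equally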