In educational assessment, the choice between raw scores, criterion-referenced evaluation, and norm-referenced evaluation depends on the instructor's objectives, available resources, and the need for explainability.
Raw Scores
Raw scores are the direct numerical values measured during an assessment before any interpretation or conversion into grades occurs.
- Pros: They provide a precise, granular measurement of an individual's performance on a specific task without the potential bias introduced by grading algorithms.
- Cons: Raw scores alone are difficult to interpret because they do not tell learners or instructors what level of competence was reached or what needs to improve.
Criterion-Referenced Evaluation
This scheme translates performance into absolute rating labels (e.g., Excellent, A) based on a predetermined rubric or fixed standard.
- Pros: It ensures that grades reflect a student's mastery of specific content regardless of how their peers perform. It provides a clear, absolute standard that is often easier for stakeholders to understand.
- Cons: It works best when the examination covers every content topic, which typically means significantly longer exams and more resources for checking answers. It is difficult to apply when the assessment is not comprehensive.
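As a rough illustration of the fixed-standard idea, the sketch below maps a percentage score to a label using predefined cutoffs. The cutoff values, the label set, and the function name `criterion_grade` are assumptions made for this example, not part of any particular rubric.

```python
from bisect import bisect_right

# Hypothetical cutoffs (percent of the maximum score) and labels.
# A real rubric would define its own standard before the exam is taken.
CUTOFFS = [60, 70, 80, 90]
LABELS = ["F", "D", "C", "B", "A"]


def criterion_grade(score):
    """Return the label for a score using fixed, predefined cutoffs.

    The result depends only on the score itself, never on how
    other students performed.
    """
    # bisect_right counts how many cutoffs the score meets or exceeds.
    return LABELS[bisect_right(CUTOFFS, score)]


# Illustrative scores (made up for this example).
scores = {"ana": 91, "ben": 79.5, "cy": 60}
grades = {name: criterion_grade(s) for name, s in scores.items()}
```

Because the cutoffs are fixed in advance, every student could in principle earn the top label, and any single grade can be explained by pointing at the cutoff it met.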
Norm-Referenced Evaluation
This scheme converts scores into relative ranking labels by comparing an individual’s performance to the performance of their peers.
- Pros: It is efficient for large classes or courses where instructors face strict time constraints and limited grading resources. It is the preferred "choice of necessity" when exams cannot comprehensively assess all topics due to limited resources. It inherently reflects the relative quality of performance within a specific group.
- Cons: It can be difficult to explain the reasoning behind grade boundaries, leading to disputes between learners and instructors when scores are close but result in different grades. Because it lacks predefined absolute criteria, it is more susceptible to bias and concerns regarding fairness.
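One simple way to picture relative grading is a quota scheme, sketched below under stated assumptions: the top 20% of the group receive A, the next 30% receive B, and the rest receive C. The quota values, the function name `norm_grades`, and the sample scores are all hypothetical, and ties are simply broken by input order, which a real grading policy would need to handle explicitly.

```python
def norm_grades(scores, quotas=(("A", 0.2), ("B", 0.3), ("C", 0.5))):
    """Assign labels by rank within the group (hypothetical quota scheme)."""
    # Rank students from highest to lowest score; sorted() is stable,
    # so tied scores keep their input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0.0
    for label, share in quotas:
        cumulative += share
        end = round(cumulative * n)  # rank position where this label stops
        for name in ranked[start:end]:
            grades[name] = label
        start = end
    # Anyone left over because of rounding gets the last label.
    for name in ranked[start:]:
        grades[name] = quotas[-1][0]
    return grades


# Illustrative scores (made up for this example).
scores = {
    "s1": 95, "s2": 88, "s3": 87.5, "s4": 80, "s5": 79,
    "s6": 70, "s7": 65, "s8": 60, "s9": 55, "s10": 40,
}
grades = norm_grades(scores)
# s2 (88) falls inside the top 20% and s3 (87.5) just outside it,
# so a half-point gap separates two different labels.
```

The boundary between s2 and s3 exists only because of where the quota happens to fall in this group, which is the kind of close-score dispute described above.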
Comparison Summary
| Feature | Raw Scores | Criterion-Referenced | Norm-Referenced |
|---|---|---|---|
| Primary Focus | Numerical data | Mastery of content | Relative ranking |
| Standards | None | Absolute/Predefined | Relative/Group-based |
| Best Use Case | Initial data collection | Certifying competency | Large-scale ranking |
| Major Drawback | Lack of context | Resource intensive | Hard to justify boundaries |