Wednesday, 11 March 2026 (B.E. 2569)

Raw scores vs Criterion-referenced evaluation vs Norm-referenced evaluation

In educational assessment, the choice between raw scores, criterion-referenced evaluation, and norm-referenced evaluation depends on the instructor's objectives, available resources, and the need for explainability.

Raw Scores

Raw scores are the direct numerical values measured during an assessment before any interpretation or conversion into grades occurs.

  • Pros: They provide a precise, granular measurement of an individual's performance on a specific task without the potential bias introduced by grading algorithms.
  • Cons: Raw scores alone are difficult to interpret: they do not tell learners or instructors what level of competence was reached or what improvements are needed.

Criterion-Referenced Evaluation

This scheme translates performance into absolute rating labels (e.g., Excellent, A) based on a predetermined rubric or fixed standard.

  • Pros: It ensures that grades reflect a student's mastery of specific content regardless of how their peers perform. It provides a clear, absolute standard that is often easier for stakeholders to understand.
  • Cons: It is most suitable for examinations that cover all content topics, which typically requires significantly longer exam-taking times and more resources for checking answers. It can be difficult to apply if the assessment is not comprehensive.
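As a rough illustration of criterion-referenced grading, the sketch below maps percentage scores to labels using fixed cutoffs. The cutoff values (50, 60, 70, 80), the label set, and the student names are hypothetical and chosen only for the example; the point is that each grade depends solely on the student's own score.

```python
from bisect import bisect_right

# Hypothetical fixed standard: lower bounds (in percent) for D, C, B, A.
# These numbers are illustrative assumptions, not taken from any real rubric.
CUTOFFS = [50, 60, 70, 80]
LABELS = ["F", "D", "C", "B", "A"]


def criterion_grade(score: float) -> str:
    """Map a percentage score to a label using predefined cutoffs."""
    # bisect_right counts how many cutoffs the score has reached or passed.
    return LABELS[bisect_right(CUTOFFS, score)]


scores = {"Ann": 83, "Ben": 79.5, "Cho": 50, "Dee": 42}
grades = {name: criterion_grade(s) for name, s in scores.items()}
print(grades)  # {'Ann': 'A', 'Ben': 'B', 'Cho': 'D', 'Dee': 'F'}
```

Changing any other student's score leaves a given student's grade untouched, which is what makes the standard absolute.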

Norm-Referenced Evaluation

This scheme converts scores into relative ranking labels by comparing an individual’s performance to the performance of their peers.

  • Pros: It is highly efficient for large classes or courses where instructors must meet strict time constraints and save on grading resources. It is the preferred "choice of necessity" when exams cannot comprehensively assess all topics due to limited resources. It inherently reflects the relative quality of performance within a specific group.
  • Cons: It can be difficult to explain the reasoning behind grade boundaries, leading to disputes between learners and instructors when scores are close but result in different grades. Because it lacks predefined absolute criteria, it is more susceptible to bias and concerns regarding fairness.
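One common way to implement norm-referenced grading is to convert each score to a z-score within the group and assign labels by distance from the group mean. The sketch below assumes that approach; the z-score thresholds (1.0, 0.0, -1.0), the labels, and the sample classes are hypothetical. It also shows the explainability problem mentioned above: the same raw score of 70 earns different grades in different groups.

```python
from statistics import mean, pstdev


def norm_grades(scores: dict[str, float]) -> dict[str, str]:
    """Assign labels relative to the group's mean and standard deviation."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of this group
    result = {}
    for name, s in scores.items():
        # If everyone has the same score, treat all z-scores as 0.
        z = (s - mu) / sigma if sigma > 0 else 0.0
        # Hypothetical grade boundaries expressed in standard deviations.
        if z >= 1.0:
            result[name] = "A"
        elif z >= 0.0:
            result[name] = "B"
        elif z >= -1.0:
            result[name] = "C"
        else:
            result[name] = "D"
    return result


class_a = {"Ann": 90, "Ben": 70, "Cho": 50}
class_b = {"Ben": 70, "Eve": 50, "Fay": 30}
print(norm_grades(class_a))  # {'Ann': 'A', 'Ben': 'B', 'Cho': 'D'}
print(norm_grades(class_b))  # {'Ben': 'A', 'Eve': 'B', 'Fay': 'D'}
```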

Comparison Summary

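| Feature        | Raw Scores              | Criterion-Referenced  | Norm-Referenced            |
|----------------|-------------------------|-----------------------|----------------------------|
| Primary Focus  | Numerical data          | Mastery of content    | Relative ranking           |
| Standards      | None                    | Absolute/Predefined   | Relative/Group-based       |
| Best Use Case  | Initial data collection | Certifying competency | Large-scale ranking        |
| Major Drawback | Lack of context         | Resource intensive    | Hard to justify boundaries |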

Stanine

In educational assessment, a stanine (short for STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of 5 and a standard deviation of 2.

It is designed to simplify the interpretation of test results by grouping scores into broad categories rather than looking at precise raw scores or percentiles.

How Stanines Work

The scale divides a normal distribution of scores into nine bands. Because it follows a bell curve, most students (about 54%) fall into the middle stanines (4, 5, and 6), while very few (about 4% each) fall into the extremes (1 or 9).

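| Stanine | Percentage of Cases | Performance Level  |
|---------|---------------------|--------------------|
| 9       | 4%                  | Highest (Top)      |
| 8       | 7%                  | Well Above Average |
| 7       | 12%                 | Above Average      |
| 6       | 17%                 | High Average       |
| 5       | 20%                 | Average            |
| 4       | 17%                 | Low Average        |
| 3       | 12%                 | Below Average      |
| 2       | 7%                  | Well Below Average |
| 1       | 4%                  | Lowest (Bottom)    |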
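The percentages in the table can be turned into a lookup from percentile rank to stanine by accumulating them from the bottom up. The following is a minimal sketch; the function name and the convention that a percentile exactly on a boundary goes to the higher stanine are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

# Percentage of cases in stanines 1 through 9, as listed in the table.
PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Cumulative upper bounds of stanines 1..8: [4, 11, 23, 40, 60, 77, 89, 96].
CUM_BOUNDS = list(accumulate(PERCENTS))[:-1]


def stanine_from_percentile(pct: float) -> int:
    """Return the stanine for a percentile rank in the range 0 to 100.

    Assumed convention: a value exactly on a boundary is placed in the
    higher stanine.
    """
    return bisect_right(CUM_BOUNDS, pct) + 1


print(stanine_from_percentile(50))  # 5
print(stanine_from_percentile(3))   # 1
print(stanine_from_percentile(97))  # 9
```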

Key Characteristics

  • Coarseness: Because it only has nine points, it "smooths out" small, insignificant differences between students. For example, two students with slightly different raw scores might both be a "Stanine 6," preventing over-interpretation of minor score gaps.

  • Comparison: It allows educators to compare a student’s performance across different subjects (e.g., comparing a Stanine 7 in Math to a Stanine 5 in Reading) using a single, unified metric.

  • Simplicity: It is often easier for parents and students to understand than complex z-scores or T-scores.

Mathematical Context

If the scores are approximately normally distributed, the stanine (S) can be calculated from a z-score (z) using the following linear transformation:

S = 2z + 5

The result is then rounded to the nearest whole number and limited to the range 1 to 9: anything below 1 becomes 1, and anything above 9 becomes 9.
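The transformation can be sketched in a few lines of Python. The function name is hypothetical, and two details are assumptions: halves are rounded up (Python's built-in round() uses banker's rounding, so it is avoided here), and values outside the scale are clipped to 1 or 9.

```python
import math


def stanine_from_z(z: float) -> int:
    """Convert a z-score to a stanine using S = 2z + 5."""
    raw = 2 * z + 5
    # Round half up; round() would send 4.5 to 4 and 5.5 to 6.
    rounded = math.floor(raw + 0.5)
    # Clip to the nine-point scale.
    return max(1, min(9, rounded))


print(stanine_from_z(0.0))   # 5
print(stanine_from_z(1.0))   # 7
print(stanine_from_z(-3.0))  # 1
print(stanine_from_z(3.0))   # 9
```

With this rounding, stanine 5 covers z-scores from -0.25 up to (but not including) 0.25, which matches a band half a standard deviation wide around the mean.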

Use in Pedagogy

In the context of Outcome-Based Education (OBE) or curriculum design, stanines are frequently used to identify groups of students who may need additional support or advanced enrichment, as they provide a clear snapshot of where a student sits relative to a peer group.


--Gemini