Data analytics AI
Wednesday, March 18, 2026 (B.E. 2569)
Sunday, March 15, 2026 (B.E. 2569)
Chain of Thought (CoT) or Reasoning
- An LLM without CoT suits query tasks, simple classification tasks (such as sentiment analysis and topic labeling), and translation tasks. It generates fewer intermediate tokens, which saves cost and computing resources and improves response speed.
- An LLM with CoT suits math and logic tasks that rely on a reasoning trace before arriving at a final answer.
Example problem:
Roger has 5 tennis balls.
He buys 2 cans of tennis balls.
Each can has 3 tennis balls.
How many tennis balls does Roger have now?
Without Chain-of-Thought (direct answer), the model may incorrectly compute:
- 5 + 2 + 3 = 10
With Chain-of-Thought, the model first reasons that 2 cans × 3 balls = 6 balls, then adds them to the starting 5: 5 + 6 = 11.
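The two computations can be spelled out in plain arithmetic; a minimal sketch of the contrast:

```python
# Direct (no CoT): the model lumps all the numbers together.
direct_answer = 5 + 2 + 3        # = 10, wrong

# CoT-style: resolve the cans-to-balls conversion first, then add.
new_balls = 2 * 3                # 2 cans x 3 balls each = 6
cot_answer = 5 + new_balls       # 5 + 6 = 11, correct
print(direct_answer, cot_answer) # 10 11
```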
Friday, March 13, 2026 (B.E. 2569)
Minimum-cost flow
Minimum-cost flow is an optimization problem that finds the cheapest way to send a specific amount of "flow" (material, data, or goods) from source nodes to sink nodes through a network. It minimizes total transportation costs, ensuring flow does not exceed edge capacities.
Key properties:
- Capacity Constraints: Every edge has a maximum capacity that cannot be exceeded.
- Cost per Unit: Each unit of flow on an edge has an associated cost.
- Flow Conservation: For every node, flow in must equal flow out, except at supply (source) and demand (sink) nodes.
- Goal: Minimize total cost while satisfying all supply/demand needs.
Applications:
- Logistics & Supply Chain: Shipping goods from factories to consumers at minimum cost.
- Telecommunications: Routing data packets to reduce latency and maximize bandwidth utilization.
- Energy Distribution: Transporting electricity or liquids through pipelines.
- Assignment Problems: Matching tasks to workers efficiently.
The problem is typically solved with variants of the Successive Shortest Path algorithm, which uses Bellman-Ford (or Dijkstra's algorithm with potentials) to find the cheapest augmenting path; in practice it is often handled by solvers such as Gurobi or Google OR-Tools.
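As a concrete illustration, here is a minimal pure-Python sketch of the Successive Shortest Path idea, using Bellman-Ford on the residual graph. The example network and its numbers are invented for illustration; a real deployment would use a solver such as OR-Tools instead.

```python
def min_cost_flow(n, edges, source, sink, amount):
    """Successive Shortest Path sketch: repeatedly push flow along the
    cheapest augmenting path (found with Bellman-Ford) until `amount`
    units travel from source to sink. Returns the minimum total cost."""
    # Residual graph: each entry is [to, remaining_capacity, unit_cost, rev_index]
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])  # residual edge

    total_cost, flow = 0, 0
    while flow < amount:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph
        INF = float("inf")
        dist = [INF] * n
        dist[source] = 0
        prev = [None] * n  # (node we came from, index of edge used)
        updated = True
        while updated:
            updated = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
        if dist[sink] == INF:
            raise ValueError("network cannot carry the requested amount")

        # Bottleneck capacity along the chosen path
        push, v = amount - flow, sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        # Apply the flow and update residual capacities
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return total_cost

# Edges: (from, to, capacity, cost per unit); a small 4-node example
edges = [(0, 1, 2, 1), (0, 2, 1, 2), (1, 2, 1, 1), (1, 3, 1, 3), (2, 3, 2, 1)]
print(min_cost_flow(4, edges, source=0, sink=3, amount=3))  # -> 10
```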
Wednesday, March 11, 2026 (B.E. 2569)
Raw scores vs Criterion-referenced evaluation vs Norm-referenced evaluation
In human performance assessment, the choice between raw scores, criterion-referenced evaluation, and norm-referenced evaluation depends on the instructor's objectives, available resources, and the need for explainability.
Raw Scores
Raw scores are the direct numerical values measured during an assessment before any interpretation or conversion into grades occurs.
- Pros: They provide a precise, granular measurement of an individual's performance on a specific task without the potential bias introduced by grading algorithms.
- Cons: Raw scores alone are difficult to interpret, as they do not by themselves tell learners or instructors the actual level of competence or what improvement is needed. For example, a raw score of 50/100 could mean either excellent performance on a very difficult test or poor performance on an easy one.
Criterion-Referenced Evaluation
This scheme translates performance into absolute rating labels (e.g., Excellent, A) based on a predetermined rubric or fixed standard.
- Pros: It ensures that grades reflect a student's mastery of specific content regardless of how their peers perform. It provides a clear, absolute standard that is often easier for stakeholders to understand.
- Cons: It is most suitable for examinations that cover all content topics, which typically requires significantly longer exam-taking times and more resources for checking answers. It can be difficult to apply if the assessment is not comprehensive.
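In code, criterion-referenced grading is just a lookup against fixed cut-offs. A minimal sketch; the 80/70/60/50 thresholds here are illustrative assumptions, not values from the notes:

```python
def criterion_grade(score, rubric=((80, "A"), (70, "B"), (60, "C"), (50, "D"))):
    """Map a raw score to an absolute grade using predefined cut-offs.
    The grade depends only on the rubric, never on how peers performed."""
    for cutoff, grade in rubric:
        if score >= cutoff:
            return grade
    return "F"

print(criterion_grade(85))  # A
print(criterion_grade(79))  # B (just below the A cut-off)
print(criterion_grade(30))  # F
```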
Norm-Referenced Evaluation
This scheme converts scores into relative ranking labels by comparing an individual’s performance to the performance of their peers.
- Pros: It is highly efficient for large classes or courses where instructors must meet strict time constraints and save on grading resources. It is the preferred "choice of necessity" when exams cannot comprehensively assess all topics due to limited resources. It inherently reflects the relative quality of performance within a specific group.
- Cons: It can be difficult to explain the reasoning behind grade boundaries, leading to disputes between learners and instructors when scores are close but result in different grades. Because it lacks predefined absolute criteria, it is more susceptible to bias and concerns regarding fairness.
Comparison Summary
| Feature | Raw Scores | Criterion-Referenced Grading | Norm-Referenced Grading |
|---|---|---|---|
| Primary Focus | Direct measurement without interpretation | Interpretation as Mastery of content | Interpretation as Relative ranking |
| Standards | None | Absolute/Predefined | Relative/Group-based |
| Best Use Case | Raw ranking like TCAS exam | Certifying competency | Large-scale ranking |
| Major Drawback | Lack of context | Resource intensive | Hard to justify boundaries |
The following points explain why fixed criterion-referenced boundaries can be problematic and how they contrast with the alternative methods discussed in the sources:
- Fixed Percent Ranges: Criterion-referenced grading typically maps a learning score to a predefined percent range for a specific grade (e.g., 80% for an A). This means the standards are set before the assessment begins and do not change regardless of the actual distribution of student performance.
- Lack of Explainable Discrimination: A core difficulty with fixed boundaries is the "explainability" of the grade. In these systems, a student scoring just below a threshold (like 79 vs. 80) may receive a different grade without a data-driven justification for that specific cut-off. The sources suggest that it is difficult for instructors to resolve disputes when learners score contiguously but fall into different predefined boundaries.
- Arbitrary Nature of Absolute Standards: Because these criteria are absolute, they may not reflect the relative quality of an individual’s performance compared to their peers. If an exam is "overly difficult" or "too easy," all learners with similar scores might receive the same grade (e.g., C), which fails to differentiate the group's true learning competence.
- Contrast with Data-Driven Gaps: To address the problem of arbitrary cut-offs, the sources propose norm-referenced heuristic methods like the Widest-Gap-First algorithm. Instead of using a predefined number like 80, this method identifies the widest score gaps in the actual data to define boundaries. This provides a "simple and clear-cut justification": a student receives a certain grade because their score is closer to others in that group than to the group above.
- Fairness Concerns: When unique grade symbols represent unequal score intervals (such as F covering 0–50 while A covers only 80–100), it can be seen as providing unequal chances for students to receive certain grades. The sources note that "fair" grading should ideally maintain uniform intervals or use widest score gaps to prevent two learners with similar competence from receiving different grades.
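The Widest-Gap-First idea above can be sketched in a few lines. This is my reading of the heuristic (sort the gaps between consecutive scores and cut at the largest ones), not an implementation from the sources:

```python
def widest_gap_cuts(scores, n_grades):
    """Place grade boundaries at the midpoints of the (n_grades - 1)
    widest gaps between consecutive distinct scores, so each grade
    groups scores that sit closer to each other than to the next group."""
    s = sorted(set(scores))
    # Indices of gaps between consecutive scores, widest first
    gaps = sorted(range(len(s) - 1), key=lambda i: s[i + 1] - s[i], reverse=True)
    return sorted((s[i] + s[i + 1]) / 2 for i in gaps[: n_grades - 1])

# Gaps of 12 (48->60) and 18 (62->80) are the widest, so the three
# grade bands become {45, 48}, {60, 62}, and {80, 82}.
print(widest_gap_cuts([45, 48, 60, 62, 80, 82], 3))  # [54.0, 71.0]
```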
Stanine
In educational assessment, a stanine (short for STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of 5 and a standard deviation of 2.
It is designed to simplify the interpretation of test results by grouping scores into broad categories rather than looking at precise raw scores or percentiles.
How Stanines Work
The scale converts a normal distribution of scores into nine units.
| Stanine | Percentage of Cases | Performance Level |
|---|---|---|
| 9 | 4% | Highest (Top) |
| 8 | 7% | Well Above Average |
| 7 | 12% | Above Average |
| 6 | 17% | High Average |
| 5 | 20% | Average |
| 4 | 17% | Low Average |
| 3 | 12% | Below Average |
| 2 | 7% | Well Below Average |
| 1 | 4% | Lowest (Bottom) |
Key Characteristics
- Coarseness: Because it only has nine points, it "smooths out" small, insignificant differences between students. For example, two students with slightly different raw scores might both be a "Stanine 6," preventing over-interpretation of minor score gaps.
- Comparison: It allows educators to compare a student’s performance across different subjects (e.g., comparing a Stanine 7 in Math to a Stanine 5 in Reading) using a single, unified metric.
- Simplicity: It is often easier for parents and students to understand than complex z-scores or T-scores.
Mathematical Context
If you are working with standard normal distributions, the stanine (S) can be calculated from a z-score (z) using the linear transformation S = 2z + 5. The result is then rounded to the nearest whole number and clamped to the range 1 to 9.
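That scaling (mean 5, standard deviation 2, clamped to 1–9) is a one-liner; note that Python's round() uses round-half-to-even at exact .5 ties:

```python
def stanine(z):
    """Convert a z-score to a stanine: scale to mean 5 and SD 2,
    round to the nearest integer, and clamp to the 1-9 range."""
    return max(1, min(9, round(2 * z + 5)))

print(stanine(0))     # average performance -> 5
print(stanine(2.5))   # far above average -> clamped to 9
print(stanine(-3.0))  # far below average -> clamped to 1
```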
Use in Pedagogy
In the context of Outcome-Based Education (OBE) or curriculum design, stanines are frequently used to identify groups of students who may need additional support or advanced enrichment, as they provide a clear snapshot of where a student sits relative to a peer group.
--Gemini