https://scikit-learn.org/stable/
Monday, 19 May 2025
Thursday, 15 May 2025
Norm-referenced VS Criterion-based Achievement Grading
Norm-referenced method: When the exam only samples part of the content (due to time constraints, for instance), it may not fully reflect all students' knowledge or abilities. In such cases, norm-referencing helps distinguish performance levels relative to peers, especially if the exam is designed to be difficult or selective.
Criterion-based method: If the exam fully aligns with the course objectives and content, it's appropriate to assess students based on predetermined criteria. In this case, every student theoretically has an equal opportunity to succeed by demonstrating mastery of the material.
===
📌 Why Use Norm-Referenced Grading When an Exam Has Incomplete Content Coverage?
It's not that incomplete content coverage requires norm-referenced grading, but rather that norm-referenced grading can be more practical or justifiable in that situation. Here’s the reasoning:
🔍 1. Incomplete Coverage = Limited Validity for Mastery Judgments
- In criterion-based grading, you’re judging whether students meet specific learning outcomes.
- But if the test only covers part of what was taught, you can’t be sure a student has mastered all intended outcomes, because the exam doesn’t measure them all.
- This makes it difficult to fairly say “Student A met the standard” if the test didn’t assess the full standard.
✅ So: If content coverage is partial, a claim like “meets expectations” (criterion-based) is less valid.
🔍 2. Norm-Referencing Focuses on Ranking, Not Mastery
- Norm-referenced grading doesn't claim to assess full mastery; it just compares students to each other.
- If everyone is tested on the same (even partial) content, you can still rank performance fairly.
- This is especially common in competitive settings like entrance exams, where the goal is to identify the top X%.
✅ So: Even if the content is partial, norm-referenced grading can still say, “Student A performed better than 85% of peers.”
🔍 3. Selective Testing is Often Meant to Differentiate, Not to Measure Everything
- Some exams (especially in large-scale or competitive environments) are designed to be selective and challenging, not to reflect all course content.
- In these cases, norm-referencing is deliberate: the test's role is to discriminate between levels of performance, not to verify full learning.
📘 Example to Illustrate
Imagine a computer science course with 10 learning objectives:
- Scenario A (Criterion-Based Fit): The final exam has one question for each objective, with rubrics. You can say “Students mastered 8 out of 10 objectives.”
- Scenario B (Norm-Referenced Fit): The exam covers only 4 objectives in depth (due to time constraints), with high difficulty and trick questions. You can’t judge overall mastery, but you can still say “Student A is in the top 10%.”
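The two scenarios can be contrasted with a small Python sketch. It is only an illustration: the 80% mastery threshold, the function names, and the cohort scores are assumptions made up for this example, not part of any grading standard.

```python
def criterion_result(objectives_met, total_objectives, required_pct=80):
    """Criterion-based: compare one student against a fixed standard."""
    # Integer arithmetic avoids floating-point surprises at the threshold.
    met = objectives_met * 100 >= required_pct * total_objectives
    return "meets expectations" if met else "below expectations"


def percentile_rank(score, cohort_scores):
    """Norm-referenced: percentage of the cohort scoring strictly lower."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)


# Scenario A: the exam covers all 10 objectives, so a mastery claim is possible.
print(criterion_result(objectives_met=8, total_objectives=10))

# Scenario B: the exam covers only part of the course, so only a ranking is reported.
cohort = [35, 42, 48, 50, 55, 58, 61, 64, 70, 90]  # made-up scores
print(percentile_rank(70, cohort))
```

The criterion function never looks at other students, while the percentile function never looks at a standard. That is the core difference between the two methods.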
Wednesday, 14 May 2025
WCSS VS DBI
The Within-Cluster Sum of Squares (WCSS) is a measure used to evaluate the compactness of clusters in a clustering algorithm, such as k-means. It calculates the sum of squared distances between each data point and the centroid (mean) of the cluster it belongs to.
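Mathematically, the WCSS for a set of K clusters is expressed as:
$$ \mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 $$
where C_k is the k-th cluster, x_i is a data point assigned to it, and \mu_k is the centroid (mean) of C_k.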
WCSS (Within-Cluster Sum of Squares) and DBI (Davies–Bouldin Index) are both metrics used to evaluate clustering performance, but they focus on different aspects:
---
Objective:
- WCSS: measures the compactness of clusters (intra-cluster similarity).
- DBI: balances compactness against separation between clusters (inter-cluster dissimilarity).
Key Differences:
- WCSS looks only within clusters.
- DBI looks both within and between clusters.
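To make the difference concrete, here is a minimal pure-Python sketch of both metrics. It assumes Euclidean distance and the common DBI definition (for each cluster, take the worst ratio of summed scatter to centroid distance, then average). The helper names and the four sample points are made up for illustration. In scikit-learn, the corresponding values are available as `KMeans.inertia_` and `sklearn.metrics.davies_bouldin_score`.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def group_by_label(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def wcss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
    return total


def davies_bouldin(points, labels):
    """Average over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = group_by_label(points, labels)
    cents = {k: centroid(m) for k, m in clusters.items()}
    scatter = {k: sum(math.dist(p, cents[k]) for p in m) / len(m)
               for k, m in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]  # made-up data
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # pairs up points that are far apart

print(wcss(points, good), davies_bouldin(points, good))
print(wcss(points, bad), davies_bouldin(points, bad))
```

Lower is better for both metrics. WCSS can only fall as the number of clusters grows, whereas DBI also penalizes clusters whose centroids sit close together.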
Data anonymization
- Processing personal data so that it can keep generating business value without revealing anyone's identity.
- Pseudonymization vs Anonymization
Pseudonymization
- Definition: Personal data is replaced with pseudonyms (e.g., codes, numbers) but can still be re-identified using additional information (e.g., a key).
- Reversibility: Reversible; the original data can be restored when the pseudonymized data and the key are combined.
- Purpose: Reduces risks during data processing, storage, or sharing, while still allowing for re-identification when necessary (e.g., in medical research).
- Example: Replacing patient names with IDs in a health database, while keeping a separate file that links IDs to names.
- GDPR Status: Still considered personal data, but offers some compliance benefits if implemented correctly.
Anonymization
- Definition: Personal data is irreversibly altered so that the individual can no longer be identified, directly or indirectly.
- Reversibility: Irreversible; the data cannot be traced back to a person.
- Purpose: Used when there's no need to identify individuals, such as for open data publication or aggregate analysis.
- Example: Aggregating survey results so that individual responses cannot be linked to specific participants.
- GDPR Status: Not considered personal data; once data is truly anonymized, GDPR no longer applies.
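The contrast can be sketched in a few lines of Python. This is a toy illustration, not a compliance recipe: the record layout, the `P0001`-style codes, and the function names are assumptions invented for this example. Aggregates with very small counts can still allow re-identification, so real anonymization needs more care than a simple count.

```python
from collections import Counter


def pseudonymize(records, id_field="name"):
    """Replace the identifying field with a code; return (records, key)."""
    key = {}      # code -> original value; must be stored separately
    assigned = {} # original value -> code, so repeats get the same code
    out = []
    for rec in records:
        original = rec[id_field]
        if original not in assigned:
            code = f"P{len(assigned) + 1:04d}"
            assigned[original] = code
            key[code] = original
        out.append({**rec, id_field: assigned[original]})
    return out, key


def reidentify(pseudo_records, key, id_field="name"):
    """Reverse pseudonymization; only possible for whoever holds the key."""
    return [{**rec, id_field: key[rec[id_field]]} for rec in pseudo_records]


def anonymize_by_aggregation(records, group_field):
    """Keep only counts per group; no row-level data and no key survive."""
    return dict(Counter(rec[group_field] for rec in records))


patients = [  # made-up records
    {"name": "Alice", "diagnosis": "flu"},
    {"name": "Bob", "diagnosis": "asthma"},
    {"name": "Carol", "diagnosis": "flu"},
]

pseudo, key = pseudonymize(patients)
restored = reidentify(pseudo, key)
summary = anonymize_by_aggregation(patients, "diagnosis")
print(pseudo)
print(summary)
```

The pseudonymized records can be mapped back because `key` exists. The aggregated summary has nothing to map back to.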
Tuesday, 13 May 2025
Thailand's Giga Data Center Initiative
Thailand has announced plans to invest approximately 170 billion baht (around USD 4.7 billion) to establish itself as a Giga Data Hub in the ASEAN region. This initiative is a collaboration between the Thai government, the Charoen Pokphand Group (CP Group), and global investment firm Global Infrastructure Partners (GIP). The goal is to develop advanced data center infrastructure, positioning Thailand as a central hub for data services in Southeast Asia.
Tuesday, 29 April 2025
Research discussion
- Comparison with prior work: Discuss how your results align or differ from previous studies.
- Explanation of unexpected outcomes: Offer possible reasons for any surprising or contradictory findings.
Monday, 28 April 2025
Types of databases
- Relational database
- Distributed database
- Graph database
- Object-oriented database
- Self-describing database
- NoSQL database
- Vector database
- Spatial database
- Blockchain
- In-memory database
- Time-series database
Log base
- log n: base depends on context (often base 10 in engineering and applied work, base e in pure mathematics, base 2 in computer science)
- ln n: natural log (base e)
- lg n: base 2 (the usual computer science convention)
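As a quick check of these conventions, Python's standard `math` module provides one function per base. The `log_base` helper below is a hypothetical name added here to show the change-of-base rule, log_b(n) = ln(n) / ln(b).

```python
import math

n = 1024

print(math.log(n))    # ln n: natural log, base e
print(math.log10(n))  # base-10 log
print(math.log2(n))   # lg n: base-2 log


def log_base(n, base):
    """Change of base: log_b(n) = ln(n) / ln(b)."""
    return math.log(n) / math.log(base)


print(log_base(n, 2))
```

Because logarithms in different bases differ only by a constant factor, the base does not matter inside big-O notation, which is why algorithm texts often leave it unstated.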
Thursday, 24 April 2025
Zipf's law
Zipf’s Law says that the frequency of an item is inversely proportional to its rank in a frequency table.
Mathematically:
$$ f(r) \propto \frac{1}{r^{s}} $$
- f(r) = frequency of the item ranked r
- s = Zipf exponent (a skewness factor, typically between 0.5 and 2)
- Higher s = more skewed distribution
Zipf-like distributions have been empirically observed in:
- Web access logs
- Video-on-demand services
- CDN (Content Delivery Network) traffic
- Edge computing systems
- Cloud file sharing and storage platforms
Useful in Simulation and Modeling:
Researchers and system designers use Zipf to simulate realistic user behaviors when testing caching algorithms, load balancing mechanisms, or data placement strategies.
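A simulation of this kind can be sketched with the standard library alone. The example below is a minimal sketch: it assumes a finite catalogue of items with popularity proportional to 1 / r^s, and the function names, catalogue size, and request count are made up for illustration.

```python
import random


def zipf_probabilities(n_items, s):
    """Normalized probabilities proportional to 1 / rank**s for ranks 1..n_items."""
    weights = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_requests(n_items, s, n_requests, seed=0):
    """Draw item ranks for a stream of requests following a Zipf-like law."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    ranks = list(range(1, n_items + 1))
    return rng.choices(ranks, weights=zipf_probabilities(n_items, s), k=n_requests)


def hit_ratio_top_k(requests, k):
    """Hit ratio of a static cache that holds the k most popular ranks."""
    return sum(1 for r in requests if r <= k) / len(requests)


probs = zipf_probabilities(3, s=1.0)  # proportional to 1, 1/2, 1/3
requests = simulate_requests(n_items=100, s=1.0, n_requests=10_000)

print(probs)
print(hit_ratio_top_k(requests, k=10))
```

Under uniform access, a cache holding 10 of 100 items would serve about 10% of requests. Under a Zipf-like workload the same cache serves far more, which is why the choice of popularity model matters when evaluating caching strategies.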
When Zipf Might Not Apply:
- In systems where access is uniformly random (e.g., randomized testing, early-stage services), Zipf might not be appropriate.
- If content popularity changes rapidly, additional models (e.g., dynamic Zipf, time-decay models, or Markov chains) may be more realistic.
Sunday, 20 April 2025
Air quotes
Air quotes, also known as finger quotes or thumb quotes, are a nonverbal gesture in which someone uses their fingers to mimic quotation marks in the air while speaking. The gesture shows that a word or phrase is being used skeptically (with doubt) or with particular emphasis.
Friday, 18 April 2025
Tuesday, 8 April 2025
Wednesday, 19 March 2025
A good image for senior professors
They should be theorists rather than practitioners (technologists), because old age clashes with being up to date in technology, which makes them look less credible.
Tuesday, 18 March 2025
Cybersecurity is a superset of information security
Cybersecurity covers not only information, as information security does, but also physical and human safety. It extends to cyber-physical systems (physical objects interacting with computers through IoT), operational technology (OT), and emerging threats such as AI-driven attacks.
Open-source LLM app development platforms
Dify
https://dify.ai/
A key aspect of Dify is its focus on providing a platform for integrating various LLMs, rather than solely relying on a single, built-in pre-trained model.
Dify also uses knowledge bases to augment the LLM's answers, which lets the LLM answer from up-to-date information and from information that is specific to the user.
In summary, Dify facilitates the use of pre-trained models from other sources and also allows the integration of custom models.
LangChain
https://python.langchain.com/docs/introduction/
https://www.infoworld.com/article/2338940/a-brief-guide-to-langchain-for-software-developers.html
Comparison
- LangChain focuses on orchestrating workflows by connecting LLMs with APIs, tools, and data sources.
- LlamaIndex is an open-source data framework designed to connect Large Language Models (LLMs) with private, external, or up-to-date data that wasn't included in the LLM's original training. It specializes in integrating external data sources (i.e., RAG) and enabling efficient querying, making it ideal for search and retrieval tasks.
- Llama Stack provides an all-in-one solution for working with Meta’s Llama models, with an emphasis on streamlined deployment and scalability.
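The retrieval-augmented pattern that these tools build on (a knowledge base supplying context to the LLM) can be shown with a toy Python sketch. Nothing below is the API of Dify, LangChain, LlamaIndex, or Llama Stack. The function names, the word-overlap scorer, and the sample knowledge base are all assumptions made up for illustration. Real systems typically use embedding-based similarity and then send the prompt to a model.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (toy scorer)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Put the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


knowledge_base = [  # hypothetical documents
    "The office is closed on public holidays.",
    "Refund requests are processed within 14 days.",
    "The support hotline opens at 9 am.",
]

question = "How long do refund requests take?"
prompt = build_prompt(question, knowledge_base)
print(prompt)
```

The model never needs to have seen the knowledge base during training. The relevant passage is supplied at query time, which is what lets answers reflect up-to-date or user-specific information.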