Monday, 19 May 2025

Scikit-learn: Machine Learning in Python

 https://scikit-learn.org/stable/

Thursday, 15 May 2025

Norm-referenced VS Criterion-based Achievement Grading

Norm-referenced method: When the exam only samples part of the content (due to time constraints, for instance), it may not fully reflect all students' knowledge or abilities. In such cases, norm-referencing helps distinguish performance levels relative to peers, especially if the exam is designed to be difficult or selective.

Criterion-based method: If the exam fully aligns with the course objectives and content, it's appropriate to assess students based on predetermined criteria. In this case, every student theoretically has an equal opportunity to succeed by demonstrating mastery of the material.

===

📌 Why Use Norm-Referenced Grading When an Exam Has Incomplete Content Coverage?

It's not that incomplete content coverage requires norm-referenced grading, but rather that norm-referenced grading can be more practical or justifiable in that situation. Here’s the reasoning:


🔍 1. Incomplete Coverage = Limited Validity for Mastery Judgments

  • In criterion-based grading, you’re judging whether students meet specific learning outcomes.

  • But if the test only covers part of what was taught, you can’t be sure a student has mastered all intended outcomes — the exam doesn’t measure them all.

  • This makes it difficult to fairly say “Student A met the standard” if the test didn’t assess the full standard.

✅ So: If content coverage is partial, a claim like “meets expectations” (criterion-based) is less valid.


🔍 2. Norm-Referencing Focuses on Ranking, Not Mastery

  • Norm-referenced grading doesn't claim to assess full mastery — it just compares students to each other.

  • If everyone is tested on the same (even partial) content, you can still rank performance fairly.

  • This is especially common in competitive settings like entrance exams, where the goal is to identify the top X%.

✅ So: Even if the content is partial, norm-referenced grading can still say, “Student A performed better than 85% of peers.”


🔍 3. Selective Testing is Often Meant to Differentiate, Not to Measure Everything

  • Some exams (especially in large-scale or competitive environments) are designed to be selective, challenging, and not to reflect all course content.

  • In these cases, norm-referencing is deliberate — the test's role is to discriminate between levels of performance, not verify full learning.


📘 Example to Illustrate

Imagine a computer science course with 10 learning objectives:

  • Scenario A (Criterion-Based Fit): The final exam has one question for each objective, with rubrics. You can say “Students mastered 8 out of 10 objectives.”

  • Scenario B (Norm-Referenced Fit): The exam covers only 4 objectives in depth (due to time constraints), with high difficulty and trick questions. You can’t judge overall mastery, but you can still say “Student A is in the top 10%.”
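The contrast between the two scenarios can be sketched in a few lines of Python. This is only an illustration: the letter cutoffs, the percentile bands, and the sample scores are hypothetical and do not come from any grading standard.

```python
# Hypothetical cutoffs and bands, chosen only to illustrate the two methods.
def criterion_grade(score, cutoffs=((80, "A"), (70, "B"), (60, "C"), (50, "D"))):
    """Criterion-based: compare the score with fixed, predetermined cutoffs."""
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"


def percentile_rank(score, all_scores):
    """Percentage of the cohort that scored strictly below this score."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def norm_grade(score, all_scores, bands=((90, "A"), (70, "B"), (30, "C"), (10, "D"))):
    """Norm-referenced: the grade depends on the student's rank among peers."""
    rank = percentile_rank(score, all_scores)
    for minimum, letter in bands:
        if rank >= minimum:
            return letter
    return "F"


# A deliberately hard exam (as in Scenario B): raw scores are low across the cohort.
scores = [35, 40, 42, 45, 48, 50, 52, 55, 58, 65]
best = max(scores)

print(criterion_grade(best))          # fixed cutoffs: a raw 65 is only a "C"
print(percentile_rank(best, scores))  # 90.0, i.e. better than 90% of peers
print(norm_grade(best, scores))       # relative to peers: "A"
```

The same raw score earns a different grade under each method, which is the point of Scenario B: the ranking remains meaningful even when the raw score says little about overall mastery.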

--ChatGPT

Wednesday, 14 May 2025

WCSS VS DBI

The Within-Cluster Sum of Squares (WCSS) is a measure used to evaluate the compactness of clusters in a clustering algorithm, such as k-means. It calculates the sum of squared distances between each data point and the centroid (mean) of the cluster it belongs to.

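Mathematically, the WCSS for a set of K clusters is expressed as:

\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

where C_k is the set of points assigned to cluster k and \mu_k is the centroid of that cluster.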

WCSS (Within-Cluster Sum of Squares) and DBI (Davies–Bouldin Index) are both metrics used to evaluate clustering performance, but they focus on different aspects:

---

Objective:

  • WCSS: Measures the compactness of clusters (intra-cluster similarity).
  • DBI: Balances compactness against separation between clusters (inter-cluster dissimilarity).

Key Differences:

  • WCSS looks only within clusters.
  • DBI looks both within and between clusters.
  • For both metrics, lower values indicate better clustering.
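As a rough illustration of the difference, both metrics can be computed by hand for a tiny, made-up data set. The sketch below uses only the standard library; the function names and the sample points are assumptions for this example, and the DBI follows the usual definition (average, over clusters, of the worst ratio of combined scatter to centroid distance).

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(math.dist(p, mu) ** 2 for p in points)
    return total


def davies_bouldin(clusters):
    """Mean over clusters of max_j (s_i + s_j) / d(mu_i, mu_j)."""
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")
    centroids = [centroid(points) for points in clusters]
    # s_i: average distance of a cluster's points to its centroid
    scatter = [
        sum(math.dist(p, mu) for p in points) / len(points)
        for points, mu in zip(clusters, centroids)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Two tight, well-separated clusters (made-up points).
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(wcss(clusters))            # 4.0: every point is 1 unit from its centroid
print(davies_bouldin(clusters))  # 0.2: scatter (1 + 1) over centroid distance 10
```

Moving the two clusters closer together leaves the WCSS unchanged but raises the DBI, which is exactly the between-cluster information that WCSS ignores. In scikit-learn, the `inertia_` attribute of a fitted `KMeans` reports the WCSS, and `sklearn.metrics.davies_bouldin_score` computes the DBI.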

Data anonymization

  • Processing personal data so that it can keep generating business value without revealing the identity of the individuals it describes.
  • Pseudonymization vs Anonymization

Pseudonymization

    • Definition: Personal data is replaced with pseudonyms (e.g., codes, numbers) but can still be re-identified using additional information (e.g., a key).

    • Reversibility: Reversible – the original data can be restored if the pseudonym and key are combined.

    • Purpose: Reduces risks during data processing, storage, or sharing, while still allowing for re-identification when necessary (e.g., in medical research).

    • Example: Replacing patient names with IDs in a health database, while keeping a separate file that links IDs to names.

    • GDPR Status: Still considered personal data, but offers some compliance benefits if implemented correctly.

Anonymization

    • Definition: Personal data is irreversibly altered so that the individual can no longer be identified, directly or indirectly.

    • Reversibility: Irreversible – the data cannot be traced back to a person.

    • Purpose: Used when there's no need to identify individuals, such as for open data publication or aggregate analysis.

    • Example: Aggregating survey results so that individual responses cannot be linked to specific participants.

    • GDPR Status: Not considered personal data – once data is truly anonymized, GDPR no longer applies.
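The distinction can be made concrete with a small sketch. Everything here is hypothetical (the record fields, the `P-` code format, and the function names), and aggregation is only one of several anonymization techniques; whether a given output counts as truly anonymous in the GDPR sense depends on the data, for example very small groups can still single someone out.

```python
import secrets
from collections import Counter


def pseudonymize(records, id_field="name"):
    """Replace id_field with a random code; return (records, key_table).

    The key_table (code -> original value) must be stored separately,
    because anyone holding it can reverse the pseudonymization.
    """
    code_for = {}
    out = []
    for rec in records:
        original = rec[id_field]
        if original not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in code_for.values():  # avoid the (unlikely) collision
                code = "P-" + secrets.token_hex(4)
            code_for[original] = code
        out.append({**rec, id_field: code_for[original]})
    key_table = {code: original for original, code in code_for.items()}
    return out, key_table


def reidentify(record, key_table, id_field="name"):
    """Reverse step: only possible for someone who holds the key table."""
    return {**record, id_field: key_table[record[id_field]]}


def anonymize_by_aggregation(records, group_field="diagnosis"):
    """Keep only counts per group; individual rows are discarded."""
    return dict(Counter(rec[group_field] for rec in records))


records = [
    {"name": "Alice", "diagnosis": "flu"},
    {"name": "Bob", "diagnosis": "flu"},
    {"name": "Alice", "diagnosis": "asthma"},
]

pseudo, key_table = pseudonymize(records)
print(pseudo)                             # names replaced by codes
print(reidentify(pseudo[1], key_table))   # reversible with the key table
print(anonymize_by_aggregation(records))  # {'flu': 2, 'asthma': 1}
```

The pseudonymized rows still describe individuals and can be linked back through the key table, whereas the aggregated counts contain no per-person rows to link.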

Tuesday, 13 May 2025

Thailand's Giga Data Center Initiative

Thailand has announced plans to invest approximately 170 billion baht (around USD 4.7 billion) to establish itself as a Giga Data Hub in the ASEAN region. This initiative is a collaboration between the Thai government, the Charoen Pokphand Group (CP Group), and global investment firm Global Infrastructure Partners (GIP). The goal is to develop advanced data center infrastructure, positioning Thailand as a central hub for data services in Southeast Asia.  

Tuesday, 29 April 2025

Research discussion

  • Comparison with prior work: Discuss how your results align with or differ from previous studies.
  • Explanation of unexpected outcomes: Offer possible reasons for any surprising or contradictory findings.

Monday, 28 April 2025

Types of databases

  1. Relational database
  2. Distributed database
  3. Graph database
  4. Object-oriented database
  5. Self-describing database
  6. NoSQL database
  7. Vector database
  8. Spatial database
  9. Blockchain
  10. In-memory database
  11. Time-series database



Log base

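log n  Depends on context (often base 10 in engineering, base e in pure mathematics, and base 2 in computer science). Inside big-O notation the base does not matter, because logarithms in different bases differ only by a constant factor.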

ln n    Natural log (base e)

lg n    Base 2
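A quick check of the three notations in Python, as a small sketch. The helper `log_base` is a name made up for this example; `math.log`, `math.log2`, and `math.log10` are standard-library functions.

```python
import math


def log_base(x, base):
    """Change-of-base formula: log_b(x) = ln(x) / ln(b)."""
    return math.log(x) / math.log(base)


n = 1024
ln_n = math.log(n)       # ln n  (base e)
lg_n = math.log2(n)      # lg n  (base 2) -> 10.0
log10_n = math.log10(n)  # log n when base 10 is meant

# lg n / ln n is the same constant (1 / ln 2) for every n,
# which is why the base is irrelevant inside O(log n).
ratios = [math.log2(m) / math.log(m) for m in (10, 1000, 10**6)]
print(lg_n, ratios)
```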

Time complexity list

 https://en.wikipedia.org/wiki/Time_complexity

y is a constant.

Thursday, 24 April 2025

Zipf's law

Zipf’s law states that the frequency of an item is inversely proportional to its rank in a frequency table, or more generally to a power of that rank.

Mathematically:

f(r) \propto \frac{1}{r^s}

  • f(r) = frequency of the item ranked r

  • s = Zipf exponent (a skewness factor, typically between 0.5 and 2)

  • Higher s = more skewed distribution

Observed in Multiple Domains:

Zipf-like distributions have been empirically observed in:

  • Web access logs

  • Video-on-demand services

  • CDN (Content Delivery Network) traffic

  • Edge computing systems

  • Cloud file sharing and storage platforms

Useful in Simulation and Modeling:

  • Researchers and system designers use Zipf to simulate realistic user behaviors when testing caching algorithms, load balancing mechanisms, or data placement strategies.
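A minimal sketch of such a simulation, using only the standard library: it draws item requests whose popularity follows f(r) ∝ 1/r^s. The catalogue size, request count, seed, and function names are assumptions chosen for this example.

```python
import random


def zipf_weights(n_items, s=1.0):
    """Normalised probabilities p(r) proportional to 1 / r**s for ranks 1..n_items."""
    raw = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def simulate_requests(n_items, n_requests, s=1.0, seed=0):
    """Draw a request trace in which item r is requested with Zipf probability."""
    rng = random.Random(seed)  # fixed seed keeps the trace reproducible
    items = list(range(1, n_items + 1))
    return rng.choices(items, weights=zipf_weights(n_items, s), k=n_requests)


weights = zipf_weights(1000, s=1.0)
requests = simulate_requests(1000, 10_000, s=1.0)

# Share of requests that hit the 10 most popular items (1% of the catalogue).
top10_share = sum(1 for item in requests if item <= 10) / len(requests)
print(f"top-10 share: {top10_share:.2f}")
```

With s = 1 and 1,000 items, the 10 most popular items attract about 39% of requests in expectation, which is why even a small cache can absorb a large share of traffic. Raising s makes the distribution more skewed and increases that share.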

When Zipf Might Not Apply:

  • In systems where access is uniformly random (e.g., randomized testing, early-stage services), Zipf might not be appropriate.

  • If content popularity changes rapidly, additional models (e.g., dynamic Zipf, time-decay models, or Markov chains) may be more realistic.

--ChatGPT

Sunday, 20 April 2025

Air quotes

Air quotes, also known as finger quotes or thumb quotes, are a nonverbal gesture in which the speaker uses their fingers to mimic quotation marks in the air while speaking. The gesture signals that a word or phrase is being used skeptically (with doubt) or with a particular emphasis.

Friday, 18 April 2025

Tuesday, 8 April 2025

Wednesday, 19 March 2025

A good image for a senior professor

A senior professor should come across as a theorist rather than a practitioner (technologist), because advanced age sits awkwardly with being up to date on technology, which makes a practitioner persona seem less credible.

Tuesday, 18 March 2025

Cybersecurity is a superset of information security

Cybersecurity covers not only information, as information security does, but also physical and human safety. Its scope includes cyber-physical systems (physical objects that interact with computers, for example through IoT), operational technology (OT), and emerging threats such as AI-driven attacks.

Port isolation for Wi-Fi

https://www.cyfence.com/article/protect-your-network-for-newbie-admin/

Open-source LLM app development platform

Dify

https://dify.ai/

A key aspect of Dify is its focus on providing a platform for integrating various LLMs, rather than solely relying on a single, built-in pre-trained model.

Dify also uses knowledge bases to augment the LLM's answers. This lets the LLM answer from up-to-date information and from information that is specific to the user.

In summary, Dify facilitates the use of pre-trained models from other sources and also allows the integration of custom models.

Langchain

https://python.langchain.com/docs/introduction/

https://www.infoworld.com/article/2338940/a-brief-guide-to-langchain-for-software-developers.html

Comparison

  • LangChain: Requires programming skills.
  • Dify: User-friendly, low-code interface.
  • LangChain: A library for building LLM applications.
  • Dify: A platform for building and deploying LLM applications.
  • Both are used to write apps that interact with the models of LLM providers (https://python.langchain.com/docs/integrations/llms/)

  • LlamaIndex
    https://www.llamaindex.ai/
    • An open-source data framework designed to connect large language models (LLMs) with private, external, or up-to-date data that was not included in the LLM's original training.
    • LlamaIndex specializes in integrating external data sources (i.e., retrieval-augmented generation, RAG) and enabling efficient querying, making it well suited to search and retrieval tasks.
    • By contrast, LangChain focuses on orchestrating workflows by connecting LLMs with APIs, tools, and data sources.
    • Llama Stack (a separate project, not part of LlamaIndex) provides an all-in-one solution for working with Meta’s Llama models, with an emphasis on streamlined deployment and scalability.
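The retrieval-augmented idea behind these tools can be sketched without any framework. The code below does not use the LlamaIndex, LangChain, or Dify APIs; it is a toy, standard-library illustration in which retrieval is plain word overlap, and the documents, function names, and prompt wording are all made up. Real systems typically retrieve with vector embeddings and then send the prompt to an LLM.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding step."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Number of query tokens that also occur in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, docs, k=2):
    """Return the k documents that overlap most with the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Put the retrieved context in front of the question for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "The office Wi-Fi password is rotated on the first Monday of each month.",
    "Port isolation stops wireless clients from talking to each other directly.",
    "The cafeteria opens at 8 am.",
]
question = "What does port isolation do for wireless clients?"
prompt = build_prompt(question, knowledge_base, k=1)
print(prompt)  # this string would be sent to whichever LLM the app uses
```

Because the context is looked up at question time, the answer can draw on private or recent information that was never part of the model's training data.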

  • Other free models
    • Gemma, SEA-LION, Typhoon, THaLLE