Power BI
Tableau
Looker Studio (Google): https://lookerstudio.google.com/u/0/navigation/reporting
More at https://dataforest.ai/blog/best-business-intelligence-tools
Example application areas of K-parametric clustering algorithms:
1. Image and Signal Processing
2. Anomaly Detection
3. Finance and Economics
4. Healthcare and Medicine
5. Industrial and IoT Applications
6. Education and Testing
7. Natural Language Processing (NLP)
8. Retail and Marketing
Examples of Nonparametric clustering algorithms:
Example: each point is chosen as the next centroid with probability proportional to its squared distance to the nearest already-chosen centroid, P(x) = \frac{D(x)^2}{\sum_j D(x_j)^2}.
Then the probability of picking x_4 as the next centroid is: P(x_4) = \frac{16}{30.25} \approx 0.529
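A minimal Python sketch of this seeding rule (probability proportional to squared distance to the nearest already-chosen centroid, as in k-means++ style seeding); the squared distances below are hypothetical placeholders chosen only so the total matches 30.25:
import numpy as np

# Squared distances D(x_1)^2 .. D(x_4)^2 to the nearest chosen centroid
# (hypothetical values; only D(x_4)^2 = 16 and the total 30.25 come from the example above).
d2 = np.array([0.0, 2.25, 12.0, 16.0])
probs = d2 / d2.sum()                           # selection probabilities
print(probs[3])                                 # 16 / 30.25 ≈ 0.529
next_idx = np.random.choice(len(d2), p=probs)   # sample the next centroid index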
The Ph.D. (Doctor of Philosophy) and the D.Eng. (Doctor of Engineering) are both doctoral-level degrees, but they differ primarily in focus, purpose, and career trajectory:
1. Focus and Purpose
Ph.D.:
Focuses on original theoretical research.
Aims to contribute new knowledge to a field.
Often prepares candidates for academic careers (professorships, research institutes).
D.Eng. (or Eng.D.):
Emphasizes applied research and practical problem-solving in engineering contexts.
Designed to be more relevant to industry than academia.
Often involves collaboration with companies or government agencies.
2. Dissertation Style
Ph.D.:
Usually results in a highly theoretical dissertation.
Often includes formal models, proofs, or simulations with theoretical insights.
D.Eng.:
Typically results in a practical engineering project or case study, with real-world implementation.
May include prototype development, system design, or applied innovations.
In the test, a human and a computer (on the left in the figure) each converse with a human interrogator (on the right in the figure).
The test is passed only if the interrogator cannot tell whether they are talking to the human or the computer.
Refers to coding with AI assistance. Tools include
When training an ML model, we can perform data cleansing. But during real deployment, how do we cope with noisy data? Key strategies for coping with noisy data during deployment:
Build a real-time data preprocessing pipeline similar to the one used during training. This may include:
Normalization/standardization
Missing value imputation
Outlier filtering
Text/token cleanup (e.g., lowercasing, removing symbols)
Use the same logic and codebase (or serialize transformers such as sklearn's scalers, spaCy pipelines, or TensorFlow preprocessing layers).
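A minimal sketch of this idea, assuming scikit-learn, pandas, and joblib are available; the column names "age" and "income" are hypothetical:
import joblib
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Fit the preprocessing pipeline once at training time.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # missing value imputation
    ("scale", StandardScaler()),                    # normalization/standardization
])
train_df = pd.DataFrame({"age": [25, 32, None, 41],
                         "income": [30_000, 52_000, 47_000, None]})
preprocess.fit(train_df)
joblib.dump(preprocess, "preprocess.joblib")        # serialize the fitted transformers

# At deployment: load the same fitted pipeline and apply it to raw, noisy input.
preprocess_live = joblib.load("preprocess.joblib")
live_df = pd.DataFrame({"age": [None], "income": [45_000]})
clean = preprocess_live.transform(live_df)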
Train your model to be robust against noise:
Add noise (augmentation) during training to simulate real-world scenarios (e.g., drop words, add typos, jitter numeric features).
Use regularization techniques (e.g., dropout, L2) to prevent overfitting on overly clean data.
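A minimal sketch of noise augmentation plus dropout and L2 regularization, assuming TensorFlow/Keras; the data shapes, noise level, and layer sizes are illustrative only:
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

# Jitter numeric features so the model sees training data resembling noisy production input.
X_noisy = X + np.random.normal(0.0, 0.05, size=X.shape).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,),
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2
    tf.keras.layers.Dropout(0.3),                                              # dropout
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_noisy, y, epochs=3, batch_size=32, verbose=0)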
Use prediction confidence scores (from softmax or probability outputs) to:
Reject uncertain predictions
Flag them for human review or fallback systems
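A minimal sketch of this rejection step, assuming a classifier that exposes predict_proba (e.g., scikit-learn); the 0.8 threshold is an illustrative choice:
import numpy as np

def predict_or_defer(model, X, threshold=0.8):
    proba = model.predict_proba(X)      # class probabilities per sample
    confidence = proba.max(axis=1)      # top-class probability
    labels = proba.argmax(axis=1)
    # Below the threshold, defer to a human reviewer or a fallback system.
    return [(int(lbl), "accept") if conf >= threshold else (int(lbl), "review")
            for lbl, conf in zip(labels, confidence)]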
Before feeding data to the model, validate input:
Reject ill-formed entries (e.g., empty strings, NaNs)
Ensure values fall within expected ranges or categories
Log or alert on anomalies
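A minimal sketch of such validation; the field names, range, and category set are hypothetical examples:
import math
import logging

EXPECTED_CHANNELS = {"web", "mobile", "store"}

def validate_record(record: dict) -> bool:
    text = record.get("comment")
    amount = record.get("amount")
    channel = record.get("channel")
    if not text or not str(text).strip():                            # reject empty strings
        logging.warning("empty text field: %r", record)
        return False
    if not isinstance(amount, (int, float)) or math.isnan(amount):   # reject missing/NaN values
        logging.warning("missing or NaN amount: %r", record)
        return False
    if not (0 <= amount <= 1_000_000):                               # expected numeric range
        logging.warning("amount out of range: %r", record)
        return False
    if channel not in EXPECTED_CHANNELS:                             # expected categories
        logging.warning("unknown channel: %r", record)
        return False
    return True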
Use ensemble models (e.g., majority vote, stacking) that tend to be more stable with noisy input.
Add rule-based systems as a fallback for edge cases (e.g., if input is incomplete or invalid).
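A minimal sketch of a soft-voting ensemble with a rule-based fallback, assuming scikit-learn; the estimators and the fallback rule are illustrative:
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("svc", SVC(probability=True)),
    ],
    voting="soft",  # average predicted probabilities; tends to smooth out noisy inputs
)
# ensemble.fit(X_train, y_train) must be called before serving predictions.

def predict_with_fallback(x_row, default_label=0):
    # Rule-based fallback for incomplete or invalid input.
    if x_row is None or any(v is None for v in x_row):
        return default_label
    return int(ensemble.predict([x_row])[0])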
Monitor input quality and model performance in production
Detect concept drift or change in noise patterns
Use logs to retrain/update the model periodically
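A minimal sketch of drift monitoring using a two-sample Kolmogorov-Smirnov test from SciPy; the reference distribution and the 0.05 threshold are illustrative:
import numpy as np
from scipy.stats import ks_2samp

reference = np.random.normal(0, 1, size=5000)   # feature values observed at training time

def check_drift(live_window, alpha=0.05):
    stat, p_value = ks_2samp(reference, live_window)
    if p_value < alpha:
        # The live distribution has shifted: log it and consider scheduling a retrain.
        print(f"possible drift: KS={stat:.3f}, p={p_value:.4f}")
        return True
    return False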
In high-noise environments, deploy a denoising model or filter before the main model:
For images: autoencoders, image filters
For text: typo correction, spell checking
For time series: smoothing, Kalman filters
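A minimal sketch of a denoising step placed before the main model for time-series input: a moving-average smoother and a simple one-dimensional Kalman-style filter; the window size and noise variances are illustrative:
import numpy as np

def moving_average(series, window=5):
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

def kalman_1d(series, process_var=1e-3, meas_var=0.1):
    x, p = series[0], 1.0            # initial state estimate and covariance
    out = []
    for z in series:
        p += process_var             # predict step
        k = p / (p + meas_var)       # Kalman gain
        x += k * (z - x)             # update with measurement z
        p *= (1 - k)
        out.append(x)
    return np.array(out)

noisy = np.sin(np.linspace(0, 6, 200)) + np.random.normal(0, 0.3, 200)
smoothed = kalman_1d(moving_average(noisy))   # denoised signal fed to the main model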
Give advice following a metaheuristic research methodology (rather than heuristics that are field-specific).
In case there are too many prior algorithms, what should you do?
Option 1: If you are lucky and all of them use the same benchmark dataset, you can compare your proposed algorithm on that benchmark.
Option 2: Select representative algorithms as baselines. The baselines should include well-known, state-of-the-art, and top-performing methods. Importantly, the selected baselines should cover methodologically different styles or strategies. Then conduct extensive experiments with various evaluation metrics.
Option 3: Formally determine the global optimum and compare your approach against it. If your approach reaches the optimum, there is no need to compare with local-optimum algorithms at all. This is a very strong and elegant strategy when applicable. Use the following metrics (a small sketch for computing them appears after this discussion):
1. Gap to optimum
2. Time to reach optimum
3. Stability over multiple runs (for stochastic algorithms)
If your algorithm consistently reaches or nearly reaches the global optimum:
That is clear evidence that algorithms which only guarantee local optima (like greedy, GA, PSO, etc.) are unnecessary as comparisons. You can claim your algorithm is globally optimal or near-optimal in practice. You can skip comparing with heuristic/metaheuristic baselines if:
Your algorithm reaches the global optimum in all test cases, or
It comes within a very tight tolerance (say, ≤1%) and is significantly faster.
This not only saves space and time in your paper, but also strengthens your scientific rigor, since you base your results on a provable benchmark.
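A minimal sketch of the Option 3 metrics, assuming the global optimum value f_opt is known for each test instance and that the algorithm minimizes its objective; the algorithm callable, run count, and tolerance are hypothetical:
import statistics
import time

def gap_to_optimum(f_found, f_opt):
    return (f_found - f_opt) / abs(f_opt)      # 0.0 means the optimum was reached

def evaluate(algorithm, instance, f_opt, runs=10, tolerance=0.01):
    gaps, times = [], []
    for _ in range(runs):                      # stability over multiple runs
        start = time.perf_counter()
        f_found = algorithm(instance)          # assumed to return the best objective value found
        times.append(time.perf_counter() - start)
        gaps.append(gap_to_optimum(f_found, f_opt))
    return {
        "mean_gap": statistics.mean(gaps),
        "gap_stdev": statistics.stdev(gaps) if runs > 1 else 0.0,
        "mean_time": statistics.mean(times),
        "within_tolerance": all(g <= tolerance for g in gaps),   # e.g., <= 1%
    }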
Norm-referenced method: When the exam only samples part of the content (due to time constraints, for instance), it may not fully reflect all students' knowledge or abilities. In such cases, norm-referencing helps distinguish performance levels relative to peers, especially if the exam is designed to be difficult or selective.
Criterion-based method: If the exam fully aligns with the course objectives and content, it's appropriate to assess students based on predetermined criteria. In this case, every student theoretically has an equal opportunity to succeed by demonstrating mastery of the material.
===
It's not that incomplete content coverage requires norm-referenced grading, but rather that norm-referenced grading can be more practical or justifiable in that situation. Here’s the reasoning:
In criterion-based grading, you’re judging whether students meet specific learning outcomes.
But if the test only covers part of what was taught, you can’t be sure a student has mastered all intended outcomes — the exam doesn’t measure them all.
This makes it difficult to fairly say “Student A met the standard” if the test didn’t assess the full standard.
✅ So: If content coverage is partial, a claim like “meets expectations” (criterion-based) is less valid.
Norm-referenced grading doesn't claim to assess full mastery — it just compares students to each other.
If everyone is tested on the same (even partial) content, you can still rank performance fairly.
This is especially common in competitive settings like entrance exams, where the goal is to identify the top X%.
✅ So: Even if the content is partial, norm-referenced grading can still say, “Student A performed better than 85% of peers.”
Some exams (especially in large-scale or competitive environments) are designed to be selective, challenging, and not to reflect all course content.
In these cases, norm-referencing is deliberate — the test's role is to discriminate between levels of performance, not verify full learning.
Imagine a computer science course with 10 learning objectives:
Scenario A (Criterion-Based Fit): The final exam has one question for each objective, with rubrics. You can say “Students mastered 8 out of 10 objectives.”
Scenario B (Norm-Referenced Fit): The exam covers only 4 objectives in depth (due to time constraints), with high difficulty and trick questions. You can’t judge overall mastery, but you can still say “Student A is in the top 10%.”
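A minimal sketch contrasting the two grading views on the same scores; the scores, the 80-point cutoff, and the top-10% rule are hypothetical:
import numpy as np

scores = np.array([42, 55, 61, 68, 74, 79, 83, 88, 91, 95])

# Criterion-based: compare each score to a fixed standard (e.g., 80 = "meets expectations").
criterion_cutoff = 80
meets_standard = scores >= criterion_cutoff

# Norm-referenced: compare each student to peers via percentile rank.
percentiles = np.array([(scores < s).mean() * 100 for s in scores])
top_10_percent = percentiles >= 90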
The Within-Cluster Sum of Squares (WCSS) is a measure used to evaluate the compactness of clusters in a clustering algorithm, such as k-means. It calculates the sum of squared distances between each data point and the centroid (mean) of the cluster it belongs to.
Mathematically, the WCSS for k clusters C_1, \dots, C_k with centroids \mu_1, \dots, \mu_k is expressed as: \mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
WCSS (Within-Cluster Sum of Squares) and DBI (Davies–Bouldin Index) are both metrics used to evaluate clustering performance, but they focus on different aspects:
---
Objective:
WCSS: measures the compactness of clusters (intra-cluster similarity).
DBI: balances compactness and separation between clusters (inter-cluster dissimilarity).
Key Differences:
WCSS looks only within clusters.
DBI looks both within and between clusters.
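A minimal sketch computing WCSS and DBI with scikit-learn; the synthetic data and k = 3 are illustrative:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

wcss = km.inertia_                            # sum of squared distances to assigned centroids
dbi = davies_bouldin_score(X, km.labels_)     # lower is better for both metrics
print(f"WCSS = {wcss:.2f}, DBI = {dbi:.3f}")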
Pseudonymization
Definition: Personal data is replaced with pseudonyms (e.g., codes, numbers) but can still be re-identified using additional information (e.g., a key).
Reversibility: Reversible – the original data can be restored if the pseudonym and key are combined.
Purpose: Reduces risks during data processing, storage, or sharing, while still allowing for re-identification when necessary (e.g., in medical research).
Example: Replacing patient names with IDs in a health database, while keeping a separate file that links IDs to names.
GDPR Status: Still considered personal data, but offers some compliance benefits if implemented correctly.
Anonymization
Definition: Personal data is irreversibly altered so that the individual can no longer be identified, directly or indirectly.
Reversibility: Irreversible – the data cannot be traced back to a person.
Purpose: Used when there's no need to identify individuals, such as for open data publication or aggregate analysis.
Example: Aggregating survey results so that individual responses cannot be linked to specific participants.
GDPR Status: Not considered personal data – once data is truly anonymized, GDPR no longer applies.
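A minimal sketch contrasting the two notions above; the records and key table are hypothetical, and real anonymization requires more care than simple aggregation:
import uuid

patients = [{"name": "Alice", "age": 34, "dx": "A"},
            {"name": "Bob",   "age": 52, "dx": "B"}]

# Pseudonymization: replace names with IDs and keep the key in a separate, protected store.
key_table = {}                                  # reversible only with access to this key
pseudonymized = []
for p in patients:
    pid = uuid.uuid4().hex
    key_table[pid] = p["name"]
    pseudonymized.append({"id": pid, "age": p["age"], "dx": p["dx"]})

# Anonymization: aggregate so individual responses can no longer be linked back to a person.
anonymized = {"n": len(patients),
              "avg_age": sum(p["age"] for p in patients) / len(patients)}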
Thailand has announced plans to invest approximately 170 billion baht (around USD 4.7 billion) to establish itself as a Giga Data Hub in the ASEAN region. This initiative is a collaboration between the Thai government, the Charoen Pokphand Group (CP Group), and global investment firm Global Infrastructure Partners (GIP). The goal is to develop advanced data center infrastructure, positioning Thailand as a central hub for data services in Southeast Asia.