Saturday, December 6, 2025

Major types of recommendation systems

The major types are Collaborative Filtering (finding similar users or items based on past behavior), Content-Based Filtering (recommending items whose features resemble what a user liked), and Hybrid Systems (combining both for better results). Other types include Knowledge-Based, Deep Learning, and Demographic systems, which draw on item attributes, complex interaction patterns, or user demographics, respectively, to personalize suggestions.

Collaborative Filtering (CF):
  • Concept: "People who liked X also liked Y." It finds patterns in user-item interactions (ratings, purchases).
  • Sub-types: User-based (find a neighboring User B whose past ratings or purchases closely match User A's; if User B bought Item B and User A has not, recommend Item B to User A) and Item-based (if a user likes Item A, and many users who liked Item A also liked Item B, recommend Item B to that user).
  • Techniques: Matrix Factorization (like SVD), Nearest Neighbors.
Content-Based Filtering:
  • Concept: Recommends items similar to those a user has liked before, based on item features (e.g., movie genre, director, actors). If a user liked Item A, they are likely to also like Item B when Item B shares many of Item A's features.
  • How it works: Builds a user profile from the features of liked items, then matches it against other items by similarity. The profile can also include the user's own attributes, such as age and gender.
Hybrid Recommendation Systems:
  • Concept: Merges CF and Content-Based methods, or other techniques, to overcome individual limitations (like the cold-start problem). 
--Gemini
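
The item-based idea above ("users who liked Item A also liked Item B") can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy ratings, the user and item names, and the choice of cosine similarity are all made up here and are not part of the notes above.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not real data.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 2},
    "carol": {"A": 1, "C": 5},
}


def item_vector(item):
    """Return the sparse rating vector {user: rating} for one item."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(v1, v2):
    """Cosine similarity between two sparse {user: rating} vectors."""
    shared = set(v1) & set(v2)
    dot = sum(v1[u] * v2[u] for u in shared)
    norm1 = sqrt(sum(x * x for x in v1.values()))
    norm2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def recommend(user, top_n=1):
    """Rank items the user has not rated by similarity to items they rated."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        cand_vec = item_vector(candidate)
        # Weight each similarity by how much the user liked the known item.
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(item)) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice"))
```

In a real system the similarity matrix would normally be precomputed, and a library implementation (nearest neighbors or matrix factorization) would replace these loops.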

LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.
LightFM can address the cold-start problem. For a new user with no interaction history (a cold-start user), it uses the features they provide (e.g., "age: 25, gender: female") to make content-based recommendations, because the model represents each user and item through its features. Once the cold-start user has interacted with items and the model has been retrained on those interactions, predictions shift from purely content-based toward more personalized collaborative ones.
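
To make the cold-start idea concrete, here is a conceptual sketch in plain Python. It does not call the LightFM library. It only mimics the underlying idea that a user or item is represented as the sum of the embeddings of its features, so a brand-new user can still be scored from features alone. The embedding values and feature names are hypothetical (in a real model they are learned during training), and bias terms are left out for brevity.

```python
# Hypothetical 2-dimensional feature embeddings; a real model learns these.
user_feature_emb = {
    "age:25": [0.2, 0.5],
    "gender:female": [0.4, -0.1],
}
item_feature_emb = {
    "genre:comedy": [0.5, 0.3],
    "genre:horror": [-0.6, 0.1],
}


def embed(features, table):
    """Sum the embeddings of all features to get one latent vector."""
    dims = len(next(iter(table.values())))
    total = [0.0] * dims
    for feature in features:
        for d, value in enumerate(table[feature]):
            total[d] += value
    return total


def score(user_features, item_features):
    """Dot product of the user and item latent vectors (bias terms omitted)."""
    u = embed(user_features, user_feature_emb)
    v = embed(item_features, item_feature_emb)
    return sum(a * b for a, b in zip(u, v))


# A cold-start user is described only by features, with no interactions.
new_user = ["age:25", "gender:female"]
print(score(new_user, ["genre:comedy"]), score(new_user, ["genre:horror"]))
```

With these made-up numbers the comedy item scores higher than the horror item for the new user, even though the user has never interacted with anything.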

Wednesday, December 3, 2025

Wednesday, November 26, 2025

Infrastructure as Code

AWS CloudFormation

Terraform (https://developer.hashicorp.com/terraform/tutorials/aws-get-started/infrastructure-as-code)

Sunday, November 23, 2025

Google Optimization Tool

OR-Tools is an open source software suite for optimization, tuned for tackling the world's toughest problems in vehicle routing, flows, integer and linear programming, and constraint programming.

https://developers.google.com/optimization

Sunday, November 9, 2025

Online learning vs Offline learning

Online learning (or online machine learning) is a method in machine learning where the model continuously learns from a sequential stream of data, updating its parameters incrementally with each new data instance or small batch of data.

It is a dynamic process that allows a model to adapt to new patterns and changes in the data distribution in real-time.

Online learning contrasts with the more traditional Batch Learning (or Offline Learning) approach, in which the model is trained once on the complete dataset.
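
A minimal sketch of the incremental update described above, in plain Python. The class name, the learn_one method, the learning rate, and the synthetic stream (y = 2x + 1) are all assumptions made for illustration; the point is only that the parameters change after every single instance and the full dataset is never held in memory.

```python
class OnlineLinearRegressor:
    """Linear model updated one instance at a time (hypothetical helper)."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        """Take one gradient step on the squared error of a single instance."""
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def stream(n_steps=5000):
    """Simulated data stream following y = 2x + 1 (synthetic, noise-free)."""
    for step in range(n_steps):
        x = float(step % 5)
        yield [x], 2.0 * x + 1.0


model = OnlineLinearRegressor(n_features=1, learning_rate=0.02)
for features, target in stream():
    model.learn_one(features, target)  # update immediately, then discard

print(model.predict([10.0]))
```

A batch learner would instead collect all instances first and fit once; here each instance is used for one update and then thrown away.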

Wednesday, October 29, 2025

Low code RAG development tool

Workflow automation tool

https://n8n.io/ 

AI builder tool

https://www.langflow.org/

Thursday, September 18, 2025

Evaluate RAG

https://towardsdatascience.com/evaluating-your-rag-solution/

NotebookLM is an example of RAG: users add files and can then ask questions that are answered from those files.

Saturday, September 13, 2025

PR-AUC

You can use the precision-recall curve to make an educated decision when it comes to the classic precision/recall dilemma. Typically, the higher the recall, the lower the precision. Knowing at which recall your precision starts to fall fast can help you choose the threshold and deliver a better model.

https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc

The precision-recall curve is a handy plot to showcase the relationship and tradeoff between precision and recall values as we adjust the decision threshold of the classifier.

What is the decision threshold?

The decision threshold, also called the classification threshold, is a cutoff point used in binary classification to convert the probability score output by a machine learning model into a final class prediction (positive or negative). Most binary classification models (like logistic regression) output a probability between 0 and 1 that an instance belongs to the positive class. The decision threshold determines which probability values map to which class:

  • If the predicted probability is greater than or equal to the threshold, the instance is classified as the positive class.
  • If the predicted probability is less than the threshold, the instance is classified as the negative class.

How it works

By default, the threshold is often set to 0.5:

  • A probability ≥ 0.5 → positive class
  • A probability < 0.5 → negative class

However, this default isn't always optimal. The threshold is a hyperparameter that can be tuned to balance the trade-off between precision and recall, which is what the precision-recall curve helps to visualize.

Threshold and precision/recall trade-off

Adjusting the decision threshold directly impacts the number of false positives (FP) and false negatives (FN), which in turn changes the precision and recall values.

A higher AUC-PR value signifies better performance; the maximum value of 1 corresponds to perfect precision at every level of recall.

https://www.superannotate.com/blog/mean-average-precision-and-its-uses-in-object-detection
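
The effect of moving the decision threshold can be checked by hand with a short plain-Python sketch. The labels and scores below are made up, and the area is computed with the step-wise "average precision" sum, which is one common way to summarize the curve; other tools may interpolate differently and give slightly different numbers.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score,
    from the highest threshold to the lowest. Assumes at least one positive."""
    total_pos = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        points.append((threshold, tp / (tp + fp), tp / total_pos))
    return points


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in precision_recall_points(y_true, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical labels (1 = positive) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

for threshold, precision, recall in precision_recall_points(y_true, scores):
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
print(f"PR-AUC (average precision) = {average_precision(y_true, scores):.3f}")
```

Lowering the threshold from 0.9 to 0.4 raises recall from 1/3 to 1 while precision drops from 1.00 to 0.75, which is the trade-off the curve visualizes.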


Tuesday, September 9, 2025

Agentic AI vs AI Agent

The primary difference is that AI Agents are individual tools that execute pre-defined tasks with limited autonomy, while Agentic AI is a broader concept representing the use of autonomous systems that can independently set goals, make real-time decisions, adapt, and collaborate to solve complex, dynamic problems. Think of AI agents as specific tools or employees, and agentic AI as the system or project manager coordinating them to achieve a larger, more complex goal.  

Stochastic Gradient Descent

  • Gradient Descent (Batch): You take a step in the steepest downhill direction. To find that direction, you have to survey the slope of the entire landscape (the entire dataset) before taking each single step. This is accurate but very slow if the landscape is vast (a huge dataset).
  • Stochastic Gradient Descent (SGD): Instead of surveying the entire landscape, you pick one random spot (a single training example) and measure the slope there. You then take a small step in that spot's steepest downhill direction. You repeat this process many times, picking a new random spot for each step.
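
The two bullets above differ only in how many examples feed each step, which a small plain-Python sketch can show. The toy dataset (y = 3x), the learning rate, the step counts, and the random seed are assumptions chosen for illustration.

```python
import random

# Toy dataset following y = 3x, so the best weight is w = 3 (made-up data).
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]


def gradient(w, x, y):
    """Derivative of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def batch_gradient_descent(steps=100, lr=0.05):
    """Each step averages the gradient over the ENTIRE dataset."""
    w = 0.0
    for _ in range(steps):
        grad = sum(gradient(w, x, y) for x, y in data) / len(data)
        w -= lr * grad
    return w


def stochastic_gradient_descent(steps=400, lr=0.05, seed=0):
    """Each step uses the gradient of ONE randomly chosen example."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)
        w -= lr * gradient(w, x, y)
    return w


print(batch_gradient_descent(), stochastic_gradient_descent())
```

Both reach roughly w = 3 here. Each SGD step is much cheaper because it looks at one example, at the cost of noisier individual steps; on noisy real data SGD usually hovers around the optimum instead of landing on it exactly.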

Sunday, September 7, 2025

Docker getting started for Windows Desktop (not Windows Server)

Terminology (https://www.docker.com/blog/docker-for-web-developers/)

  • Docker Hub: The world’s largest repository of container images, which helps developers and open source contributors find, use, and share their Docker-inspired container images.
  • Docker Compose: A tool for defining and running multi-container applications.
  • Docker Engine: An open source containerization technology for building and containerizing applications.
  • Docker Desktop: Includes the Docker Engine and other open source components; proprietary components; and features such as an intuitive GUI, synchronized file shares, access to cloud resources, debugging features, native host integration, governance, and security features that support Enhanced Container Isolation (ECI), air-gapped containers, and administrative settings management.
  • Docker Build Cloud: A Docker service that lets developers build their container images on a cloud infrastructure that ensures fast builds anywhere for all team members. 

My successful experiment.

1.Download Docker desktop for Windows

https://www.docker.com/products/docker-desktop/

2.Install it

3.You may sign up or skip

4.Run Docker Desktop. Its icon appears in the system tray.

5.You may be asked to run the command wsl --update in cmd to update Windows Subsystem for Linux (WSL); afterwards, click Restart to restart Docker Engine.

6.Create a folder named "getting-started-docker" anywhere.

7.Inside that folder, create the following 2 files to run an Nginx-based HTTP server on Windows.

Dockerfile

# Use the official Nginx image from Docker Hub

FROM nginx:latest

# Copy the custom index.html file into the Nginx directory

COPY index.html /usr/share/nginx/html/index.html

# If the web app has multiple files, use one of these COPY commands instead:

# Copy the entire current directory:
# COPY . /usr/share/nginx/html/

# Copy all .js files:
# COPY *.js /usr/share/nginx/html/

# Copy specific files:
# COPY index.html styles.css scripts/app.js /usr/share/nginx/html/

index.html

<!DOCTYPE html>

<html>

<head>

<title>Hello, World!</title>

</head>

<body>

<h1>Hello, World!</h1>

<p>This page is served by Nginx in a Docker container.</p>

</body>

</html>

8.Open cmd window. Change directory into getting-started-docker

9.Run the following command to build your image, which extends the nginx image from Docker Hub by adding your index.html. The -t flag sets the image's name (tag), not the container's name. The . specifies the current directory as the build context.

docker build -t my-nginx-webserver .

10.Start a container from the image you just built.

docker run -d -p 8080:80 --name my-nginx-container my-nginx-webserver

The -d flag runs the container in "detached" mode (in the background).

The -p flag maps port 8080 on your local machine to port 80 inside the container, which is the default port Nginx listens on.

The --name my-nginx-container gives your container a memorable name.

Finally, you specify the name of the image you want to use (my-nginx-webserver).

If you want your container to start automatically after Windows restarts, do 2 more things:

1) In Docker Desktop, open Settings and tick "Start Docker Desktop when you sign in to your computer".

2) Start the container with a restart policy by using this command instead (it can be run from any folder):

docker run --restart unless-stopped -d -p 8080:80 --name my-nginx-container my-nginx-webserver

11.Open http://localhost:8080 in your web browser to verify that it works.

12.Stop the container with this command:

docker stop my-nginx-container

13.Or you may start it again with this command:

docker start my-nginx-container

14.Remove your container after stopping it:

docker rm my-nginx-container

15.You may remove the image:

docker rmi my-nginx-webserver

Sunday, August 31, 2025

Model overfit

 Overfitting is a common problem in machine learning where a model learns the training data too well, including its noise and random fluctuations, to the point that it fails to make accurate predictions on new, unseen data. It's like a student who memorizes test answers without understanding the underlying concepts; they do well on the practice test (training data) but struggle on the real exam (new data). 🧠

An overfit model has high variance and low bias, meaning it is highly sensitive to the training data and performs poorly when given new information. This contrasts with an underfit model, which is too simple to capture the underlying patterns and performs poorly on both training and new data.

How to Detect and Prevent Overfitting

Detecting overfitting often involves monitoring the model's performance on both a training dataset and a separate validation dataset. A key indicator is when the model's performance on the training data continues to improve (e.g., a decrease in error) while its performance on the validation data begins to worsen.

Here are some common strategies to prevent overfitting:

Use More Data: One of the most effective ways to prevent overfitting is to increase the amount of training data. A larger, more diverse dataset helps the model learn the true patterns rather than memorizing random noise.

Simplify the Model: If a model is too complex for the given data, it's more likely to overfit. You can reduce complexity by using a simpler algorithm or by reducing the number of parameters or features.

Regularization: This technique adds a penalty to the model's loss function based on its complexity. This discourages the model from assigning too much importance to specific features and helps prevent it from becoming overly complex. Examples include L1 regularization (Lasso) and L2 regularization (Ridge).

Early Stopping: During training, monitor the model's performance on a separate validation set. When the validation performance stops improving or begins to get worse, stop the training process early. This prevents the model from continuing to learn the noise in the training data, which would lead to overfitting.

Cross-Validation: This method involves splitting the data into multiple subsets, or "folds." The model is trained and tested on different combinations of these folds, which helps ensure it's not performing well on just one specific data split.

Dropout: Primarily used in neural networks, dropout is a different kind of regularization. During each training iteration, it randomly "drops" a percentage of neurons by temporarily ignoring them. This prevents neurons from becoming too co-dependent and forces the network to learn more robust and generalizable patterns.

--Gemini 
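
As a small illustration of the detection rule and of early stopping, the following plain-Python sketch scans a list of per-epoch validation losses and stops once the loss has failed to improve for a given number of epochs. The function name, the patience parameter, and the loss values are hypothetical; real training frameworks offer their own early-stopping utilities.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs. Epochs are numbered from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    stopped_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, stopped_epoch


# Made-up curves: training loss keeps falling while validation loss turns up.
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18]
val_losses = [0.95, 0.70, 0.58, 0.55, 0.57, 0.61, 0.66]

best, stopped = early_stopping_epoch(val_losses, patience=2)
print(f"best epoch: {best}, training stopped at epoch: {stopped}")
```

In this example the validation loss bottoms out at epoch 3 while the training loss keeps decreasing, which is the overfitting signature; one would normally keep the model weights saved at the best epoch.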

Saturday, August 23, 2025

Stat vs Math

Math is about discovering and proving truths that are universally valid.

Stat is about drawing conclusions from data, often with uncertainty.

Aspect | Mathematics | Statistics
Focus | Abstract concepts, patterns, structures | Data collection, analysis, interpretation
Nature | Deductive reasoning (from theory to result) | Inductive reasoning (from data to inference)
Purpose | To develop theories and solve equations | To make decisions or predictions based on data
Core Activities | Proving theorems, solving equations | Estimating, testing hypotheses, modeling data
Key Topics | Algebra, calculus, geometry, number theory | Probability, sampling, regression, inference

Tuesday, August 19, 2025

RAG vs ChatGPT with web search

 


Saturday, August 16, 2025

Wednesday, August 6, 2025

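Second-person forms of address for Thai police and army personnel

Army ranks

  • Non-commissioned officers:

    • Lance Corporal, Corporal, Sergeant (สิบตรี, สิบโท, สิบเอก): commonly addressed as "หมู่" (mu)

    • Sergeant Major 3rd, 2nd, and 1st Class (จ่าสิบตรี, จ่าสิบโท, จ่าสิบเอก): commonly addressed as "จ่า" (cha)

    • Sergeant Major 1st Class on the higher pay scale (จ่าสิบเอก): addressed as "จ่าพิเศษ" (cha phiset)

  • Commissioned officers:

    • Sub Lieutenant, Lieutenant (ร้อยตรี, ร้อยโท): addressed as "ผู้หมวด" (phu muat)

    • Captain (ร้อยเอก): addressed as "ผู้กอง" (phu kong)

    • Major, Lieutenant Colonel, Colonel (พันตรี, พันโท, พันเอก): addressed as "ผู้พัน" (phu phan)

    • Special Colonel, Major General, Lieutenant General, General (พันเอกพิเศษ, พลตรี, พลโท, พลเอก): addressed as "นายพล" (nai phon)

    • Anyone holding the position of regimental commander or higher: commonly addressed as "ผู้การ" (phu kan)

Police ranks

  • Non-commissioned officers:

    • Police Lance Corporal, Police Corporal, Police Sergeant (สิบตำรวจตรี, สิบตำรวจโท, สิบตำรวจเอก): addressed as "หมู่" (mu)

    • Police Sergeant Major (จ่าสิบตำรวจ): addressed as "จ่า" (cha)

    • Police Senior Sergeant Major (ดาบตำรวจ): addressed as "ดาบ" (dap)

  • Commissioned officers:

    • Police Sub-Lieutenant, Police Lieutenant (ร้อยตำรวจตรี, ร้อยตำรวจโท): addressed as "ผู้หมวด" (phu muat)

    • Police Captain (ร้อยตำรวจเอก): addressed as "ผู้กอง" (phu kong)

    • Police Major, Police Lieutenant Colonel, Police Colonel (พันตำรวจตรี, พันตำรวจโท, พันตำรวจเอก): addressed as "ผู้พัน" (phu phan)

    • Police Major General, Police Lieutenant General, Police General (พลตำรวจตรี, พลตำรวจโท, พลตำรวจเอก): addressed as "นายพล" (nai phon)

    • Anyone holding the position of division commander (ผู้บังคับการกองบังคับการ) or higher, i.e. from Police Major General upward: commonly addressed as "ผู้การ" (phu kan)

    • Inspector (สารวัตร, sarawat): this is a position title, not a rank; most inspectors hold the rank of Police Major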