Dr.Jiw: 2026

Sunday, 12 July 2026

Grafana (https://newrelic.com/lp/grafana-monitoring?utm_medium=cpc&utm_source=google&utm_campaign=EVER-GREEN_NB_SEARCH_GRAFANA_APAC_APAC_EN&utm_network=g&utm_keyword=grafana&utm_device=c&_bt=591874843669&_bm=e&_bn=g&l5_source=googleads&l5_cid=10702646826&l5_adid=591874843669&mkt_network=g&adgroup=grafana&hstk_creative=591874843669&hstk_campaign=10702646826&hstk_network=googleAds&gad_source=1&gad_campaignid=10702646826&gbraid=0AAAAADln4Y-M336XavYi3coXe4PR8THfO&gclid=CjwKCAjw9szSBhBNEiwAC57SqyMl6-5zQtzhk6fyo1uCgHpVCzE1l4p8oszVIq-9niefsEZK0wkmbhoCTtEQAvD_BwE)

Librenms (https://www.librenms.org/)

Saturday, 11 July 2026

Economics vs Accounting

In both practice and basic principle, the two disciplines view and treat "opportunity cost" in completely different ways.

The reasons they differ can be summarized as follows:

1. Accounting

Accounting focuses on recording what has actually happened, is backed by empirical evidence, and can be measured reliably in monetary terms (objective and verifiable), in order to report operating results and financial position to external parties (e.g., the Revenue Department, shareholders, banks).

Opportunity cost is not recorded: because an opportunity cost is merely "the option not chosen", no money actually flows into or out of the company, and there is no receipt or other evidence of a transaction.

The focus is on "accounting cost": also called explicit cost, the cost actually paid, such as wages, raw materials, rent, and utilities.

2. Economics

Economics is the study of allocating limited resources for maximum benefit. Economists hold that every time we choose one thing, we always give up another.

Opportunity cost always counts as a cost: in economics, total cost (economic cost) consists of explicit cost plus implicit cost.

The goal is decision-making: accounting for opportunity cost shows whether resources (money, time, labor) are being put to their best use compared with the alternatives.

Suppose you quit a salaried job paying 50,000 baht a month to open your own coffee shop. Operating expenses (rent, coffee beans, electricity) are 40,000 baht a month, and the shop brings in 80,000 baht a month.

Accounting view: revenue 80,000 - actual expenses 40,000 = accounting profit of 40,000 baht (the shop is profitable).

Economic view: on top of the 40,000 baht of actual expenses, add the opportunity cost (the salary given up) of 50,000 baht, for a total economic cost of 90,000 baht.

Revenue 80,000 - total cost 90,000 = economic loss of 10,000 baht (meaning that staying in the old job would have paid better in purely monetary terms).
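The coffee-shop arithmetic can be written as a short Python sketch. The function and variable names are only illustrative; the figures are the monthly amounts from the example.

```python
def accounting_profit(revenue, explicit_costs):
    """Revenue minus costs actually paid (explicit costs)."""
    return revenue - explicit_costs


def economic_profit(revenue, explicit_costs, implicit_costs):
    """Revenue minus explicit costs and opportunity (implicit) costs."""
    return revenue - (explicit_costs + implicit_costs)


# Monthly figures from the coffee-shop example, in baht.
revenue = 80_000
operating_expenses = 40_000   # rent, coffee beans, electricity
forgone_salary = 50_000       # opportunity cost of leaving the job

acct = accounting_profit(revenue, operating_expenses)
econ = economic_profit(revenue, operating_expenses, forgone_salary)
print(f"Accounting profit: {acct:,} baht")   # 40,000
print(f"Economic profit:   {econ:,} baht")   # -10,000
```

The only difference between the two functions is the implicit-cost term, which is exactly the item that financial accounting leaves out.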

📌 A practical exception (managerial accounting)

Financial accounting, which produces the statements filed with the Revenue Department, does not count opportunity cost at all. Managerial accounting, however, is prepared inside the company to help executives choose projects or plan strategy, and there accountants do include opportunity cost in their calculations, much as economists do, so that management does not make the wrong decision.

Independent research in Japan

Japan emphasizes building a foundation in systematic thinking and research from elementary school onward. This does not take the form of a thick university-style thesis; instead it is woven into the educational culture through an activity called "Jiyuu Kenkyuu" (自由研究), literally "independent research".

The activity is a mandatory summer homework assignment for most Japanese elementary school students and has existed for decades. Its core ideas and learning format are as follows.

1. Freedom to choose a topic from everyday curiosity

The school does not assign topics. Children can work on anything that genuinely interests them, from science, history, and visual arts to behavioral science, for example:

Why do ants prefer sugar to honey? (form a hypothesis, set out bait, observe, and record the results)
Where do the water droplets on a glass of ice water come from?
Comparing prices and quantities of the same type of snack across convenience stores
Tracing the history of the railway in one's own community

2. Practicing the scientific method

Even though the children are young, the process the school teaches has the same structure as adult research:

Asking a question: arising from curiosity about everyday life
Making a prediction (hypothesis): what the result is expected to be
Experimenting / collecting data: hands-on work, taking photos, drawing, recording statistics
Drawing a conclusion: did what was learned match the prediction, and why?

3. Presenting the results (Data Visualization)

After the new term begins, the children display their research in the classroom or school hall. Most summarize it on a large sheet of paper (newsprint), decorated with drawings, photographs, and simple line graphs, to practice communicating and conveying information so that others understand.

Friday, 10 July 2026

Leave-One-Out Cross-Validation

Leave-One-Out Cross-Validation (LOOCV) is a specific type of $K$-fold cross-validation where the number of folds ($K$) equals the total number of data points ($N$) in your dataset.

Here is how it works in a nutshell:

The Process: If you have $N$ data points, you train your model $N$ times. In each iteration, you "leave out" exactly one data point to use as the test set and train the model on the remaining $N-1$ data points.
The Evaluation: You record the model's error on that single held-out point. After rotating through all data points, you average the $N$ individual errors to get the final validation score.
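As a rough sketch of the process and evaluation steps above, the following pure-Python example runs LOOCV with a simple least-squares line as the model. The model choice, the helper names, and the data points are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def loocv_mse(xs, ys):
    """Leave-one-out CV: N fits, each tested on the single held-out point."""
    errors = []
    for i in range(len(xs)):
        # Train on every point except index i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        prediction = a + b * xs[i]
        errors.append((ys[i] - prediction) ** 2)
    # Average the N individual squared errors.
    return sum(errors) / len(errors)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data, roughly y = 2x
print(loocv_mse(xs, ys))
```

Because the model is refitted $N$ times, the cost grows with the dataset size, which is why LOOCV is mostly practical for small datasets or cheap models.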

OpenStreetMap

OpenStreetMap is a free, editable global map created and maintained by a community of volunteers. Think of it as the "Wikipedia for maps": instead of relying on corporate or government databases, it is crowdsourced, so anyone with an internet connection can add or update local geographic data such as roads, trails, and businesses.

https://www.openstreetmap.org/#map=5/13.15/101.49

Thursday, 9 July 2026

Akamai vs Cloudflare vs Amazon CloudFront

Cloudflare is sold mainly as a flat-rate subscription, so its pricing does not follow the pay-per-use model typical of cloud services.

Akamai and CloudFront are pay-per-use: they charge for outbound data transfer and HTTP requests, and Akamai additionally for storage, which makes them cloud-style CDNs in the pricing sense.

Wednesday, 8 July 2026

Quantum annealing vs gates

Quantum annealing (QA) is a specialized quantum computing method used primarily to solve complex optimization problems.

Instead of using logic gates to perform step-by-step calculations (like gate-based quantum computers from IBM or Google), a quantum annealer leverages the natural tendency of quantum physics to find the lowest-energy state of a system.
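To make "lowest-energy state" concrete, here is a small classical sketch: an optimization problem is written as an Ising-style energy function, and the minimum is found by brute-force enumeration. This is not quantum annealing, only a stand-in that shows the kind of problem an annealer targets; the three-spin couplings and fields are made up for illustration.

```python
from itertools import product


def ising_energy(spins, couplings, fields):
    """E(s) = -sum_ij J_ij * s_i * s_j - sum_i h_i * s_i, with s_i in {-1, +1}."""
    pair_term = sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    field_term = sum(h * s for h, s in zip(fields, spins))
    return -pair_term - field_term


def lowest_energy_state(n, couplings, fields):
    """Classically enumerate all 2**n spin configurations and return the minimum."""
    return min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, couplings, fields))


# Hypothetical 3-spin chain: negative couplings favor opposite neighbors,
# and a small field on spin 0 favors s_0 = +1.
couplings = {(0, 1): -1.0, (1, 2): -1.0}
fields = [0.5, 0.0, 0.0]
best = lowest_energy_state(3, couplings, fields)
print(best, ising_energy(best, couplings, fields))   # (1, -1, 1) -2.5
```

Enumeration needs $2^n$ evaluations, so it stops being feasible quickly; an annealer instead lets the physical system settle toward a low-energy configuration.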

CUDA-Q

CUDA-Q (formerly known as CUDA Quantum) is an open-source, hybrid quantum-classical computing platform developed by NVIDIA. It acts as a bridge, allowing developers to program and run quantum algorithms seamlessly alongside traditional classical computing resources—specifically GPUs.


https://developer.nvidia.com/cuda-q?size=n_6_n&sort-field=featured&sort-direction=desc

Saturday, 4 July 2026

Adam optimization

Adam (short for Adaptive Moment Estimation) is one of the most popular optimization algorithms used in deep learning. It is essentially an advanced version of Stochastic Gradient Descent (SGD) that adapts the learning rate for each parameter individually based on past information.

By combining the advantages of two other extensions of SGD—Momentum and RMSProp—it achieves faster convergence and is generally robust to different types of neural network architectures.

---

### How Adam Works

Adam keeps track of two "moments" (moving averages) of the gradients for each parameter in the network:

1. **The First Moment ($m_t$):** This is the moving average of the *gradients*. It acts like momentum, helping the optimizer roll past local minima or noisy gradients.

2. **The Second Moment ($v_t$):** This is the moving average of the *squared gradients*. This tracks the uncentered variance of the gradients, allowing the algorithm to scale the learning rate based on how much a parameter's gradient fluctuates.

#### The Update Steps

The algorithm follows these steps at each time step ($t$):

* **Calculate Gradients ($g_t$):** Compute the gradient of the loss function with respect to the parameters.

* **Update Moving Averages:**

  * $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$

  * $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$

* **Bias Correction:** Since the moving averages are initialized at zero, they are biased toward zero at the start of training. Adam corrects this:

  * $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$

  * $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$

* **Parameter Update:** Finally, the weights are updated:

  * $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \cdot \hat{m}_t$
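The update steps above can be sketched in plain Python. This is a minimal illustration rather than a framework implementation: the function name `adam_minimize`, the toy quadratic objective, and the demo learning rate of `0.1` are assumptions chosen so the example converges in a few thousand steps.

```python
import math


def adam_minimize(grad_fn, theta, steps=1000, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a list of parameters; grad_fn returns the gradient list."""
    m = [0.0] * len(theta)   # first moment estimates (m_t)
    v = [0.0] * len(theta)   # second moment estimates (v_t)
    theta = list(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            # Update the two moving averages.
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias-correct them, then take the parameter step.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective (an assumption for illustration): f(x, y) = (x - 3)^2 + (y + 1)^2
def grad(theta):
    x, y = theta
    return [2 * (x - 3), 2 * (y + 1)]


result = adam_minimize(grad, [0.0, 0.0], steps=2000, lr=0.1)
print(result)   # should end up close to [3, -1]
```

Each parameter keeps its own `m` and `v`, which is what makes the step size adaptive per parameter.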

---

### Why Adam is Effective

* **Adaptive Learning Rates:** By dividing each step by the square root of the bias-corrected second moment ($\sqrt{\hat{v}_t}$), Adam automatically shrinks the effective learning rate for parameters with large, volatile gradients and increases it for those with small, infrequent gradients.

* **Momentum:** The first moment allows the model to "gain speed" in directions that consistently reduce the loss, which helps it move across small plateaus.

* **Efficiency:** Beyond the parameters themselves, it stores only two running averages per parameter ($m_t$ and $v_t$), and it works well even with sparse gradients or non-stationary objectives.

### Hyperparameters

When using Adam, you generally don't need to tune the hyperparameters extensively, but they are:

* **$\eta$ (Learning Rate):** Usually starts at $0.001$.

* **$\beta_1$ (Momentum decay):** Typically $0.9$.

* **$\beta_2$ (Second moment decay):** Typically $0.999$.

* **$\epsilon$ (Smoothing term):** A tiny constant (e.g., $10^{-8}$) to prevent division by zero.

---

### Comparison Summary

| Feature | SGD | RMSProp | Adam |
| --- | --- | --- | --- |
| **Momentum** | Optional | No | **Yes** |
| **Adaptive Learning Rate** | No | Yes | **Yes** |


Thursday, 2 July 2026

MCP

MCP stands for Model Context Protocol. It is an open-source standard created by Anthropic that allows AI assistants (like Claude) to seamlessly connect to external data sources, applications, and tools. It allows AI models to dynamically discover tools, read resources (like files or database records), and interact with them in real time.

MCP Clients are AI-powered applications (like Claude Desktop or an IDE) that want to fetch context.

MCP Servers are lightweight programs that hook into external tools (like GitHub, Figma, or a local database) to share that context with the AI.
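MCP messages are JSON-RPC 2.0, and tool discovery and invocation use the `tools/list` and `tools/call` methods. The sketch below imitates that exchange in-process with a toy server and client. It is a simplified illustration, not the official SDK: the `add` tool, the `handle_request` and `call` helpers, and the absence of the initialization handshake and of any real transport are all assumptions.

```python
import json

# Hypothetical tool registry; a real MCP server would wrap GitHub, a database, etc.
TOOLS = {
    "add": {
        "description": "Add two numbers.",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: args["a"] + args["b"],
    }
}


def handle_request(raw):
    """Toy server: answer one JSON-RPC 2.0 request string with a response string."""
    request = json.loads(raw)
    method = request["method"]
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": tool["description"],
             "inputSchema": tool["inputSchema"]}
            for name, tool in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = request["params"]
        value = TOOLS[params["name"]]["handler"](params["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


def call(method, params=None, request_id=1):
    """Toy client: build a request, send it in-process, return the parsed response."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        request["params"] = params
    return json.loads(handle_request(json.dumps(request)))


listing = call("tools/list")                       # the client discovers tools
tool_names = [tool["name"] for tool in listing["result"]["tools"]]
answer = call("tools/call", {"name": "add", "arguments": {"a": 2, "b": 3}},
              request_id=2)                        # then invokes one
print(tool_names, answer["result"]["content"][0]["text"])   # ['add'] 5
```

The client never hard-codes what the server offers; it learns the tool names and input schemas from the `tools/list` response, which is the dynamic discovery described above.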

Sunday, 28 June 2026

Techniques for public access to private-IP servers

1. Peer-to-peer overlay network services

ZeroTier and Tailscale are not SSL VPNs. They use UDP hole punching.

UDP hole punching is a technique that allows two devices behind NAT routers (such as home routers) to establish a direct peer-to-peer connection without requiring manual port forwarding.

Here’s how it works:

Both devices contact a public coordination server

Suppose Device A is at your home and Device B is in another organization.
Both devices first communicate with a publicly reachable server operated by the VPN service (e.g., ZeroTier).

The coordination server learns their public addresses

The server observes the public IP address and UDP port assigned by each device’s NAT router.

The server tells each device how to reach the other

Device A learns Device B’s public IP and port, and vice versa.

Both devices simultaneously send UDP packets to each other

When Device A sends a packet to Device B, its NAT router creates a temporary mapping (a “hole”) allowing return traffic.
Device B does the same.
Because both sides have opened these temporary holes, the packets can pass through the NATs, establishing a direct connection.

Device A ── NAT A ── Internet ── NAT B ── Device B
   ↑                                         ↑
   └─────── simultaneous UDP packets ────────┘

Why is it called “hole punching”?

Normally, NAT routers block unsolicited incoming packets. By sending outgoing UDP packets first, each device creates a temporary opening (“hole”) in its NAT table that allows packets from the other device to enter.
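The effect can be imitated with a toy simulation that involves no real networking. The `ToyNat` class below is an assumption: it models a port-restricted-cone NAT as a set of remote endpoints the inside host has already contacted, and the public addresses are documentation-range values chosen for illustration.

```python
class ToyNat:
    """Port-restricted-cone NAT sketch: inbound packets pass only if the
    inside host has already sent a packet to that exact remote endpoint."""

    def __init__(self, public_endpoint):
        self.public_endpoint = public_endpoint
        self.holes = set()  # remote endpoints the inside host has contacted

    def send(self, remote_endpoint):
        """Outbound packet: open a hole for replies from remote_endpoint."""
        self.holes.add(remote_endpoint)
        return self.public_endpoint  # source address seen on the Internet

    def accepts(self, source_endpoint):
        """Inbound packet: allowed only through an existing hole."""
        return source_endpoint in self.holes


# Public endpoints as the coordination server would observe them.
nat_a = ToyNat(("203.0.113.10", 40001))
nat_b = ToyNat(("198.51.100.20", 50002))

delivered = []
# A -> B: opens a hole in NAT A, but NAT B has no hole for A yet.
delivered.append(nat_b.accepts(nat_a.send(nat_b.public_endpoint)))
# B -> A: opens a hole in NAT B and passes through the hole A just opened.
delivered.append(nat_a.accepts(nat_b.send(nat_a.public_endpoint)))
# A -> B again: now NAT B has a hole for A, so the packet gets through.
delivered.append(nat_b.accepts(nat_a.send(nat_b.public_endpoint)))
print(delivered)  # [False, True, True]
```

The first packet is lost, which is why both sides keep sending: once each NAT holds a hole for the other side's public endpoint, traffic flows in both directions.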

Advantages

No need to configure port forwarding on routers.
Enables direct peer-to-peer communication.
Lower latency than relaying traffic through a central server.

Limitations

UDP hole punching does not work with all NAT types. It usually succeeds with:

Full-cone NAT
Restricted-cone NAT
Port-restricted cone NAT

It may fail with:

Symmetric NAT (common in some enterprise networks and cellular networks)

When hole punching fails, services such as ZeroTier, Tailscale, and WebRTC applications often fall back to relaying traffic through intermediary servers.

The coordination protocol commonly used to discover public addresses is based on the STUN standard, while relay fallback often uses TURN servers.

2. Cloudflare Tunnel

Cloudflare Tunnel is a free service that requires a registered DNS name. It does not use hole punching; instead, the private server keeps a long-lived outbound connection open to a Cloudflare edge server, and incoming requests are relayed to it over that connection.

Saturday, 20 June 2026

CNN based on spatiotemporal features

CRNN (Convolutional Recurrent Neural Network) and STGCN (Spatio-Temporal Graph Convolutional Network) are both deep learning architectures used to process spatio-temporal data (like videos or time-series networks). The main difference is how they model space: CRNNs treat spatial features as an image grid (using CNNs), while STGCNs treat space as an interconnected topology of specific points (using Graph Neural Networks).

CRNN (Convolutional Recurrent Neural Network)

CRNNs combine Convolutional Neural Networks (CNNs) for spatial feature extraction with Recurrent Neural Networks (RNNs) like LSTMs or GRUs for sequence processing.

Spatial Processing: Applies 2D or 3D CNNs to extract abstract feature representations from regular grid data (like video frames or pixels).
Temporal Processing: Uses recurrent memory cells to capture dependencies over time.
Common Use Cases: Video classification, image captioning, optical character recognition (OCR), and audio/speech recognition.

STGCN (Spatio-Temporal Graph Convolutional Network)

STGCNs apply Graph Convolutional Networks (GCNs) to handle non-Euclidean spatial data and pair them with Temporal Convolutional Networks (TCNs) or similar operations for the time domain.

Spatial Processing: Treats data as a graph where entities are "nodes" and relationships are "edges" (e.g., tracking human skeleton joints like hands and knees).
Temporal Processing: Processes time sequences in parallel or via temporal convolutions rather than looping sequentially through an RNN.
Common Use Cases: Skeleton-based human action recognition and traffic forecasting on road networks.
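The graph-based view can be illustrated with a deliberately stripped-down sketch: one spatial step that mixes each node with its graph neighbors, followed by one temporal convolution along the time axis. There are no learned weights here, and the three-node path graph, the scalar features, and the averaging kernel are assumptions for illustration; a real STGCN layer is considerably richer.

```python
def spatial_aggregate(frame, adjacency):
    """Average each node's feature with its graph neighbors (self included)."""
    n = len(frame)
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if adjacency[i][j] or i == j]
        out.append(sum(frame[j] for j in neighbors) / len(neighbors))
    return out


def temporal_conv(sequence, kernel):
    """'Valid' 1D convolution along time, applied to each node independently."""
    k = len(kernel)
    n = len(sequence[0])
    return [[sum(kernel[d] * sequence[t + d][i] for d in range(k))
             for i in range(n)]
            for t in range(len(sequence) - k + 1)]


# Path graph 0 - 1 - 2 (e.g., three connected joints or road sensors).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
# Four time steps, one scalar feature per node.
frames = [[3.0, 0.0, 0.0],
          [0.0, 3.0, 0.0],
          [0.0, 0.0, 3.0],
          [3.0, 3.0, 3.0]]

spatial = [spatial_aggregate(f, adjacency) for f in frames]
output = temporal_conv(spatial, kernel=[0.5, 0.5])
print(output)
```

Unlike a CNN over a pixel grid, the spatial step follows the adjacency matrix, and unlike an RNN, every output time step of the temporal convolution can be computed independently.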

Thursday, 18 June 2026

Optimization of model parameters vs hyperparameters

## 1. Model Parameter Optimization Methods

These methods are the actual **optimizing algorithms** that update the internal weights (w) and biases (b) of a model during the training phase based on the calculated gradients.

### First-Order Optimization (Gradient-Based)

* **Stochastic Gradient Descent (SGD):** The foundational method. It calculates the gradient of the loss function for a small batch (or a single sample) and takes a step in the direction of the steepest descent.

* **Momentum:** An extension of SGD that accelerates the optimization by adding a fraction of the previous step's update vector. This helps "roll" past local minima and dampens oscillations.

* **Adam (Adaptive Moment Estimation):** A widely used default optimizer in deep learning. It computes adaptive learning rates for each individual parameter by tracking both the first moment (the mean) and the second moment (the uncentered variance) of the gradients.

### Second-Order Optimization (Curvature-Based)

* **L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno):** A quasi-Newton method that approximates the Hessian matrix (the second derivatives of the loss function) from recent gradients rather than computing it exactly. It is computationally heavier per step than first-order methods but highly effective for smaller datasets and traditional algorithms like logistic regression or CRFs.

## 2. Hyperparameter Optimization (HPO) Methods

These are the macro-level strategies used to search for the best external configurations (e.g., finding the best learning rate, number of layers, or dropout rate) *before* the inner parameter training loop begins.

### Traditional/Exhaustive Search

* **Grid Search:** Performs an exhaustive search over a manually specified grid of discrete values.

* *Example:* Testing every combination of learning rates [0.1, 0.01] and batch sizes [32, 64].

* **Random Search:** Instead of checking every single point on a grid, it randomly samples configurations from a specified statistical distribution over a fixed number of iterations. It is often more efficient than grid search when only a few hyperparameters strongly affect performance, because every trial tries a new value of each hyperparameter instead of repeating the same values along unimportant dimensions.
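A minimal sketch of the two strategies follows, using the grid from the example above. The `validation_loss` function is a made-up stand-in for "train a model and measure its validation loss", and the sampling ranges for random search are likewise assumptions.

```python
import itertools
import math
import random


def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    return (math.log10(learning_rate) + 2) ** 2 + 0.001 * abs(batch_size - 64)


def grid_search(learning_rates, batch_sizes):
    """Evaluate every combination on the grid and keep the best."""
    return min(itertools.product(learning_rates, batch_sizes),
               key=lambda cfg: validation_loss(*cfg))


def random_search(n_trials, seed=0):
    """Sample configurations from distributions instead of a fixed grid."""
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-4, -1),          # log-uniform learning rate
               rng.choice([16, 32, 64, 128]))      # batch size
              for _ in range(n_trials)]
    return min(trials, key=lambda cfg: validation_loss(*cfg))


grid_best = grid_search([0.1, 0.01], [32, 64])     # 2 x 2 = 4 evaluations
random_best = random_search(n_trials=4)            # also 4 evaluations
print(grid_best, random_best)
```

With the same budget of four evaluations, the grid only ever tries two distinct learning rates, while random search tries four.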

### Informed/Sequential Search

* **Bayesian Optimization:** A smart, sequential strategy. It builds a probabilistic model (a "surrogate model," often using Gaussian Processes) of the objective function based on past evaluation results. It uses this model to mathematically predict which hyperparameter combination is most promising to try next, balancing exploration and exploitation.

### Heuristic & Evolutionary Algorithms

* **Genetic Algorithms (GA):** A population of hyperparameter sets is initialized. The best-performing sets are selected to "reproduce" (their settings are combined, i.e. crossover) and undergo random "mutation" to create the next generation of hyperparameters.

### Early-Stopping Based Methods

* **Hyperband:** An advanced variation of random search that uses a "successive halving" approach. It starts many training runs with random configurations simultaneously but only allocates a tiny resource budget (e.g., a few epochs) to them initially. It aggressively terminates poor performers early and funnels the remaining training budget into the most promising setups.
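The successive-halving idea at the core of Hyperband can be sketched as follows. This shows a single bracket only (Hyperband itself runs several brackets with different trade-offs), and the `loss_after` learning curve and the random `floor` values are hypothetical stand-ins for real training runs.

```python
import random


def loss_after(config, epochs):
    """Hypothetical learning curve: loss approaches config['floor'] as epochs grow."""
    return config["floor"] + 1.0 / (1 + epochs)


def successive_halving(configs, min_epochs=1, eta=2):
    """Train all configs briefly, keep the best 1/eta, then give the
    survivors eta times more budget, until one config remains."""
    survivors = list(configs)
    epochs = min_epochs
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: loss_after(c, epochs))
        survivors = ranked[:max(1, len(ranked) // eta)]
        epochs *= eta
    return survivors[0]


rng = random.Random(0)
configs = [{"id": i, "floor": rng.uniform(0.1, 1.0)} for i in range(8)]
winner = successive_halving(configs)
print(winner)
```

With eight starting configurations and `eta=2`, the rounds run at 1, 2, and 4 epochs on 8, 4, and 2 configurations, so most of the budget is spent on the candidates that looked best early.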

### Summary of the Workflow Hierarchy

```
[ Hyperparameter Optimization (e.g., Bayesian Optimization) ]
    │
    ▼  Chooses a setup (e.g., Learning Rate = 0.001)
    │
┌───┴───────────────────────────────────────────┐
│ Inner Loop: Training Phase                    │
│                                               │
│ [ Model Parameter Optimization (e.g., Adam) ] │
│     │                                         │
│     ▼  Updates Weights and Biases             │
│ (Minimizes Loss Function on Data)             │
└───────────────────────────────────────────────┘
```

Saturday, 13 June 2026

Active Learning

Active learning is an approach to teaching that emphasizes student participation, hands-on practice, and independent analytical thinking. Learners shift from being passive recipients (just sitting and listening to the teacher) to constructing knowledge themselves through activities such as brainstorming, project work, and discussion.

The concept can be understood in three main parts:

🌟 Changed roles

Learner: the center of learning, responsible for thinking, analyzing, doing, and exchanging ideas with peers
Teacher: shifts from "lecturer" to "facilitator" or coach, offering guidance and inspiration

💡 Example activity formats

Problem-Based Learning (PBL): learners work together to solve a problem or a simulated scenario

Brainstorming: freely exchanging ideas to find an answer together

Cooperative Learning: working in groups, doing projects, or reviewing in pairs (Think-Pair-Share)

Role-playing: taking on assumed roles to understand the content or its consequences in depth

🎯 Benefits

Helps learners retain content longer (because they actually do the work)
Develops higher-order thinking skills such as analysis, synthesis, and problem solving
Promotes social skills such as teamwork and communication