Thursday, August 29, 2024

There Is No Destiny

To believe in destiny, that our every action results from the ripening of past karma, is to believe that karma dictates everything (karmic determinism).

But the Buddha said that karma is not brought about by oneself, not brought about by another, not brought about by both oneself and another, and does not arise by itself without cause; rather, it unfolds according to dependent origination (paṭiccasamuppāda).

Data virtualization

An approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data.

Unlike the traditional extract, transform, load ("ETL") process, the data remains in place, and real-time access to the source system is provided. This reduces the risk of data errors and avoids the workload of moving around data that may never be used, and it does not attempt to impose a single data model on the data (an example of heterogeneous data is a federated database system). The technology also supports writing transaction data updates back to the source systems.

In summary, it differs from a data warehouse in that data need not be replicated into the DW (i.e., only E&T are performed, not L), and it can also write back to the original DBs, similar in concept to an Oracle VIEW.
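
The idea can be sketched in a few lines of Python. This is only a toy illustration, not a real virtualization layer (products such as Denodo or federated SQL engines do this at scale); the database files and table names are hypothetical, and the two SQLite sources are assumed to already exist:

```python
# Minimal sketch of the data-virtualization idea: a "virtual view" that
# queries the source systems live instead of replicating rows into a DW.
# The connection targets and table names below are hypothetical.
import sqlite3

import pandas as pd

def customer_360(customer_id: int) -> pd.DataFrame:
    """Single customer view assembled on demand from two live sources."""
    crm = sqlite3.connect("crm.db")          # source system 1 (stays in place)
    billing = sqlite3.connect("billing.db")  # source system 2 (stays in place)
    profile = pd.read_sql_query(
        "SELECT id, name, email FROM customers WHERE id = ?",
        crm, params=(customer_id,))
    invoices = pd.read_sql_query(
        "SELECT customer_id, amount, due_date FROM invoices WHERE customer_id = ?",
        billing, params=(customer_id,))
    # Join at query time; nothing is loaded into a central warehouse.
    return profile.merge(invoices, left_on="id", right_on="customer_id")
```

The point is that `customer_360` assembles the single customer view at query time; no rows are ever copied into a central store.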

Wednesday, August 28, 2024

AI tools

https://storm.genie.stanford.edu/ generates a Wikipedia-like report on your topic

STORM is a research prototype for automating the knowledge curation process



Monday, August 19, 2024

How does ChatGPT work?

ChatGPT consists of a transformer (an architecture that maps an input text sequence to an output text sequence, e.g., in translation), an LLM (trained to predict the next word given the previous words), and related components. It relies on supervised and reinforcement learning techniques.
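
The next-word prediction step can be illustrated with a toy model. This is not ChatGPT's actual architecture (a transformer computes scores from the whole preceding context, not just the last word); the scores below are random stand-ins for what a trained network would output:

```python
# Toy illustration of the core LLM idea: predict the next word given the
# previous words. This is a hand-rolled bigram model, not a transformer;
# real models learn these scores from huge corpora.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
# Hypothetical learned scores (logits): rows = previous word, cols = next word.
logits = np.random.default_rng(0).normal(size=(len(vocab), len(vocab)))

def next_word(prev: str) -> str:
    scores = logits[vocab.index(prev)]
    probs = np.exp(scores) / np.exp(scores).sum()   # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]             # greedy decoding

text = ["the"]
for _ in range(4):                                  # autoregressive generation
    text.append(next_word(text[-1]))
print(" ".join(text))
```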

https://novaapp.ai/blog/technology-behind-chatgpt

https://medium.com/@ashish.sharma1981/chatgpt-architecture-exploring-the-inner-workings-of-the-language-model-41731fc05483

https://www.scalablepath.com/machine-learning/chatgpt-architecture-explained

https://youtu.be/lm_ZBWaK56k?si=I8P_btvKiCFwFL5G


Thursday, August 15, 2024

Akaike Information Criterion (AIC)

The Akaike Information Criterion (AIC) is a widely used measure of the quality of a statistical model. It quantifies 1) the goodness of fit and 2) the simplicity/parsimony of the model in a single statistic. The lower the AIC, the better the model.

Cf. https://coolstatsblog.com/2013/08/14/using-aic-to-test-arima-models-2/

AIC = 2k - 2 ln(L)

  • k is the number of parameters in the model.
  • ln(L) is the natural logarithm of the model's maximized likelihood, i.e., the likelihood evaluated at the best-fitting parameter values.
The term 2k is a penalty for model complexity. 
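
As a hedged sketch of how this is used in practice (following the ARIMA model comparison in the blog post cited above), statsmodels exposes both the log-likelihood and the AIC of a fitted model; the series below is a random stand-in for real data:

```python
# Comparing candidate ARIMA models by AIC, as in the cited blog post.
# `y` here is a synthetic stand-in; use your own series in practice.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.random.default_rng(1).normal(size=200).cumsum()

for order in [(1, 0, 0), (2, 0, 0), (1, 0, 1)]:
    fit = ARIMA(y, order=order).fit()
    print(order, round(fit.aic, 1))   # lower AIC = better trade-off

# The same number by hand: AIC = 2k - 2 ln(L)
fit = ARIMA(y, order=(1, 0, 0)).fit()
k = len(fit.params)                   # number of estimated parameters
print(2 * k - 2 * fit.llf)            # fit.llf is ln(L); should match fit.aic
                                      # up to statsmodels' parameter counting
```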

AIC is not generally used for Multi-Layer Perceptrons (MLPs).

The reasons are similar to why it isn't used for LLMs:

Complexity: MLPs are a type of neural network with a large number of weights and biases, even for a relatively small network. These parameters are not easily interpretable, and the 2k penalty term in the AIC formula would become so large that it would make the score meaningless for comparison.

Different Optimization Philosophy: MLPs are optimized through backpropagation to minimize a loss function (like Mean Squared Error or cross-entropy) on a training dataset. They are not typically fit using a maximum likelihood approach that can be easily translated into a likelihood score (L).

Alternative Metrics: The performance of MLPs and other neural networks is evaluated using metrics that are more appropriate for their task, such as accuracy, precision, recall, F1-score, or Mean Squared Error on a separate validation set.

Japanese proverb: "Continuity is power"

「継続は力なり」 (Keizoku wa chikara nari) means that determination and doing things continuously can build strength and success, an idea widely embraced in Japanese culture.

Monday, August 12, 2024

Genetic algorithm and Evolutionary computing

Genetic Algorithms (GAs) are a core component of evolutionary computing, which is a broader field inspired by the principles of natural evolution. Here's how GAs fit into evolutionary computing:

Evolutionary Computing:

  1. Definition:

    • Evolutionary computing is a class of optimization algorithms inspired by the principles of biological evolution. It includes a variety of techniques that mimic natural selection, genetic processes, and evolution to solve complex optimization problems.
  2. Core Concepts:

    • Natural Selection: The process where organisms better adapted to their environment tend to survive and produce more offspring.
    • Genetic Operators: Techniques such as crossover (recombination) and mutation that simulate biological evolution.

Genetic Algorithms (GAs):

  1. Definition:

    • Genetic Algorithms are a specific type of evolutionary algorithm used to find approximate solutions to optimization and search problems. They mimic the process of natural evolution to evolve solutions over generations.
  2. Components of GAs:

    • Population: A set of potential solutions to the problem, each represented as a chromosome or individual.
    • Selection: A process to choose individuals based on their fitness scores, favoring better solutions for reproduction.
    • Crossover (Recombination): Combines parts of two parent solutions to create new offspring, simulating biological reproduction.
    • Mutation: Introduces random changes to offspring to maintain genetic diversity and explore new areas of the solution space.
    • Fitness Function: Evaluates how well a solution solves the problem, guiding the selection process.
  3. Relation to Evolutionary Computing:

    • Foundational Role: GAs are one of the earliest and most well-known examples of evolutionary algorithms, demonstrating the core principles of evolutionary computing.
    • Broad Category: Evolutionary computing includes other algorithms as well, such as Evolution Strategies (ES), Evolutionary Programming (EP), and Genetic Programming (GP), each with variations on the evolutionary concepts.

In summary, Genetic Algorithms are a specific instantiation of evolutionary computing principles, and they play a significant role in demonstrating how evolutionary concepts can be applied to optimization and search problems.
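
The components listed above fit together in a short loop. Below is a minimal, self-contained sketch (a toy, not any canonical implementation) that evolves a bit string toward all 1s, the classic "OneMax" problem:

```python
# Minimal genetic algorithm illustrating the components listed above:
# population, fitness function, selection, crossover, and mutation.
import random

random.seed(42)
GENES, POP, GENERATIONS = 20, 30, 40

def fitness(chrom):                      # fitness function: count of 1-bits
    return sum(chrom)

def select(pop):                         # tournament selection (size 2)
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                   # single-point recombination
    cut = random.randrange(1, GENES)
    return p1[:cut] + p2[cut:]

def mutate(chrom, rate=0.02):            # flip each bit with small probability
    return [g ^ 1 if random.random() < rate else g for g in chrom]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for gen in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best), best)
```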

Well-known optimization methods

 Exact Optimization Methods:

  1. Linear Programming (LP): Optimizes a linear objective function subject to linear equality and inequality constraints.
  2. Integer Programming (IP): A type of linear programming where some or all of the decision variables are required to be integers.
  3. Quadratic Programming (QP): Optimizes a quadratic objective function subject to linear constraints.
  4. Dynamic Programming (DP): Solves problems by breaking them down into simpler subproblems and solving each subproblem just once, storing the results.
  5. Branch and Bound: An algorithm for solving integer programming problems by systematically exploring and pruning the solution space.

Gradient-Based Methods:

  1. Gradient Descent: Iteratively moves towards the minimum of a function by following the negative gradient (see the sketch after this list).
  2. Newton’s Method: Uses second-order derivative information (Hessian matrix) to find the roots of a function or the minima of a function more efficiently.
  3. Conjugate Gradient Method: An iterative method for solving large systems of linear equations and optimization problems, especially useful for quadratic functions.
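
As referenced in item 1 above, here is a minimal gradient descent sketch on a simple convex function with a known minimum:

```python
# Gradient descent on f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum is
# at (3, -1). Each step moves against the analytic gradient (2(x-3), 2(y+1)).
x, y, lr = 0.0, 0.0, 0.1
for _ in range(100):
    gx, gy = 2 * (x - 3), 2 * (y + 1)   # gradient at the current point
    x, y = x - lr * gx, y - lr * gy     # step in the negative-gradient direction
print(round(x, 4), round(y, 4))         # ~ (3.0, -1.0)
```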

Heuristic and Metaheuristic Methods:

  1. Genetic Algorithms (GA): Uses natural selection principles to evolve solutions over generations.
  2. Simulated Annealing (SA): Uses a probabilistic approach to avoid local optima by mimicking the annealing process in metallurgy (see the sketch after this list).
  3. Particle Swarm Optimization (PSO): Models the social behavior of swarms to find optimal solutions.
  4. Ant Colony Optimization (ACO): Simulates the foraging behavior of ants to find optimal paths or solutions.
  5. Tabu Search: Uses memory structures to guide the search and avoid local optima.
  6. Differential Evolution (DE): Uses mutation and recombination to evolve solutions towards the optimal.
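
And the simulated annealing sketch referenced in item 2: worse moves are occasionally accepted with probability exp(-delta/T), which is what lets the search escape local optima:

```python
# Simulated annealing on a bumpy 1-D function with several local minima.
import math
import random

random.seed(0)
f = lambda x: x * x + 10 * math.sin(x)   # objective with local minima

x, T = 5.0, 10.0
while T > 1e-3:
    cand = x + random.uniform(-1, 1)     # random neighbor of the current point
    delta = f(cand) - f(x)
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = cand                         # accept improvements, and sometimes worse moves
    T *= 0.99                            # cooling schedule
print(round(x, 3), round(f(x), 3))
```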

Approximation Algorithms:

  1. Greedy Algorithms: Make a series of locally optimal choices to find a solution that is hopefully globally optimal.
  2. Local Search: Iteratively explores the neighborhood of a solution to find better solutions, such as in the case of Hill Climbing.

Stochastic and Hybrid Methods:

  1. Bayesian Optimization: Uses probabilistic models to guide the search for optimal solutions, often used for expensive-to-evaluate functions.
  2. Memetic Algorithms: Combines genetic algorithms with local search methods to refine solutions.

Swarm Optimization Algorithms:

1. Particle Swarm Optimization (PSO)
2. Ant Colony Optimization (ACO)
3. Artificial Bee Colony (ABC)
4. Grey Wolf Optimizer (GWO)
5. Firefly Algorithm (FA)

Algorithm-Agnostic Model Building with Mlflow

Agnostic = not tied to any particular algorithm

Creating generic ML pipelines using mlflow.pyfunc

https://towardsdatascience.com/algorithm-agnostic-model-building-with-mlflow-b106a5a29535
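
A minimal sketch of the article's idea, assuming any fitted estimator with a `.predict()` method (the estimator in the usage comment is illustrative):

```python
# Sketch of an algorithm-agnostic wrapper: any fitted estimator with a
# .predict() method is exposed through MLflow's generic pyfunc interface,
# so downstream code never needs to know which algorithm is inside.
import mlflow.pyfunc

class AnyModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model                # e.g., a scikit-learn or XGBoost model

    def predict(self, context, model_input):
        return self.model.predict(model_input)

# Usage sketch (estimator and data names are illustrative):
# from sklearn.linear_model import LogisticRegression
# wrapped = AnyModelWrapper(LogisticRegression().fit(X, y))
# mlflow.pyfunc.log_model(artifact_path="model", python_model=wrapped)
```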

Thursday, August 8, 2024

Coercive citation

During the peer review process, or when authors have their work provisionally accepted for publication, they may encounter instances where Handling Editors or peer reviewers ask them to consider citing additional sources to ensure a more comprehensive discussion. These references may include papers published in the same journal. World Scientific strongly opposes the practice of demanding that authors include references solely to boost citation numbers without any scientific justification, commonly known as "coercive citation".

Mutated researchers

Researcher with a changing focus or area of expertise.

Alternative terms:

Cross-disciplinary researcher

Interdisciplinary researcher

Researcher with multiple areas of expertise


Mentor

 <> Mentee 

Wednesday, August 7, 2024

NLP research method

 https://kinoshita.eti.br/2017/06/03/natural-language-processing-and-natural-language-understanding.html

https://en.m.wikipedia.org/wiki/Natural_language_processing

Research methods in Natural Language Processing (NLP) have evolved from rule-based linguistics to data-driven statistical models and, most recently, to deep learning architectures. Because NLP sits at the intersection of linguistics, computer science, and statistics, its research methodologies are highly structured and iterative.

Here is an overview of the core research pipeline in modern NLP:


1. Problem Formulation and Data Collection

Research typically begins by identifying a specific task (e.g., Sentiment Analysis, Machine Translation, or Question Answering).

  • Corpus Acquisition: Gathering a large body of text. This can be from web scraping (Common Crawl), specialized datasets (Wikipedia, news archives), or proprietary domain-specific data.

  • Data Annotation: If the research involves supervised learning, human experts must label the data (e.g., tagging parts of speech or identifying "ground truth" answers).

2. Data Preprocessing (The Cleaning Phase)

Raw text is messy and must be standardized before it can be processed by a model.

  • Tokenization: Breaking sentences into individual words or sub-words.

  • Normalization: Lowercasing, removing punctuation, or "Stemming/Lemmatization" (reducing words like "running" to "run").

  • Stop-word Removal: Filtering out common words like "the" or "is" that may not carry significant semantic weight for certain tasks.
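
A minimal preprocessing pass covering these steps (the stop-word list here is a tiny illustrative subset):

```python
# Minimal preprocessing: tokenization, normalization (lowercasing,
# stripping punctuation), and stop-word removal.
import re

STOP_WORDS = {"the", "is", "on", "a", "an"}   # tiny illustrative list

def preprocess(text: str) -> list[str]:
    text = text.lower()                        # normalization
    tokens = re.findall(r"[a-z]+", text)       # simple word tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The cat sat on the mat."))   # ['cat', 'sat', 'mat']
```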

3. Feature Engineering 

  • Feature Extraction: Creating new features from raw data (e.g., pulling "Day of the Week" from a raw "Timestamp", or using PCA to condense 100 variables into 5). In NLP this includes vectorization via Bag-of-Words, TF-IDF, or word embeddings such as word2vec. In this stage, the researcher decides which characteristics of the text are most informative for the target task.
    • Lexical Features: Extracting specific keywords, N-grams (word sequences), or morphological roots
    • Structural Features: Using Part-of-Speech (PoS) tagging or Dependency Parsing to extract the grammatical relationship between words.
    • Dimensionality Reduction: Techniques like LDA (Latent Dirichlet Allocation) might be used to extract "topics" from a large document set, reducing thousands of words to a few key themes.
  • Feature Representation: Choosing the mathematical format for those features (e.g., turning categories into binary numbers or text into dense vectors). Once features are identified, they must be represented in a format a computer can optimize (see the sketch after this list).
    • Sparse Representation: Using Bag of Words (BoW), One-Hot Encoding, or TF-IDF vectors. These are high-dimensional and "sparse" because most values are zero.
    • Dense Representation (Embeddings): Mapping extracted features into a continuous vector space (e.g., Word2Vec, GloVe). This is crucial for capturing semantic similarity, ensuring that terms like "optimization" and "efficiency" are mathematically adjacent.
    • Contextual Representation: Modern research often uses Transformers to create dynamic representations where a word's vector changes based on the surrounding text (e.g., "bank" of a river vs. investment "bank").
  • Feature Selection: Picking the most important features and discarding the noise to prevent overfitting.
  • Feature Transformation: Scaling or normalizing data (e.g., making sure a "Price" feature and an "Age" feature are on the same scale, like 0 to 1).
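
The sketch referenced above, contrasting a sparse TF-IDF representation with a dense embedding lookup (the embedding vectors here are random stand-ins for trained Word2Vec/GloVe vectors):

```python
# Sparse vs. dense representation in practice: TF-IDF vectors via
# scikit-learn, then a randomly initialized embedding table standing in
# for trained dense word vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)                 # sparse matrix, mostly zeros
print(X.shape, tfidf.get_feature_names_out())

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=8) for w in tfidf.get_feature_names_out()}
print(embeddings["cat"][:3])                  # dense 8-dim vector per word
```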

4. Modeling

This is where the distinction between "Traditional" and "Modern" NLP is most visible.

  • Statistical/Traditional Methods: Researchers manually define features using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or N-grams. Models like Hidden Markov Models (HMM) or Support Vector Machines (SVM) are often used here.

  • Deep Learning Methods: Researchers design neural architectures (e.g., Transformers, LSTMs) that automatically learn features through layers. Current research focuses heavily on Large Language Models (LLMs) and Self-Supervised Learning, where the model learns from unlabeled text by predicting missing words.

5. Training and Optimization

  • Objective Functions: Defining a loss function (like Cross-Entropy Loss) that measures how far the model's prediction is from the truth.

  • Hyperparameter Tuning: Adjusting learning rates, batch sizes, and model depth to optimize performance.

  • Transfer Learning: Taking a pre-trained model (like BERT or GPT) and "fine-tuning" it on a specific, smaller dataset for a specialized task.

6. Evaluation Metrics

To determine if a research method is successful, researchers use standardized quantitative metrics:

  • Accuracy/F1-Score: Common for classification tasks.

  • BLEU/ROUGE: Used for translation and summarization to compare machine output against human references.

  • Perplexity: Measures how well a probability model predicts a sample.

  • Human Evaluation: For tasks like creative writing or reasoning, researchers often employ human "judges" to score the output on fluency and factual correctness.
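
For the classification metrics, a minimal example with scikit-learn on a toy prediction set:

```python
# Computing the classification metrics named above on toy predictions.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))   # fraction of correct labels
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```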

NLP model development steps, by ChatGPT:

To develop an NLP model using the terms provided, the process generally follows these steps:

1. **Data Collection**: Gather and prepare a dataset of text that will be used for training and testing the NLP model.

2. **Tokenization**:

   - **Explanation**: Split the text into smaller units called tokens, which can be words, subwords, or characters. 

   - **Purpose**: Tokenization allows the model to process and analyze text at a granular level.

   - **Example**: "The cat sat on the mat." becomes ["The", "cat", "sat", "on", "the", "mat"].

3. **Stop Words Removal**:

   - **Explanation**: Remove common words that have little meaning on their own, such as "the," "is," and "and."

   - **Purpose**: Reduce noise in the data, focusing the model on more meaningful words.

   - **Example**: After removal, ["The", "cat", "sat", "on", "the", "mat"] might become ["cat", "sat", "mat"].

4. **Stemming**:

   - **Explanation**: Reduce words to their root form by removing suffixes (e.g., "running" → "run").

   - **Purpose**: Simplify words to a common base form, reducing vocabulary size.

   - **Example**: "running", "runner", "ran" all stem to "run".

5. **Lemmatization**:

   - **Explanation**: Similar to stemming, but it reduces words to their base or dictionary form, known as the lemma, considering the context.

   - **Purpose**: Ensure words are reduced to their meaningful base form, which may differ based on context.

   - **Example**: "better" → "good", "running" → "run".

6. **Feature Extraction**:

- **Explanation**: Convert tokens into numerical features that the model can understand.

- **Methods**:

  - **Bag of Words (BoW)**: Represents text by the frequency of words in the document. It ignores the context.

  - **TF-IDF (Term Frequency-Inverse Document Frequency)**: Adjusts the frequency of words by their importance across documents. (https://bdi.or.th/big-data-101/tf-idf-1/)

  - **Word Embedding**: An advanced method that transforms words into dense vectors capturing semantic relationships from surrounding words (i.e., context). Common methods include Word2Vec, GloVe, and FastText (see the sketch after this step).

- **Purpose**: Transform text data into a format suitable for modeling, whether through simple frequency counts or more complex vector representations.
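
The word-embedding sketch referenced above, using gensim's Word2Vec on a toy corpus (all hyperparameters are illustrative; real embeddings need far more text):

```python
# Training a tiny Word2Vec model on a toy corpus.
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "log"]]
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=50)
print(model.wv["cat"][:4])                      # dense vector for "cat"
print(model.wv.most_similar("cat", topn=2))     # nearest words in vector space
```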

7. **Modeling with Deep Learning Algorithms**:

   - **Explanation**: Use deep learning techniques to build the NLP model.

   - **Purpose**: Leverage complex neural networks to capture patterns and relationships in text data.

   - **Common Models**:

     - **RNN (Recurrent Neural Network)**: Suitable for sequence-based tasks like text generation.

     - **LSTM (Long Short-Term Memory)**: An advanced form of RNN that handles long-term dependencies (see the sketch after this list).

     - **Transformer**: State-of-the-art model architecture for NLP tasks (e.g., BERT, GPT).
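
The LSTM sketch referenced in the list above: a skeleton text classifier in Keras, assuming tokenized and padded integer sequences as input (all sizes are illustrative):

```python
# Skeleton of an LSTM text classifier in Keras. Inputs are assumed to be
# integer token ids padded to a fixed length; sizes are illustrative.
from tensorflow.keras.layers import LSTM, Dense, Embedding, Input
from tensorflow.keras.models import Sequential

VOCAB, MAXLEN = 10_000, 50
model = Sequential([
    Input(shape=(MAXLEN,)),            # padded sequence of token ids
    Embedding(VOCAB, 64),              # token ids -> dense vectors
    LSTM(32),                          # sequence -> fixed-size state
    Dense(1, activation="sigmoid"),    # binary label (e.g., sentiment)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```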

8. **Model Training**:

   - **Explanation**: Train the deep learning model using the processed text data.

   - **Purpose**: Optimize model parameters to minimize error and improve accuracy.

9. **Evaluation**:

    - **Explanation**: Assess the model's performance on a validation set.

    - **Purpose**: Ensure the model generalizes well to unseen data.

10. **Deployment**:

    - **Explanation**: Integrate the trained model into a production environment.

    - **Purpose**: Make the model available for practical use.

11. **Monitoring and Maintenance**:

    - **Explanation**: Continuously monitor the model's performance and update it as needed.

    - **Purpose**: Ensure the model remains accurate and relevant over time.

Example implementation of a chatbot using LSTM

https://medium.com/@newnoi/%E0%B8%A1%E0%B8%B2%E0%B8%AA%E0%B8%A3%E0%B9%89%E0%B8%B2%E0%B8%87-chatbot-%E0%B9%81%E0%B8%9A%E0%B8%9A%E0%B9%84%E0%B8%97%E0%B8%A2%E0%B9%86-%E0%B8%94%E0%B9%89%E0%B8%A7%E0%B8%A2-machine-learning-lstm-model-%E0%B8%81%E0%B8%B1%E0%B8%99%E0%B8%94%E0%B8%B5%E0%B8%81%E0%B8%A7%E0%B9%88%E0%B8%B2-part1-6230eac8d1f8

https://medium.com/@newnoi/%E0%B8%AA%E0%B8%AD%E0%B8%99%E0%B8%84%E0%B8%AD%E0%B8%A1%E0%B8%9E%E0%B8%B9%E0%B8%94%E0%B9%81%E0%B8%9A%E0%B8%9A%E0%B9%84%E0%B8%97%E0%B8%A2%E0%B9%86-%E0%B8%94%E0%B9%89%E0%B8%A7%E0%B8%A2-machine-learning-model-part2-2a1609af1bd7