Friday, February 23, 2024

Why do some fields have so many publications per year?

Because in those fields (e.g., the biosciences) contributions come from new data sets analyzed with existing methods. In computer science, by contrast, contributions come from new algorithms validated on existing data sets.

Thursday, February 22, 2024

Ensemble learning

 Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Example with CNNs: https://towardsdatascience.com/ensembling-convnets-using-keras-237d429157eb
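The linked article combines ConvNets by averaging their outputs. Here is a minimal Keras sketch of that idea; the tiny architectures, input shape, and class count are illustrative assumptions, not the article's exact models:

from tensorflow import keras
from tensorflow.keras import layers

def small_cnn(name: str) -> keras.Model:
    # One base ConvNet; in practice each member would differ (architecture,
    # initialization, or training data) to encourage diverse errors.
    inputs = keras.Input(shape=(32, 32, 3))
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(10, activation="softmax")(x)
    return keras.Model(inputs, outputs, name=name)

members = [small_cnn(f"cnn_{i}") for i in range(3)]

# Share one input tensor and average the members' softmax outputs.
ensemble_input = keras.Input(shape=(32, 32, 3))
ensemble_output = layers.Average()([m(ensemble_input) for m in members])
ensemble = keras.Model(ensemble_input, ensemble_output, name="ensemble")
ensemble.summary()

Averaging the softmax outputs implements the simple-average decision function described below; replacing layers.Average with a learned combination moves toward stacking.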

Designing an ensemble comes down to two choices: the ensemble architecture (which base models it contains and how they are arranged) and the decision function (how their individual outputs are combined into one prediction).

The key components of ensemble learning include:

  1. Base Learners (Base Models): These are the individual models that comprise the ensemble. They can be of any type, such as decision trees, neural networks, support vector machines, or any other machine learning algorithm.

  2. Ensemble Methods: These are the techniques used to combine the predictions of the base learners. Some common ensemble methods include:

    • Voting: Combining predictions by majority voting (for classification) or averaging (for regression).
    • Bagging (Bootstrap Aggregating): Training multiple base learners on different subsets of the training data, usually sampled with replacement, and then combining their predictions.
    • Boosting: Building a sequence of base learners where each subsequent learner focuses on the examples that previous learners found difficult, giving higher weight to misclassified instances.
    • Stacking: Training a meta-model (or blender) on the predictions of multiple base learners to make the final prediction.
  3. Diversity: Ensuring that the base learners are diverse, meaning they make different types of errors on the data. This diversity is crucial for the ensemble to outperform individual models. It can be achieved by using different algorithms, different subsets of the data, or different hyperparameters.

  4. Aggregation Strategy: This determines how the predictions of the base learners are combined to produce the final output (a hedged scikit-learn sketch of these strategies follows this list). Common aggregation strategies include:

    • Majority Voting: For classification tasks, each base learner's prediction counts as a "vote," and the final prediction is determined by the majority of votes. This is particularly effective when the base learners have similar performance.
    • Weighted Voting: Each base learner's prediction is weighted based on its confidence or performance, and the final prediction is a weighted sum or average of these predictions.
    • Simple Average: The predictions of all base learners are averaged to produce the final prediction. This is commonly used in regression tasks.
    • Weighted Average: Similar to weighted voting, but the weights are assigned based on the performance or confidence of each base learner.
    • Stacking (Meta-Learning): Base learners' predictions are used as features to train a higher-level model (meta-model or blender). The meta-model learns how to best combine the predictions of base learners to make the final prediction. This approach can capture more complex relationships between the base learners' predictions.
    • Bagging (Bootstrap Aggregating): Base learners are trained on different subsets of the training data, typically sampled with replacement. The final prediction is often the average (for regression) or majority vote (for classification) of all base learners' predictions. Random Forest is a popular example of a bagging ensemble using decision trees as base learners.
    • Boosting: Base learners are trained sequentially, with each subsequent learner focusing on the examples that previous learners found difficult. The final prediction is a weighted sum of the predictions of all base learners. Gradient Boosting Machines (GBMs), AdaBoost, and XGBoost are examples of boosting algorithms.
    • Rank Aggregation: In tasks such as recommender systems or search engines, where the goal is to rank items, rank aggregation methods combine the rankings produced by different algorithms into a single ranking that best represents the preferences of the users.

  5. Evaluation Metric: The metric used to evaluate the performance of the ensemble. Depending on the task (classification, regression, etc.), different evaluation metrics such as accuracy, precision, recall, F1-score, or mean squared error (MSE) can be used.

  6. Hyperparameters: Ensemble methods often have hyperparameters that need to be tuned for optimal performance. These may include the number of base learners, learning rates (for boosting algorithms), maximum tree depth (for decision tree-based methods), etc.
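To make these strategies concrete, here is a hedged scikit-learn sketch that builds one ensemble of each main kind (voting, bagging, boosting, stacking); the toy dataset, base learners, and hyperparameters are illustrative assumptions, not recommendations:

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Diverse base learners: different algorithm families make different errors.
base = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
]

ensembles = {
    # Soft voting averages predicted class probabilities (weighted here).
    "voting": VotingClassifier(base, voting="soft", weights=[2, 1, 1]),
    # Bagging: many trees trained on bootstrap resamples of the data.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    # Boosting: sequential learners that reweight hard examples.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a logistic-regression meta-model combines base predictions.
    "stacking": StackingClassifier(base, final_estimator=LogisticRegression()),
}

for name, model in ensembles.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:9s} mean CV accuracy = {score:.3f}")

Note how the base list mixes different algorithm families, which is one of the diversity mechanisms from item 3.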

Friday, February 16, 2024

Arduino Board vs ESP32 vs NodeMCU vs Raspberry Pi

https://v89infinity.com/%E0%B8%84%E0%B8%A7%E0%B8%B2%E0%B8%A1%E0%B9%81%E0%B8%95%E0%B8%81%E0%B8%95%E0%B9%88%E0%B8%B2%E0%B8%87%E0%B8%A3%E0%B8%B0%E0%B8%AB%E0%B8%A7%E0%B9%88%E0%B8%B2%E0%B8%87-arduino-board-vs-node-mcu-vs-raspberr/


All are microcontroller boards except NodeMCU, which is an open-source firmware and development kit based on the ESP8266/ESP32.


Thursday, February 15, 2024

7-billion-parameter language model

 https://www.scb10x.com/blog/typhoon-innovative-thai-language-model?fbclid=IwAR3MrkVOJ2VqDpds7OKY58X6v0B71ogf9mWMCOu4Azj8Ch0wm5eyxERmE1A

It builds on Mistral 7B:

https://mistral.ai/news/announcing-mistral-7b/


Wednesday, February 14, 2024

Programming language popularity ranking

 https://spectrum.ieee.org/the-top-programming-languages-2023


Survey methodology

https://spectrum.ieee.org/top-programming-languages-methodology


Tuesday, February 13, 2024

Astrophotography

Planetary imaging vs. deep-sky imaging vs. skyscape photography (e.g., the Milky Way)

hallucination

 noun: (artificial intelligence) A confident but incorrect response given by an artificial intelligence.

Monday, February 12, 2024

Topic modeling

 https://towardsdatascience.com/semantic-signal-separation-769f43b46779

Topic modeling is a type of unsupervised machine learning technique used to identify abstract topics within a collection of documents or textual data. It is widely used in natural language processing (NLP) to automatically organize, summarize, and understand large text datasets.

Key Concepts in Topic Modeling

  1. Topic: A collection of words that frequently appear together and represent a theme or subject in the data.
  2. Document: A single piece of textual data, such as an article, paragraph, or tweet.
  3. Corpus: The entire collection of documents being analyzed.

How Topic Modeling Works

Topic modeling algorithms aim to:

  1. Discover latent topics within a corpus.
  2. Assign a distribution of topics to each document.
  3. Assign a distribution of words to each topic.

For example, in a collection of news articles:

  • Topic A may include words like "politics," "election," and "government."
  • Topic B may include words like "sports," "game," and "team."

Each document might then be associated with these topics to varying degrees.
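To make the two distributions concrete, here is a small numpy sketch of the news-article example; all numbers are made up for illustration:

import numpy as np

# Topic-word distribution: each row is one topic, each column one vocabulary
# word; every row sums to 1.
vocab = ["politics", "election", "government", "sports", "game", "team"]
topic_word = np.array([
    [0.40, 0.30, 0.25, 0.02, 0.02, 0.01],   # Topic A
    [0.02, 0.02, 0.01, 0.35, 0.30, 0.30],   # Topic B
])

# Document-topic distribution: each row is one document; rows sum to 1.
doc_topic = np.array([
    [0.90, 0.10],   # an article mostly about politics
    [0.25, 0.75],   # an article mixing both, leaning toward sports
])

# Expected word distribution of each document under the model.
print(doc_topic @ topic_word)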


Popular Topic Modeling Algorithms

  1. Latent Dirichlet Allocation (LDA)

    • How it works: Assumes each document is a mixture of topics, and each topic is a mixture of words. It uses probabilistic modeling to assign words to topics.
    • Output: A set of topics (word distributions) and the proportion of each topic in each document.
  2. Non-Negative Matrix Factorization (NMF)

    • How it works: Factorizes the document-term matrix into two lower-dimensional matrices representing topics and their weights in documents.
    • Output: Similar to LDA but uses a deterministic approach rather than probabilistic modeling.
  3. Latent Semantic Analysis (LSA)

    • How it works: Uses singular value decomposition (SVD) to reduce the dimensionality of the document-term matrix and find patterns of word usage.
    • Output: Topics represented as combinations of words.
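As a concrete comparison, here is a hedged scikit-learn sketch that fits LDA (on raw counts) and NMF (on TF-IDF) to the same toy corpus and prints each topic's top words; the corpus, topic count, and parameters are illustrative assumptions:

from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the election results shape government policy",
    "parliament debates a new government budget",
    "the team won the championship game last night",
    "fans cheered as the game went to overtime",
]

def top_words(model, feature_names, n=3):
    # Each row of components_ scores every vocabulary word for one topic.
    for i, row in enumerate(model.components_):
        words = [feature_names[j] for j in row.argsort()[::-1][:n]]
        print(f"topic {i}: {', '.join(words)}")

# LDA: probabilistic, fit on raw term counts.
counts = CountVectorizer(stop_words="english")
X_counts = counts.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_counts)
top_words(lda, counts.get_feature_names_out())

# NMF: deterministic matrix factorization, commonly fit on TF-IDF.
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(corpus)
nmf = NMF(n_components=2, random_state=0).fit(X_tfidf)
top_words(nmf, tfidf.get_feature_names_out())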

Applications of Topic Modeling

  • Content Summarization: Summarizing large text datasets by identifying major themes.
  • Information Retrieval: Improving search engines and recommendation systems.
  • Sentiment Analysis: Understanding the underlying topics associated with positive or negative sentiments.
  • Trend Analysis: Identifying trends in news articles, social media posts, or research papers.
  • Customer Feedback: Analyzing reviews to discover recurring themes.

Steps for Topic Modeling

  1. Preprocess Text:
    • Tokenize text.
    • Remove stopwords, punctuation, and special characters.
    • Perform stemming or lemmatization.
  2. Build Document-Term Matrix (DTM):
    • Represent text data as a matrix of documents and their term frequencies.
  3. Apply Topic Modeling Algorithm:
    • Use LDA, NMF, or other methods to extract topics.
  4. Interpret Topics:
    • Analyze the words associated with each topic to label them meaningfully.
--ChatGPT
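The four steps above map directly onto a typical gensim workflow. A minimal sketch, assuming a toy corpus and a hand-rolled stopword list (stemming/lemmatization omitted for brevity):

from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    "The election results will shape government policy",
    "Parliament debates the new government budget",
    "The team won the championship game last night",
    "Fans cheered as the game went to overtime",
]

# Step 1: preprocess — lowercase, tokenize, drop a small stopword list.
stopwords = {"the", "a", "as", "to", "will", "new", "last"}
tokenized = [
    [w for w in doc.lower().split() if w not in stopwords] for doc in docs
]

# Step 2: build the document-term representation (bag-of-words).
dictionary = Dictionary(tokenized)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]

# Step 3: apply the topic modeling algorithm (LDA here).
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10,
               random_state=0)

# Step 4: interpret — inspect the top words of each topic to label it.
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)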

Thursday, February 8, 2024

Mean, median, mode, norm

 1. Mean (average): the sum of all the values divided by the number of values.

2. Median: sort the data from smallest to largest and take the value in the middle of the sorted list.

3. Mode: the value that occurs most frequently in the data.

A norm is a reference value chosen to compare against a test-taker's score. The norm can be any value we consider appropriate, for example the mean (add up everyone's scores and divide by the number of test-takers) or the median (sort everyone's scores from lowest to highest, or highest to lowest, and take the score at the middle position).
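A minimal Python sketch of the three measures plus a norm-referenced comparison, using the standard library's statistics module; the score list is made-up example data:

import statistics

scores = [50, 62, 70, 70, 85, 91, 98]

mean = statistics.mean(scores)      # sum / count           -> 75.14...
median = statistics.median(scores)  # middle of sorted list -> 70
mode = statistics.mode(scores)      # most frequent value   -> 70

# Norm-referenced grading: pick one of these as the norm and compare
# each candidate's score against it (the mean is used here).
norm = mean
print(f"mean={mean:.2f} median={median} mode={mode}")
print("above the norm:", [s for s in scores if s > norm])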