Dr.Jiw: March 2022

Saturday, 19 March 2022

Supplemental information in open-access journals (e.g., PeerJ) gives the public access to a paper's data sets and code.

Data sources: https://www.reddit.com/subreddits/ e.g. r/Mental Health

Thursday, 10 March 2022

Google Meet Companion mode

For people who are physically in the meeting room (onsite) and join a hybrid Google Meet meeting, i.e., one in which some participants attend remotely and must use Google Meet in normal mode.

Remote participants hear the people in the meeting room as normal, but people in the room do not hear the original sound that went into Google Meet Companion mode coming back out of the other onsite participants' devices, so there is no echo (no voice feedback).

Thursday, 3 March 2022

Reinforcement learning

RL learns from interaction rather than from labeled data; the core idea is to improve performance gradually through experience.

1. Learning Through Trial and Error

The agent tries actions, observes results (state transitions and rewards), and updates its knowledge or policy.
Over time, it learns which actions lead to better outcomes.
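The interaction half of this loop can be sketched as follows. This is a minimal illustration, not code from the original notes: the 1-D corridor environment, the `step` and `run_episode` names, and the reward of +1 at the goal are all hypothetical. The agent here only tries actions and records what it observes; how it updates its knowledge is a separate step.

```python
import random

def step(state, action):
    """Hypothetical 1-D corridor: states 0..4, action 0 = left, 1 = right.
    Reaching state 4 ends the episode with reward +1; other steps give 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == 4
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def run_episode(policy, max_steps=50):
    """Play one episode; return a list of (state, action, reward, next_state)."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = policy(state)                      # the agent tries an action
        next_state, reward, done = step(state, action)  # and observes the result
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory

# A purely random policy: trial and error with no learning yet.
rng = random.Random(0)
trajectory = run_episode(lambda s: rng.choice([0, 1]))
```

The recorded transitions are exactly the experience a learning rule such as Q-learning would consume.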

2. Parameter Updates

Just like in supervised learning, the model (e.g., Q-table, neural network) has parameters (weights).
During training, these parameters are updated to reduce an error signal: the temporal-difference (TD) error in Q-learning, or a loss built on that error (e.g., the squared TD error between predicted and target Q-values) in DQNs.
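For the Q-table case, a single parameter update might look like the sketch below. The function name `q_update`, the learning rate `alpha=0.1`, and the discount factor `gamma=0.9` are assumptions chosen for illustration; the original notes do not specify them.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, done, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Value of the best action in the next state (0 if the episode ended).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error  # move the estimate toward the target
    return td_error

# Q-table with every entry starting at 0 (an assumed initialization).
q = defaultdict(float)
first_error = q_update(q, state=3, action=1, reward=1.0,
                       next_state=4, done=True, n_actions=2)
second_error = q_update(q, state=2, action=1, reward=0.0,
                        next_state=3, done=False, n_actions=2)
```

After these two updates the reward observed at the goal has started to propagate one state backward, which is how the table gradually encodes which actions lead to better outcomes.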

3. Exploration vs. Exploitation

In training, the agent often explores new actions (e.g., epsilon-greedy strategy) to improve learning.
In the final (deployment) phase, it mainly exploits the learned policy.
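A minimal sketch of the epsilon-greedy strategy is shown below. The decay schedule (`start`, `end`, `decay`) is a hypothetical choice for illustration, not something the notes prescribe.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), else exploit (best action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Shrink epsilon exponentially per episode, never going below `end`."""
    return max(end, start * decay ** episode)

q_values = [0.1, 0.7, 0.2]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # pure exploitation
random_action = epsilon_greedy(q_values, epsilon=1.0)   # pure exploration
```

Setting `epsilon=0.0` corresponds to the deployment phase, where the agent only exploits the learned policy.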

====

There are many reinforcement learning algorithms; see https://en.wikipedia.org/wiki/Reinforcement_learning

A well-known algorithm is Q-learning.

Reinforcement learning involves an agent, a set of states $S$, and a set $A$ of actions per state. By performing an action $a\in A$, the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score).
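With these symbols, the standard Q-learning update can be written as below. The learning rate $\alpha$, the discount factor $\gamma$, and the time-indexed names $s_t$, $a_t$, $r_{t+1}$ do not appear in the notes above; they are the usual textbook assumptions.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right],
\qquad s_t, s_{t+1} \in S,\; a_t \in A
```

The bracketed term is the temporal-difference error: the gap between the current estimate and the reward actually received plus the discounted value of the best next action.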

--ChatGPT

Snake game:

You want the agent (snake) to learn how to survive and grow longer by playing many games.

The environment (game board) provides feedback through rewards (e.g., +1 for eating food, -1 for dying).
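The reward scheme can be sketched as a small function. The +1 and -1 values come from the text above; the reward of 0 for an ordinary move, the `gamma` value, and both function names are assumptions made for illustration.

```python
def snake_reward(ate_food, died):
    """Reward for one move: -1 for dying, +1 for eating food, 0 otherwise (assumed)."""
    if died:
        return -1.0
    if ate_food:
        return 1.0
    return 0.0

def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where later rewards count less (gamma is an assumed value)."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Two ordinary moves followed by eating food.
episode_rewards = [snake_reward(False, False),
                   snake_reward(False, False),
                   snake_reward(True, False)]
score = discounted_return(episode_rewards)
```

The discounted return is the quantity the agent tries to maximize, which is why it can learn to accept a few neutral moves in order to reach food safely.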

You want the AI to develop strategies like avoiding collisions, planning moves, or maximizing score over time.

Neural network used in DQN for Snake game:

Input: a representation of the environment’s state.

Grid / image input → CNN-based DQN
Feature vector input → MLP-based DQN

1. Grid input

Treat the snake game board as a matrix (like an image).

Input:

0 = empty cell

1 = snake body

2 = snake head

3 = food

If the board is 20×20 → the input is a 20×20 matrix (sometimes flattened into 400 values).

Neural nets for this usually use CNNs (like in Atari DQN).
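The grid encoding can be sketched as below, using the cell codes listed above. The function names and the convention that the snake is a list of (row, column) cells with the head first are assumptions for illustration.

```python
EMPTY, BODY, HEAD, FOOD = 0, 1, 2, 3  # cell codes from the list above

def encode_grid(height, width, snake, food):
    """Build the board matrix. `snake` is a list of (row, col), head first (assumed)."""
    grid = [[EMPTY] * width for _ in range(height)]
    for row, col in snake[1:]:
        grid[row][col] = BODY
    head_row, head_col = snake[0]
    grid[head_row][head_col] = HEAD
    food_row, food_col = food
    grid[food_row][food_col] = FOOD
    return grid

def flatten(grid):
    """Flatten the matrix row by row, e.g. 20x20 -> 400 values."""
    return [cell for row in grid for cell in row]

grid = encode_grid(20, 20, snake=[(5, 5), (5, 4), (5, 3)], food=(10, 12))
flat = flatten(grid)
```

A CNN would take `grid` (often with one channel per cell type instead of integer codes), while an MLP would take the flattened `flat` list.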

2. Feature vector

Simpler and often more efficient. Common features:

[Snake head position,

Food position,

Relative position of food,

Snake direction (one-hot: [up, down, left, right]),

Danger information (is there a wall or body segment in the next cell up/down/left/right?),

Snake length]
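One possible way to assemble such a feature vector is sketched below. The function name, the (row, column) coordinate convention with "up" meaning a smaller row index, and the exact ordering of the 15 values are assumptions; the danger check also treats the tail cell as occupied, which is a simplification.

```python
# Direction name -> (row change, column change); "up" decreases the row (assumed).
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def build_features(height, width, snake, food, direction):
    """Return a 15-value feature list. `snake` is a list of (row, col), head first."""
    head_row, head_col = snake[0]
    food_row, food_col = food
    body = set(snake[1:])
    # Head position, food position, and food position relative to the head.
    features = [head_row, head_col, food_row, food_col,
                food_row - head_row, food_col - head_col]
    # One-hot direction in the order [up, down, left, right].
    features += [1 if direction == name else 0 for name in DIRECTIONS]
    # Danger flags: is the neighbouring cell a wall or a body segment?
    for d_row, d_col in DIRECTIONS.values():
        row, col = head_row + d_row, head_col + d_col
        hits_wall = not (0 <= row < height and 0 <= col < width)
        features.append(1 if hits_wall or (row, col) in body else 0)
    features.append(len(snake))
    return features

features = build_features(20, 20, snake=[(0, 5), (0, 4), (0, 3)],
                          food=(10, 12), direction="right")
```

In practice the position and length values are often scaled to a small range before being fed to the network, but that step is omitted here.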

Output: estimated Q-values for all possible actions.

A vector of 4 Q-values, one each for moving up, down, left, and right.
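The input-to-output mapping of an MLP-based Q-network can be sketched in plain Python as below. This is an untrained forward pass only: the 15-value input size, the 32-unit hidden layer, the weight range, and all function names are hypothetical, and a real DQN would normally be built and trained with a deep learning library.

```python
import random

ACTIONS = ["up", "down", "left", "right"]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    """Compute weights @ x + biases for one layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def q_values(x, hidden_layer, output_layer):
    """Forward pass: ReLU hidden layer, then a linear output of 4 Q-values."""
    hidden = [max(0.0, h) for h in dense(x, hidden_layer)]
    return dense(hidden, output_layer)  # no activation: Q-values are unbounded

rng = random.Random(0)
n_features = 15                      # assumed size of the state feature vector
hidden_layer = init_layer(n_features, 32, rng)
output_layer = init_layer(32, len(ACTIONS), rng)

state = [0.5] * n_features           # placeholder state, not a real game position
q = q_values(state, hidden_layer, output_layer)
best_action = ACTIONS[max(range(len(q)), key=lambda a: q[a])]
```

The output layer is linear rather than softmax because Q-values are estimates of future reward, not probabilities; the agent simply picks the action with the largest value.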