Thursday, March 10, 2022

Google Meet companion mode

Companion mode is for people sitting in the physical meeting room who join a hybrid Google Meet meeting, i.e., some participants attend remotely and use Google Meet in its normal mode.

Remote participants can hear the people in the meeting room as usual, but the people in the room will not hear that same audio coming back out of the other onsite participants' devices that joined in companion mode, so there is no echo (no voice feedback).

Thursday, March 3, 2022

Reinforcement learning

RL learns from interaction rather than from labeled data; the core idea is to gradually improve performance through experience.

1. Learning Through Trial and Error

  • The agent tries actions, observes results (state transitions and rewards), and updates its knowledge or policy.

  • Over time, it learns which actions lead to better outcomes.


2. Parameter Updates

  • Just like in supervised learning, the model (e.g., Q-table, neural network) has parameters (weights).

  • During training, these parameters are updated to minimize a loss function (e.g., temporal difference error in Q-learning or prediction loss in DQNs).


3. Exploration vs. Exploitation

  • In training, the agent often explores new actions (e.g., an epsilon-greedy strategy) to improve learning; a short sketch follows this list.

  • In the final (deployment) phase, it mainly exploits the learned policy.
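
A minimal sketch of epsilon-greedy action selection, assuming a NumPy Q-table with one row per state and one column per action (the names and the epsilon value are illustrative):

import numpy as np

def epsilon_greedy(q_table, state, epsilon=0.1):
    # With probability epsilon, explore: pick a random action.
    if np.random.rand() < epsilon:
        return np.random.randint(q_table.shape[1])
    # Otherwise, exploit: pick the action with the highest Q-value for this state.
    return int(np.argmax(q_table[state]))

# At deployment time, epsilon is set to 0 (or close to it) so the agent mainly exploits.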

====

There are many algorithms for reinforcement learning; see https://en.wikipedia.org/wiki/Reinforcement_learning for an overview.

A well-known algorithm is Q-learning.

Reinforcement learning involves an agent, a set of states S, and a set A of actions per state. By performing an action a in A, the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score).

--ChatGPT
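
Building on that description, here is a minimal sketch of the tabular Q-learning update rule, assuming a NumPy Q-table indexed by (state, action) and illustrative values for the learning rate and discount factor:

import numpy as np

def q_learning_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Temporal-difference target: immediate reward plus discounted best future value.
    td_target = reward + gamma * np.max(q_table[next_state])
    # The temporal-difference error is the signal that drives the update.
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return q_table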

Snake game:

You want the agent (snake) to learn how to survive and grow longer by playing many games.

The environment (game board) provides feedback through rewards (e.g., +1 for eating food, -1 for dying).

You want the AI to develop strategies like avoiding collisions, planning moves, or maximizing score over time.
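
For concreteness, a minimal sketch of a reward function along these lines (the exact values and shaping are design choices, not fixed by DQN):

def snake_reward(ate_food, died):
    # +1 for eating food, -1 for dying, 0 for an ordinary move (illustrative values).
    if died:
        return -1.0
    if ate_food:
        return 1.0
    return 0.0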

Neural network used in DQN for Snake game:

Input: a representation of the environment’s state.

  • Grid / image input → CNN-based DQN
  • Feature vector input → MLP-based DQN

1. Grid input

Treat the snake game board as a matrix (like an image).

Input:

0 = empty cell

1 = snake body

2 = snake head

3 = food

If the board is 20×20 → the input is a 20×20 matrix (sometimes flattened into 400 values).
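
For example, a minimal sketch of this encoding with NumPy (the game-state arguments are assumed names, not from any particular Snake implementation):

import numpy as np

def encode_board(width, height, snake_body, snake_head, food):
    # snake_body: list of (x, y) cells; snake_head and food: (x, y) tuples.
    grid = np.zeros((height, width), dtype=np.float32)  # 0 = empty cell
    for x, y in snake_body:
        grid[y, x] = 1.0                                 # 1 = snake body
    grid[snake_head[1], snake_head[0]] = 2.0             # 2 = snake head
    grid[food[1], food[0]] = 3.0                         # 3 = food
    return grid  # 20x20 board -> shape (20, 20); grid.flatten() gives 400 values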

Neural nets for this usually use CNNs (like in Atari DQN).
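
A minimal sketch of such a CNN-based DQN in PyTorch, for a 20×20 grid with one input channel and 4 output Q-values (the layer sizes are illustrative, not a reference architecture):

import torch
import torch.nn as nn

class SnakeCnnDQN(nn.Module):
    def __init__(self, board_size=20, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # input: (batch, 1, 20, 20)
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * board_size * board_size, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),                    # one Q-value per action
        )

    def forward(self, x):
        # x: batch of encoded boards, shape (batch, 1, 20, 20)
        return self.head(self.conv(x))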

2. Feature vector

This approach is simpler and often more efficient. Common features include:

  • Snake head position
  • Food position
  • Relative position of food
  • Snake direction (one-hot: [up, down, left, right])
  • Danger information (is there a wall or body in the next cell up/down/left/right?)
  • Snake length
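
A minimal sketch of building such a feature vector (field names and coordinate conventions here are assumptions for illustration):

import numpy as np

def make_features(head, food, direction, dangers, length):
    # head, food: (x, y) tuples; direction: one of "up", "down", "left", "right";
    # dangers: dict mapping each of the four directions to True/False; length: int.
    hx, hy = head
    fx, fy = food
    one_hot_dir = [float(direction == d) for d in ("up", "down", "left", "right")]
    danger = [float(dangers[d]) for d in ("up", "down", "left", "right")]
    return np.array(
        [hx, hy, fx, fy, fx - hx, fy - hy] + one_hot_dir + danger + [length],
        dtype=np.float32,
    )  # 15 values in this particular sketch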

Output: estimated Q-values for all possible actions.

A vector of 4 Q-values, one for each of moving up, down, left, and right.
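
A minimal sketch of an MLP-based DQN with 4 output Q-values and greedy action selection, in PyTorch (the feature size of 15 matches the sketch above; all sizes are illustrative):

import torch
import torch.nn as nn

class SnakeMlpDQN(nn.Module):
    def __init__(self, n_features=15, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),   # Q-values for up, down, left, right
        )

    def forward(self, x):
        return self.net(x)

# Greedy action = index of the largest Q-value (0=up, 1=down, 2=left, 3=right in this sketch).
model = SnakeMlpDQN()
features = torch.zeros(1, 15)           # placeholder feature vector
action = int(model(features).argmax(dim=1).item())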