https://archive-beta.ics.uci.edu/ml/datasets
Saturday, March 19, 2022
Thursday, March 10, 2022
Google Meet companion mode
For use by onsite participants in a meeting room who join a hybrid Google Meet session, i.e., where some participants are remote and must use Google Meet in its normal mode.
Remote participants hear the people in the room as usual, but people in the room do not hear the room audio that entered Google Meet via companion mode played back out of other onsite participants' devices, so there is no echo (no voice feedback).
Thursday, March 3, 2022
Reinforcement learning
There are many algorithms for reinforcement learning; see https://en.wikipedia.org/wiki/Reinforcement_learning
A well-known algorithm is Q-learning.
Reinforcement learning involves an agent, a set of states $S$, and a set $A$ of actions per state. By performing an action $a \in A$, the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score).
Algorithm
After $\Delta t$ steps into the future the agent will decide some next step. The weight for this step is calculated as $\gamma^{\Delta t}$, where $\gamma$ (the discount factor) is a number between 0 and 1 ($0 \le \gamma \le 1$) and has the effect of valuing rewards received earlier higher than those received later (reflecting the value of a "good start"). $\gamma$ may also be interpreted as the probability to succeed (or survive) at every step $\Delta t$.
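As a quick worked example (the numbers here are illustrative, not from the source): with $\gamma = 0.9$, a reward received $\Delta t = 3$ steps in the future is weighted by $\gamma^{\Delta t} = 0.9^3 = 0.729$, i.e., it counts for about 73% of an identical reward received immediately.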
The algorithm, therefore, has a function that calculates the quality of a state–action combination:
- $Q : S \times A \to \mathbb{R}$.
Before learning begins, $Q$ is initialized to a possibly arbitrary fixed value (chosen by the programmer). Then, at each time $t$ the agent selects an action $a_t$, observes a reward $r_t$, enters a new state $s_{t+1}$ (that may depend on both the previous state $s_t$ and the selected action $a_t$), and $Q$ is updated. The core of the algorithm is a Bellman equation as a simple value iteration update, using the weighted average of the current value and the new information.
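The update referred to is the standard Q-learning rule, with learning rate $\alpha$ ($0 < \alpha \le 1$), reconstructed here since it is not spelled out above:

$$Q^{\text{new}}(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) \right)$$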
Cf. https://en.wikipedia.org/wiki/Q-learning#Deep_Q-learning
ChatGPT:
Q-learning uses a Q-table to store Q-values, each representing the quality (the reward) the agent can expect to achieve in state s when taking action a. Rows of the Q-table represent states and columns represent actions; each cell holds one Q-value.
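A minimal sketch of a Q-table and one tabular update in Python, assuming a toy environment; the sizes, the `update` helper, and the sample transition are all hypothetical, chosen only to illustrate the table layout described above:

```python
import numpy as np

# Hypothetical toy setup: 5 discrete states, 2 actions per state.
n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9          # learning rate and discount factor

# Q-table: rows are states, columns are actions; each cell holds one Q-value.
Q = np.zeros((n_states, n_actions))

def update(s, a, r, s_next):
    """Apply one Q-learning (Bellman) update for transition (s, a, r, s_next)."""
    target = r + gamma * np.max(Q[s_next])            # best value reachable from s_next
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target  # weighted average of old and new

# Example transition: in state 0, action 1 yields reward 1.0 and leads to state 2.
update(s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.1 after the first update (Q started as all zeros)
```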
Deep Q-Learning (DQL) uses a neural network whose output approximates the current Q-values, instead of the tabular representation (a Q-table for discrete states/actions) used in Q-learning. The network is trained to minimize a loss between the target Q-value (derived from the Bellman equation) and the current Q-value. In Q-learning, by contrast, the Bellman equation is applied directly to update the Q-values stored in the table for each state-action pair.
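A minimal sketch of the DQL loss computation in PyTorch; this is illustrative only (sizes, network shape, and the dummy batch are assumptions), and a practical DQN would also need a replay buffer, a separate target network, and an exploration policy, none of which are shown here:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 4-dimensional state, 2 discrete actions.
state_dim, n_actions, gamma = 4, 2, 0.99

# The network maps a state to one Q-value per action (it replaces the Q-table).
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(s, a, r, s_next, done):
    """One gradient step on a batch of transitions (s, a, r, s_next, done)."""
    q_current = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # current Q(s, a)
    with torch.no_grad():                                      # target gets no gradient
        q_target = r + gamma * q_net(s_next).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_current, q_target)         # Bellman error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: a batch of 8 random transitions (dummy data, just to show the shapes).
s = torch.randn(8, state_dim)
a = torch.randint(0, n_actions, (8,))
r = torch.randn(8)
s_next = torch.randn(8, state_dim)
done = torch.zeros(8)
print(train_step(s, a, r, s_next, done))
```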