I am studying Reinforcement Learning and am truly fascinated by how an agent learns.
Basics first :) It helps to understand the terms “Agent”, “Environment”, “State”, and “Action”. They all come from our daily lives.
We are agents, our kid is an agent, our hoover robot is an agent, a trading bot is an agent, an autonomous car is an agent. Basically, anything that interacts with its environment by taking actions is an agent. Now comes the second term, environment. It is a broad one. For a chess player, the environment is the chessboard, because the player, whether human or bot, needs to understand the positions and decide on a move; the game is all about positions and moves on the board. For us, the home we live in, the car we drive, the desk we work at, even the single document we work on is an environment, individually or collectively. The action, obviously, is a choice. Blue pill or red pill.
Any decision is an action. The agent assesses how good an action is by observing the change in the environment. It does not only consider how good the action is at that moment, but also how the action can lead to a good result in the future. The feedback received for taking an action is called a reward, so the agent wants to maximize the total (future) reward when choosing actions. And lastly, when the agent takes an action and moves, the state of the agent changes.
Reinforcement Learning is all about observing the environment, taking an action and trying to maximize the reward.
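This loop can be sketched in a few lines of Python. The `LineWorld` environment below is purely made up for illustration (its reward values and the random policy are assumptions, not part of any library):

```python
import random

class LineWorld:
    """Toy environment: the agent walks on the integers and wants to reach +3."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (step left) or +1 (step right)
        self.state = max(-2, self.state + action)  # a wall at -2
        done = self.state == 3                     # reached the goal?
        reward = 1.0 if done else -0.1             # small cost for every step
        return self.state, reward, done

random.seed(0)
env = LineWorld()
total_reward, done = 0.0, False
while not done:                       # the agent-environment loop:
    action = random.choice([-1, 1])   # 1) take an action (here: a poor random policy)
    state, reward, done = env.step(action)  # 2) observe the new state and reward
    total_reward += reward            # 3) accumulate the reward to be maximized
```

A learning agent would replace `random.choice` with a choice informed by past rewards; everything else in the loop stays the same.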
The magical part of reinforcement learning is that it does not need a supervisor/teacher. It only needs interaction, and by interacting it generates its own data. Let's imagine a chess-playing agent. If it opens the game by playing pawn to f3 and loses in the end, it can attribute negative values/rewards to each action taken at each board position (state), from the last move back to the first. The agent plays the game n times (e.g. 10,000), and with every new game it remembers whether past moves were good or bad. By remembering those past moves, the agent in effect follows a strategy, which is called a policy.
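The "from the last move back to the first" attribution can be written down concretely. A standard way to do it (this is the usual discounted return, not anything specific to chess) is to propagate the final reward backward with a discount factor:

```python
def returns_from_rewards(rewards, gamma=0.9):
    """Work backward from the final reward (e.g. -1 for a lost game),
    so earlier moves share the blame, discounted by gamma."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g          # each move inherits a discounted share
        out.append(g)
    return list(reversed(out))

# A lost 4-move game: no intermediate reward, -1 at the very end.
returns_from_rewards([0, 0, 0, -1])   # approximately [-0.729, -0.81, -0.9, -1.0]
```

The opening move gets the smallest share of the blame, the losing move the largest, which matches the intuition in the paragraph above.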
Reinforcement learning algorithms try to learn the values of states, the policy, or a model of the environment:
- Value-Based: This approach learns state or state-action values and then chooses the best action in a given state. To do so, the agent needs to explore the environment (state space).
- Policy-Based: This approach directly learns the (possibly stochastic) policy function that maps states to actions. It samples actions from the policy, observes the outcomes, and tries to optimize the policy.
- Model-Based: This approach learns a model of the environment and its dynamics, then plans using that model. It observes the environment, updates its model, and re-plans.
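As a concrete instance of the value-based approach, here is the tabular Q-learning update rule (a standard algorithm; the state and action names below are made up for illustration):

```python
from collections import defaultdict

Q = defaultdict(float)        # table of state-action values, default 0
alpha, gamma = 0.5, 0.9       # learning rate and discount factor

def q_update(state, action, reward, next_state, actions):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One observed transition: in "s0", going "right" earned a reward of 1.0.
q_update("s0", "right", 1.0, "s1", ["left", "right"])
```

After this single update, `Q[("s0", "right")]` moves halfway (alpha = 0.5) toward the observed reward; repeated over many episodes, the table converges toward the true action values.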
The environment, or more specifically the state space, matters. It can be discrete (like a chessboard) or continuous (like the road an autonomous car drives on), fully observable (a chessboard) or partially observable (a highway). As the state space grows, storing a value for every action in every state becomes infeasible, so an approximation is used instead. This is why deep learning is applied to reinforcement learning problems: deep networks are good function approximators.
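A minimal sketch of what "approximation instead of a table" means, using a linear function over hand-made features (the features and the training target here are hypothetical; deep RL replaces this linear map with a neural network):

```python
import numpy as np

# Instead of a table entry per (state, action), approximate
# Q(s, a) with a parametrized function: Q(s, a) ~ w . phi(s, a)
w = np.zeros(4)

def phi(state, action):
    """Hypothetical feature vector for a (state, action) pair."""
    return np.array([1.0, state, action, state * action])

def q_hat(state, action):
    return w @ phi(state, action)

def sgd_update(state, action, target, lr=0.05):
    """Nudge the prediction toward a target by stochastic gradient descent."""
    global w
    error = target - q_hat(state, action)
    w += lr * error * phi(state, action)

for _ in range(100):
    sgd_update(2.0, 1.0, 5.0)   # repeatedly fit a single sample
```

After training, `q_hat(2.0, 1.0)` is close to the target 5.0, and because the parameters generalize across inputs, nearby (state, action) pairs get sensible values too, which a table cannot do.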
If you haven’t seen the documentary about AlphaGo, I strongly recommend checking it out. It is about an RL agent that played against the Go master Lee Sedol, winner of 18 world Go titles.
“I thought AlphaGo was based on probability calculation and that it was merely a machine. But when I saw this move, I changed my mind. Surely, AlphaGo is creative.” — Lee Sedol
So far we have discussed a single agent, but in real life there are multiple agents in the environment. They interact not only with the environment but also with each other; in fact, they are themselves part of the environment.
Here is a good example of how multiple agents learn together.
I am fascinated by reinforcement learning because it is quite similar to how humans learn by interacting with their environment. The agent tries and fails many times and improves gradually. The advantage of RL is that a learning iteration is much faster than a human learning cycle, and there is no risk of losing agents in the digital world :)
I think Reinforcement Learning is the path to artificial general intelligence.
- Sutton, R. & Barto, A., 2018. Reinforcement Learning: An Introduction. 2nd ed. The MIT Press.
- DeepMind’s AlphaGo, https://deepmind.com/research/case-studies/alphago-the-story-so-far
- OpenAI, Emergent Tool Use from Multi-Agent Interaction, https://openai.com/blog/emergent-tool-use/
- Mnih, V. et al., 2013. Playing Atari with Deep Reinforcement Learning.
- Salloum, Z., 2019. Policy Gradient Step by Step. Available at: https://towardsdatascience.com/policy-gradient-step-by-step-ac34b629fd55
- Yoon, C., 2019. Understanding Actor-Critic Methods and A2C. Available at: https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f