There has been a lot of development in Reinforcement Learning (RL), thanks to AlphaGo and AlphaStar. Their success has drawn much more research funding into the field. If you are looking for a map/landscape of Reinforcement Learning, check out the following site by Louis Kirch.

Map of Reinforcement Learning

And if you are looking for more materials on Reinforcement Learning, might I interest you in my fellow co-founder Thia Kai Xin's Reinforcement Learning workshop, conducted during the AI Professionals Association Developer Conference (7th Jan 2022)? Slides are below.

Thia Kai Xin Reinforcement Learning Workshop's Slides

Coming back! Many folks have the assumption that RL could be a route to Artificial General Intelligence (AGI). I have no doubt that RL can help us move a lot closer to AGI, but there is a critical downside to it. Let me explain here. :)

What is Reinforcement Learning (RL)?

In a nutshell, in RL we give an agent (digital or physical) an objective function. The agent interacts with the environment, exploring and exploiting to learn which moves/behaviours to undertake. When it takes the right move, it gets closer to the objective and is incentivised to repeat that move; when it takes a "bad" move, it drifts further from the objective. The "reward" from the environment "reinforces" the "good" and "bad" moves.
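To make the explore/exploit loop concrete, here is a minimal sketch using a toy three-armed bandit. Everything here (the reward probabilities, the epsilon value, the function names) is made up for illustration, not taken from any RL library:

```python
import random

# A toy "environment": a 3-armed bandit with hidden reward probabilities.
TRUE_REWARD_PROBS = [0.2, 0.5, 0.8]

def pull(arm: int) -> float:
    """The environment returns a reward of 1 or 0 for the chosen action."""
    return 1.0 if random.random() < TRUE_REWARD_PROBS[arm] else 0.0

# The agent's running estimate of each arm's value.
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1  # fraction of the time we explore rather than exploit

random.seed(0)
for step in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)              # explore: try a random move
    else:
        arm = estimates.index(max(estimates))  # exploit: take the best-known move
    reward = pull(arm)
    counts[arm] += 1
    # Incremental average: each reward "reinforces" the estimate for this move.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(estimates)  # estimates should approach [0.2, 0.5, 0.8]
```

After enough interactions, the agent's estimates track the true reward probabilities and it exploits the best arm most of the time, which is exactly the "good moves get reinforced" idea from the paragraph above.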

There are a few components that an RL agent will have to work with. The major ones are:

Policy - The agent's behavior

Value Function - Prediction of Future Rewards, to evaluate the goodness/badness of each state

Model of Environment - The agent's prediction of what the environment will do next.
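All three components show up even in the simplest tabular Q-learning agent. The sketch below uses a made-up five-state corridor (the environment, rewards and hyperparameters are hypothetical, chosen only to label the components):

```python
import random

# Hypothetical environment: a 5-state corridor, start at state 0, reward 1 at state 4.
N_STATES, GOAL = 5, 4

def env_step(state, action):
    """Model of the environment: what state and reward come next (action 1 = right)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

# Value function: Q[s][a] estimates the future reward of action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def policy(state):
    """Policy: the agent's behaviour, here epsilon-greedy over the value function."""
    if random.random() < epsilon:
        return random.randrange(2)
    return 0 if Q[state][0] > Q[state][1] else 1

random.seed(1)
for episode in range(200):
    state, done = 0, False
    while not done:
        action = policy(state)
        nxt, reward, done = env_step(state, action)
        # Q-learning update: nudge the value estimate towards the observed reward.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned greedy policy: for this corridor it should move right in every state.
print([0 if q[0] > q[1] else 1 for q in Q[:GOAL]])
```

Note that a human wrote all three pieces here: the policy class (epsilon-greedy), the value representation (a Q-table), and the environment model. The agent only fills in the numbers, which is the limitation discussed later in this post.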

Note: If you are interested in learning more about Reinforcement Learning from the ground up, I strongly recommend David Silver's "Introduction to Reinforcement Learning" class. Here is the YouTube playlist - "RL Class"

Reinforcement Learning has come a long way; it is now used for multi-agent interaction, as seen below where agents played "Hide & Seek".

And also for policy experimentation.

One will definitely be amazed at what we can achieve through RL.

So...why not for AGI?

If you look at Reinforcement Learning, any success comes from being able to define the three components correctly: policy, value and environment model. As it stands, the agent can only act AFTER humans have defined the THREE components, at least to start.

Think about it: we humans come up with our own policy, values and understanding of the environment, and we keep learning as we go along, whereas these use cases come with a fixed, predefined setup. So unless we can design agents that can create and update their own policy, value and environment model, they remain pretty "narrow" in their behaviour.

This is why I feel RL can be part of AGI, but it will not lead to AGI on its own. If you follow this logic, a scenario where digital agents augment human capability is more likely. It also ties in with what Stuart Russell proposed in "Human Compatible": the humans "teach" the agent, a process that helps the agent understand a defined (but difficult to articulate) policy and value, which brings us back to the scenario stated at the start of this paragraph.

So...what are your thoughts? I would love to hear from you! You can share them with me on LinkedIn. Please feel free to link up on LinkedIn or Twitter (@PSkoo). Do consider signing up for my newsletter too. Have a great 2022!