How Do We Test for Artificial General Intelligence (AGI)?

This post was inspired by a recent podcast/YouTube video I finished watching, in which Shane Legg (Chief AGI Scientist & Co-founder of Google DeepMind) was interviewed. I'll be upfront: this post raises far more questions than it answers, and I am hoping it reaches folks with similar interests so we can have a good discussion. :)

If you are keen to check out the podcast, here it is:

The first question posed to Shane was "How Do We Measure AGI?"

The definition was briefly touched on, but researchers in this field all know that we have not yet come up with a universal definition that everyone can agree upon.

Most of the tests designed by humans right now measure proficiency in a particular cognitive function or in professional skills and knowledge. For instance, the IQ test focuses on (mathematical) logical thinking.

To test for "generality", we would naturally say that an existing model needs to pass ALL the tests provided, and then outperform humans on them to determine how "super" it is, perhaps. But here is the thing: do we have ready-made tests covering all the cognitive faculties, namely language, perception, memory, attention, reasoning, and emotion (based on Wikipedia)?
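Just to make that criterion concrete, here is a minimal Python sketch of the "pass everything, then beat the human baseline" idea. The faculty list follows the Wikipedia one above, but all the scores, the pass mark, and the human baseline are made-up placeholders, not an actual benchmark.

```python
# Hypothetical per-faculty scores for a model and a human baseline (0-100 scale).
model_scores = {"language": 82, "perception": 74, "memory": 91,
                "attention": 68, "reasoning": 79, "emotion": 55}
human_baseline = {"language": 70, "perception": 80, "memory": 65,
                  "attention": 72, "reasoning": 75, "emotion": 85}
PASS_MARK = 60  # assumed passing threshold for every test

# "General": the model clears the passing mark on every faculty.
is_general = all(score >= PASS_MARK for score in model_scores.values())

# "Super": on top of that, it beats the human baseline on every faculty.
is_super = is_general and all(
    model_scores[f] > human_baseline[f] for f in model_scores
)

print(f"general: {is_general}, super: {is_super}")
```

Even this toy version shows the catch: the verdict depends entirely on which tests are in the dictionary and where the pass mark sits.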

Let us take a few of these faculties to discuss.

Perception is the organization, identification, and interpretation of sensory information in order to represent and understand the presented information or environment. (Wikipedia)

Do we have a test for perception alone? Not that I recall. We probably have proxy tests for it, for instance bar exams, but using just the bar exam to determine perception is a bit of a stretch, because performing well on it involves other factors too. Are we able to design tests that measure a model's perceptiveness, and then the individual components of perceptiveness such as knowledge representation, knowledge abstraction, and level of understanding?

Let's look at reasoning.

Reason is the capacity of applying logic consciously by drawing conclusions from new or existing information, with the aim of seeking the truth.

We have designed a lot of tests for reasoning, because in the older days educators saw reasoning, especially mathematical reasoning, as a "symptom" of being smart. (I guess these days people are just smart at scoring high on exams :p) Tests like chess, the GMAT, IQ tests, etc. are very much about logical and mathematical reasoning, with some level of qualitative reasoning. We can definitely measure it objectively to a certain extent, but if you understand how some of these tests do scaling, the scaling exercise requires us to know the age or experience level of the test cohort in order to determine the score. So... there is still some subjectivity to it, though it may be at a tolerable level.
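To make that scaling point a bit more concrete, here is a minimal sketch of the kind of cohort-based normalization many standardized tests use, assuming an IQ-style scale with mean 100 and standard deviation 15. The cohort numbers are made up, and real test publishers use far more elaborate norming procedures than this.

```python
import statistics

def scaled_score(raw_score, cohort_raw_scores, mean=100, sd=15):
    """Convert a raw score to an IQ-style scaled score relative to a norm cohort."""
    cohort_mean = statistics.mean(cohort_raw_scores)
    cohort_sd = statistics.stdev(cohort_raw_scores)
    z = (raw_score - cohort_mean) / cohort_sd  # position within the cohort
    return mean + sd * z

# Hypothetical raw scores from a cohort of the "right" age/experience level.
cohort = [41, 45, 47, 50, 52, 55, 58, 60, 63, 66]
print(scaled_score(62, cohort))
```

Swap in a different cohort and the same raw score maps to a different scaled score, which is exactly where the subjectivity creeps in.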

So the biggest question we need to answer first, to determine whether we have AGI, is: can we come up with a comprehensive list of tests that objectively measures the proficiencies of an AGI? Or... if you allow me to push the envelope here a bit, perhaps just a single test that can be deployed to measure generality objectively.

Building that Barrage of Tests

Step 0 will be to set a definition of what AGI is. For Step 1, I will list down all the different components of cognition; the starting list can come from Psychology and Cognitive Neuroscience.

The next step is to search for the existing tests that we have been using for humans so far, and to take away as much of the subjectivity as possible. For the components of cognition that do not have a suitable existing test, we will have to think about test design: can we isolate the test to that particular component, and can we measure it objectively?

After that, we should have a list of tests we can use to determine the generality of an AI model, but this brings up another question: with this list of tests, are we able to objectively calculate generality? Again, another question to ponder!
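As a small thought experiment on that last question, here is a hedged sketch of two ways one could collapse per-faculty scores into a single "generality" number. The scores and both aggregation rules are assumptions for illustration, not an established metric.

```python
import statistics

# Hypothetical normalized scores (0-1) per cognitive faculty.
faculty_scores = {"language": 0.9, "perception": 0.6, "memory": 0.8,
                  "attention": 0.7, "reasoning": 0.85, "emotion": 0.4}

# Option A: generality as the weakest faculty (a single gap sinks the claim).
generality_min = min(faculty_scores.values())

# Option B: generality as the geometric mean (rewards balance, still punishes
# near-zero faculties less harshly than the minimum does).
generality_geo = statistics.geometric_mean(faculty_scores.values())

print(f"min: {generality_min:.2f}, geometric mean: {generality_geo:.2f}")
```

The two rules already disagree on the same scores, which shows that even the choice of aggregation is a design decision rather than an objective given.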

Conclusion

To determine whether we have AGI, we first have to overcome the "definition" challenge, and we have not. And assuming we do overcome it, how to measure it could be another millennium challenge on its own. What does that mean? For now, to determine whether we have an AGI on hand, we likely have to rely on human intuition: the more humans who stand on the side of "We have AGI.", the higher the probability that we do have an AGI at hand.

What are your thoughts? I am keen to hear and discuss more, so please go ahead and PM me on LinkedIn.

Consider supporting my content and work through "Buy Me a Coffee", which is similar to Patreon but does not require you to make recurring donations. :)