The Inverse Relation Between Prior Information And Intelligent Behaviour
Intelligence As Plus And Minus
I decided to write a blog post on the inverse relation between how much we as humans try to design our system, and how intelligently it will eventually behave. The peculiar thing about (re-)building artificial intelligence (AI) is that the harder we try to put in our own knowledge, the less intelligent and adaptive the system will behave. This appears to be a paradox, but I would argue it happens because we as humans are not aware of our own operational knowledge.
You may object to this at first, as you think you have a pretty solid idea of what you’re doing. For example, if I show you a picture and ask you whether there is a chair in it, you won’t be challenged much. However, if I ask you what really caused this to be a chair, it already becomes harder. And if I just give you the raw zeros and ones of the picture (which is basically what your eye gets, and definitely what a robot camera gets), you will have a very hard time. How do all these activations add up to chair or no chair? As a second example, I could ask you to pick up a bottle in front of you. Piece of cake, right? However, now I ask you which forces (or motor voltages in a robot) you put on each of your muscles at each timestep in this sequence. Suddenly it is highly non-trivial. The bottom line here is: we can do things, but we really don’t understand our own operational knowledge at all.
Symbolic AI
So how is this reflected in the history of AI? Initial AI research was mostly symbolic, in the sense that researchers tried to understand the human mind and then attempted to rebuild it. As you might figure from the above, this only worked well up to a certain level of complexity. The research questions of symbolic AI get much closer to cognitive psychology these days. Their focus is on actually understanding what intelligence is, and only then building it.
Supervised Learning
However, there were obvious tasks that could not be solved in this way, such as ‘recognizing the chair in a picture’ (as described above, a typical computer vision problem). The solution to this challenge turned out to be machine learning, in particular supervised machine learning. In machine learning, we try to learn the intelligent behaviour from data. For supervised learning, we require a dataset of examples $X$, for example pictures, and associated labels $Y$, for example chair or no chair. We then specify a model connecting $X$ to $Y$, and run an optimization to estimate the parameters of this model. It’s like a (logistic) regression analysis with large data. The key benefit is that after training, the model (for example a deep neural network) is able to predict whether a new picture contains a chair or not. We don’t really understand how the large network manages to do this (much like we don’t understand how our own visual cortex does it). There are way too many interactions happening for us to grasp, but: it works.
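To make this concrete, here is a minimal sketch of the supervised learning loop described above: examples $X$, labels $Y$, a model connecting them, and an optimization of the model parameters. The synthetic data and the logistic-regression model are illustrative assumptions standing in for real pictures and a deep network.

```python
# Minimal supervised-learning sketch: logistic regression on synthetic data
# standing in for the "chair / no chair" example. Data, labels, and model
# are illustrative assumptions, not a real vision pipeline.
import numpy as np

rng = np.random.default_rng(0)

# X: 200 "pictures" flattened to 20 features; Y: 1 = chair, 0 = no chair.
X = rng.normal(size=(200, 20))
true_w = rng.normal(size=20)
Y = (X @ true_w > 0).astype(float)

# Model: p(chair | x) = sigmoid(w . x + b); parameters estimated by
# gradient descent on the cross-entropy loss.
w, b = np.zeros(20), 0.0
lr = 0.1
for step in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    grad_w = X.T @ (p - Y) / len(Y)          # gradient of the loss w.r.t. w
    grad_b = np.mean(p - Y)
    w -= lr * grad_w
    b -= lr * grad_b

# After training, the model predicts labels for new, unseen "pictures".
X_new = rng.normal(size=(5, 20))
print((1.0 / (1.0 + np.exp(-(X_new @ w + b))) > 0.5).astype(int))
```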
Reinforcement learning
Much of the success in machine learning is due to supervised learning, but it will not be enough to build true artificial intelligence. The problem is the requirement for the correct labels (supervision). Thanks to the internet, we managed to collect large datasets of annotated pictures or translated speech, because these are tasks for which we can at least generate the label (although we don’t understand the inner workings). However, things become much harder when we actually want to act in the world. For example, if you wanted to learn how to grasp a bottle through supervised learning, you would need to provide the labels as to which motor voltage to apply at each timepoint. But you really don’t know those labels. This is where reinforcement learning (RL) comes in. In RL, we instead specify a (sparse) reward/goal function, and let the system figure out the solution itself. Notice what we’re effectively doing here: again, we remove part of the human prior (we remove the labels), and provide even less information (only the goal). However, these systems manage to find solutions where supervised learning fails.
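A toy sketch of that idea: tabular Q-learning on a small chain environment with a single sparse reward at the far end. The environment and hyperparameters are illustrative assumptions; the point is that we only specify the goal (the reward), never the correct action labels.

```python
# Minimal RL sketch: tabular Q-learning on a 10-state chain with one sparse
# reward at the last state. Only the goal is given; the behaviour is learned.
import numpy as np

n_states, n_actions = 10, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    for t in range(50):
        # epsilon-greedy exploration: mostly exploit, sometimes try something new
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # sparse reward: goal only
        # associate this state-action pair with the (discounted) plus we found
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if r > 0:
            break

print(np.argmax(Q, axis=1))   # greedy action per state (1 = move toward the goal)
```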
Evolution and optimization
I would argue that reinforcement learning is based on two principles. The first is the principle of ‘plus and minus’. I would like to assume as little as possible about what is important in our world, but the one thing I would be certain about is that there exists something like plus (do this more) and minus (do this less). The second principle of RL is association: we try to remember where we experienced plus and minus, so that we can act better next time. After that, I think intelligence can be seen as optimizing the plusses. Now a large part of our intelligence is already there at birth. Your brain is already highly initialized, which is a product of millions of years of evolutionary optimization (computation). Evolution is the mother of plus and minus, without the association or memory. Evolution does not remember which particular mutations it already tried, and what it should and should not do anymore. It simply moves forward with the stuff that works well (plus). And there we are. Although we are a local optimum, I would argue that everything we are can be reduced to the optimization of one concept: whether our children get children. This must be the ultimate sparse reward function of the universe, as it is the only thing that could cause you to be here.
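The "plus without memory" view of evolution can be sketched as a simple (1+1)-style hill climber: mutate, keep what scores better, and remember nothing about mutations already tried. The fitness function below is an illustrative stand-in, not a model of actual biology.

```python
# Minimal sketch of evolution as "plus without memory": mutate the genome,
# keep the mutant only if it does better, discard everything else.
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome):
    # Illustrative stand-in for the sparse reward "do the children get children".
    return -np.sum((genome - 3.0) ** 2)

genome = rng.normal(size=5)
for generation in range(2000):
    mutant = genome + rng.normal(scale=0.1, size=5)   # blind random mutation
    if fitness(mutant) > fitness(genome):             # plus: keep what works
        genome = mutant                               # minus is simply discarded

print(genome)   # drifts toward the optimum without any memory of past attempts
```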
Learning to Learn
Now, if intelligence is really about black-box optimization against a super sparse reward function, why haven’t we done so yet? Well, the problem is that the search space (or exploration problem) is still much too large, and the computational time too high. We are getting closer to solving the supervised learning problem, and starting to get somewhere on the next step of reinforcement learning. However, in current RL research, we are still manually designing reward functions per domain, and still specifying the parametric structure of our model. Although this is inevitable on the way forward, I think it is still too much prior information. We need to go one step higher, to an emerging machine learning topic known as ‘learning to learn’. In traditional machine learning, we first specify a model (ourselves), which we then optimize against some criterion. However, this dichotomy forces us to put prior information into the model specification, which we should avoid altogether (once again, we don’t understand how we work). In learning to learn, we try to optimize a system that is able to learn a model and optimize it, together. Then you are only left with defining the plus, and the rest is optimization. This is what evolution did for us.
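As a deliberately tiny sketch of this two-level idea: an outer loop that only sees the final plus of an inner learner, and uses it to improve the learner itself. Here the only thing being meta-learned is the inner learning rate, and the inner task is a trivial quadratic; both are illustrative assumptions, far simpler than the learning-to-learn systems the field actually aims for.

```python
# Minimal learning-to-learn sketch: the outer loop evolves the learner
# (its learning rate), judged only by how well the inner learner ends up doing.
import numpy as np

rng = np.random.default_rng(0)

def inner_learn(lr, steps=20):
    """Inner loop: fit a toy model (minimize (w - 5)^2) by gradient descent,
    using whatever learning rate the outer loop handed down."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 5.0)
    return -(w - 5.0) ** 2            # the "plus" reported back to the outer loop

log_lr = -5.0                          # outer-loop parameter: log learning rate
for generation in range(200):
    candidate = log_lr + rng.normal(scale=0.3)          # mutate the learner
    if inner_learn(np.exp(candidate)) > inner_learn(np.exp(log_lr)):
        log_lr = candidate                               # keep learners that learn better

print("meta-learned learning rate:", np.exp(log_lr))
```

The design point of the sketch is only the separation of levels: the outer loop knows nothing about gradients or the task, it just keeps whatever inner learner collects more plus.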