A nascent idea about learning features for more general intelligence.

elpidiovaldez5 · « **on:** March 02, 2019, 05:16:21 am »

I have been thinking a lot about the strengths and limitations of Deep Reinforcement Learning (RL). It led me to an idea that I'd like to discuss.

RL uses theory based on the Bellman equation to assign appropriate credit to actions that were taken some time before the reward is received. This yields gradually improving estimates of the value of each action in a given situation, and a DNN is trained to output these (approximate) target values for each action, given the perceptual input. As a result, the DNN learns features in the perceptual input that are relevant to maximising reward.
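To make the Bellman credit-assignment idea concrete, here is a minimal tabular Q-learning sketch in Python. It is only an illustration under stated assumptions: the toy six-state chain world, the `step` function and the values of `GAMMA`, `ALPHA` and `epsilon` are all hypothetical, and a lookup table stands in for the DNN. The only reward is at the right-hand end, yet repeated Bellman backups push value back to the earlier "move right" actions.

```python
import random

N_STATES = 6          # states 0..5; reaching state 5 ends the episode with reward 1
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.5           # learning rate (assumed)

def step(state, action):
    """Hypothetical toy chain world: the only reward is at the right-hand end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[s])
                a = rng.choice([act for act in ACTIONS if q[s][act] == best])
            s2, r, done = step(s, a)
            # Bellman (TD) target: r + gamma * max_a' Q(s', a'), or just r at the end
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)                                      # → [1, 1, 1, 1, 1]
print([round(q[s][1], 3) for s in range(N_STATES - 1)])
```

In this deterministic chain the learned value of "move right" from state `s` should approach `GAMMA ** (4 - s)`, i.e. the delayed reward is discounted once for every step it is pushed back.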

A problem with RL is that rewards may be very infrequent, which makes it hard to identify which actions actually helped to obtain the reward. The algorithm works out which actions contributed in an essentially statistical way, so infrequent rewards mean a LOT of trials are needed. Auxiliary tasks are often added to generate denser rewards. These help the network learn features faster, so well-chosen auxiliary tasks speed up learning of the main task. Learning multiple value functions drives feature learning very well.
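A small experiment can illustrate why infrequent rewards demand so many trials. This hypothetical Python sketch runs a purely random agent on a chain where the single reward sits at position `chain_length`; the chain lengths, the 50-step budget and the trial count are arbitrary assumptions. As the reward moves further away, the fraction of episodes that ever see it drops sharply, leaving very little signal for credit assignment.

```python
import random

def success_rate(chain_length, max_steps=50, trials=2000, seed=0):
    """Fraction of uniformly random episodes that ever reach the single reward.

    Hypothetical toy setup: the agent starts at position 0 (a wall) and the
    only reward sits at position `chain_length`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = 0
        for _ in range(max_steps):
            pos = max(pos + rng.choice((-1, 1)), 0)   # random move, wall at 0
            if pos == chain_length:
                hits += 1
                break
    return hits / trials

rates = {n: success_rate(n) for n in (5, 10, 20, 40)}
print(rates)
```

Auxiliary tasks attack exactly this problem: they supply extra, denser training signals so that useful features can be learned even while the main reward is rarely observed.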

Although RL allows an agent to accomplish a task, it is very data-hungry and learns only a very narrow, task-specific model of the world. This led me to wonder whether there are alternative paradigms to RL that could drive feature learning and perhaps yield a better understanding of the world. One idea that occurred to me is that learning a policy that can simulate/predict the world might be desirable. It needs a scheme analogous to RL that continuously refines a function which can be approximated by a DNN. For example, after new experiences are observed in the world, the system runs the simulator policy and compares the results with what really happened. The errors are identified, the model is updated, and the DNN is retrained. Over time the simulator becomes better at predicting the world correctly. It can be continuously checked against past experience and honed to fit that experience as closely as possible.
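The "observe, compare, correct" loop described above can be sketched with a tiny learned forward model. This is a hypothetical Python illustration, not a proposed design: the "world" is assumed to be a ball moving under gravity, a linear model (`LinearSimulator`, a made-up name) stands in for the DNN, and the error is plain mean squared error on the predicted next state. Each round, new experience is added, the simulator is checked against all past experience, and then it is updated by SGD.

```python
import random

DT, G = 0.1, 9.8   # assumed time step and gravity for the toy "world"

def world_step(state):
    """Stand-in for reality (an assumption): a ball moving under gravity."""
    pos, vel = state
    return (pos + vel * DT, vel - G * DT)

class LinearSimulator:
    """Hypothetical learned simulator: next_state = W @ [pos, vel, 1]."""

    def __init__(self):
        self.w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

    def predict(self, state):
        x = (state[0], state[1], 1.0)
        return tuple(sum(wi * xi for wi, xi in zip(row, x)) for row in self.w)

    def update(self, state, actual_next, lr=0.1):
        """One SGD step on the squared error between prediction and reality."""
        x = (state[0], state[1], 1.0)
        pred = self.predict(state)
        for i in range(2):
            err = pred[i] - actual_next[i]
            for j in range(3):
                self.w[i][j] -= lr * err * x[j]

def mse(sim, transitions):
    """Mean squared prediction error of the simulator over past experience."""
    total = 0.0
    for s, s_next in transitions:
        p = sim.predict(s)
        total += (p[0] - s_next[0]) ** 2 + (p[1] - s_next[1]) ** 2
    return total / len(transitions)

rng = random.Random(0)
sim = LinearSimulator()
experience = []   # every (state, next_state) transition observed so far
history = []      # simulator error on all past experience, measured each round
for _ in range(20):
    for _ in range(50):   # observe some new experience in the "world"
        s = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        experience.append((s, world_step(s)))
    history.append(mse(sim, experience))   # check against past experience
    for s, s_next in experience:           # then hone the simulator
        sim.update(s, s_next)

print(f"error before training: {history[0]:.3f}, after 19 rounds: {history[-1]:.2e}")
```

Because the toy dynamics happen to be linear, the model can fit them almost exactly; a real perceptual world would need a far richer model, but the error history is the "well measured accuracy" the simulator comes with.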

Learning a simulator can drive feature learning like RL; however, it also produces a simulator whose accuracy is well measured. This simulator is essentially a model of the world, and an accurate simulator could be used to train a new task using RL (i.e. it enables model-based RL). The simulator represents a deeper understanding, not just of a single task, but of how the world behaves. As such, the information can be used to enable many different tasks. A lot of reasoning can be done by running a simulator to see what is likely to happen. Josh Tenenbaum believes that humans often run mental simulations as part of their reasoning.
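One classic way a learned model feeds back into RL is Dyna-Q (Sutton's Dyna architecture), where each real step is followed by extra Q-learning backups on transitions replayed from the model. The Python sketch below is a minimal illustration under stated assumptions: the chain world is hypothetical, the "model" is just a table of the last observed outcome for each `(state, action)` pair (valid only because the toy world is deterministic), and `planning_steps` is an arbitrary choice.

```python
import random

N_STATES, GAMMA, ALPHA = 6, 0.9, 0.5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(s, a):
    """Hypothetical toy chain world: reward 1 only on reaching the right end."""
    nxt = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def backup(q, s, a, r, s2, done):
    """Q-learning update toward the Bellman target."""
    target = r if done else r + GAMMA * max(q[s2])
    q[s][a] += ALPHA * (target - q[s][a])

def dyna_q(episodes, planning_steps, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}   # learned world model: (s, a) -> (r, s', done)
    real_steps = 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[s])
                a = rng.choice([act for act in ACTIONS if q[s][act] == best])
            s2, r, done = step(s, a)          # one step of real experience
            real_steps += 1
            backup(q, s, a, r, s2, done)      # direct RL from real experience
            model[(s, a)] = (r, s2, done)     # deterministic world: store last outcome
            for _ in range(planning_steps):   # planning with simulated experience
                ps, pa = rng.choice(list(model))
                backup(q, ps, pa, *model[(ps, pa)])
            s = s2
    return q, real_steps

q, real_steps = dyna_q(episodes=20, planning_steps=20)
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy, "after", real_steps, "real steps")
```

Setting `planning_steps=0` reduces this to plain Q-learning, which makes it easy to compare how much real experience each variant consumes.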

So how should the simulator be built? I imagine a 'policy' that generates the most likely events given the perceptual input. The events are used to update the perceptual input in the predicted way; the simulator is run forward a few steps and then compared to reality. The accuracy must be measured, and then credit needs to be assigned to the predicted events. This is very far from a workable design. How can the events in the event space be determined? It seems very open-ended. Is it better to predict at pixel level? What parameters should control the simulator? Does model-based RL (like Dyna) already achieve the same goals? Any ideas?
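The "run forward a few steps, then compare to reality" check can at least be sketched at the level of low-dimensional states rather than pixels or events. In this hypothetical Python example the real world and the simulator are both assumed to be a ball under gravity, with the simulator's gravity deliberately slightly wrong; the per-horizon error shows how small one-step mistakes compound over an open-loop rollout. It says nothing about how to assign credit to individual predicted events, which remains the open question.

```python
import math

DT = 0.1   # assumed time step

def make_step(gravity):
    """Return a one-step dynamics function for a ball under the given gravity."""
    def step(state):
        pos, vel = state
        return (pos + vel * DT, vel - gravity * DT)
    return step

def rollout(step, state, horizon):
    """Run a dynamics function forward open-loop, returning the visited states."""
    states = []
    for _ in range(horizon):
        state = step(state)
        states.append(state)
    return states

def horizon_errors(sim_step, real_step, start, horizon):
    """Euclidean error between simulated and real state at each horizon 1..H."""
    sim = rollout(sim_step, start, horizon)
    real = rollout(real_step, start, horizon)
    return [math.dist(s, r) for s, r in zip(sim, real)]

real_world = make_step(9.8)   # stand-in for reality
simulator = make_step(9.0)    # a hypothetical, slightly wrong learned model
errors = horizon_errors(simulator, real_world, start=(0.0, 5.0), horizon=5)
for k, e in enumerate(errors, start=1):
    print(f"step {k}: error {e:.4f}")
```

Measuring error per horizon, rather than only one step ahead, exposes models that look accurate locally but drift when their own predictions are fed back in.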

HS · « **Reply #1 on:** March 02, 2019, 07:09:41 am »

Seems like a catch-22 if you are looking to simulate intelligent life forms. The laws of physics (gravity, momentum, trajectories, etc.) might be a place to start. Pattern recognition (e.g. day/night) and incorporating it into the simulation, maybe?

Korrelan · « **Reply #2 on:** March 02, 2019, 12:29:44 pm »

With reward-based approaches you always have the problem of deciding what the reward should be and when it should be given. Any task can be subdivided into ever smaller tasks, so where are the task boundaries?

Perhaps start with something simple, like catching a ball on a random trajectory in a cup, or balancing a vertical stick.

These are easy simulations to build, and they will give you a starting point and insights into the problem space without going to a full-blown planet simulation.

AndyGoode · « **Reply #3 on:** March 02, 2019, 11:43:58 pm »

I said it before in another post, but my advice was ignored, so I'll say it again: if you first solve the problem of commonsense reasoning, then everything else will fall into place. Neural networks as they are currently being investigated will not produce AGI. (Read "On Intelligence" by Jeff Hawkins.)
