Reward is enough - Journal of Artificial Intelligence

in Steem Links · 3 years ago

(May 24, 2021; Journal of Artificial Intelligence)

Abstract

In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
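For context: in standard reinforcement-learning notation (this formalism is background, not quoted from the paper), "maximising reward" means choosing a policy $\pi$ that maximises the expected discounted return

\[
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right], \qquad 0 \le \gamma < 1,
\]

where $r_t$ is the scalar reward received at step $t$. The paper's hypothesis is that optimising this one scalar objective in a sufficiently rich environment is enough to give rise to all of the abilities listed above.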

Read the rest from Journal of Artificial Intelligence: Reward is enough


-h/t Communications of the ACM

 3 years ago 

Umm... interesting. As I read it, they are practically looking for a way for artificial intelligence to have its own thoughts: to manipulate any object, remember where it has left something, and know how to choose between good and bad.

Having facial expressions depending on how it feels... I think these abilities would be a double-edged sword in the future, as we have talked about before.

Do you think that an artificial intelligence can develop some kind of sensation through reward stimulation?

 3 years ago 

Do you think that an artificial intelligence can develop some kind of sensation through reward stimulation?

Yeah, I do think so. I remember reading about this last year.

The AI community has a long-term goal of building intelligent machines that interact effectively with the physical world, and a key challenge is teaching these systems to navigate through complex, unfamiliar real-world environments to reach a specified destination — without a preprovided map. We are announcing today that Facebook AI has created a new large-scale distributed reinforcement learning (RL) algorithm called DD-PPO, which has effectively solved the task of point-goal navigation using only an RGB-D camera, GPS, and compass data. Agents trained with DD-PPO (which stands for decentralized distributed proximal policy optimization) achieve nearly 100 percent success in a variety of virtual environments, such as houses and office buildings. We have also successfully tested our model with tasks in real-world physical settings using a LoCoBot and Facebook AI’s PyRobot platform.

When they talk about "reinforcement learning", that's a reward-based learning model.
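To make "reward-based" concrete, here is a minimal toy sketch (my own illustration, not code from the article or from DD-PPO) of tabular Q-learning: the agent is given no map and no description of the task, only a scalar reward, and goal-directed navigation falls out of maximising it.

```python
import random

N_STATES = 5          # corridor cells 0..4; reward waits at the right end
ACTIONS = (-1, 1)     # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-values start at zero: the agent knows nothing about the task
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0  # reward only at the goal
        # Q-learning update: nudge the estimate toward reward + discounted future value
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy walks straight toward the reward: [1, 1, 1, 1]
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])
```

Everything the agent ends up "knowing" about the corridor lives in that Q-table, and it got there purely through reward — which is the spirit of the article's claim, scaled down to five states.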

 3 years ago 

To help me understand the research paper, I found this YouTube video by Yannic Kilcher, a machine-learning expert.

Very cool video! So far, I have only had time to skim the article, but I hope to read it later. Meanwhile, I am listening to the video now. He doesn't just explain it, but also presents some counterarguments and criticisms. Thank you very much.

Sophisticated abilities may arise from the maximisation of simple rewards in complex environments.

There's a You're Wrong About podcast episode that essentially makes the case that, in order to communicate with sign language, Koko the gorilla had simply learned gestures for rewards, within a more sophisticated framework than we typically see in research. Obviously that's an oversimplification, but there's a wonderful debate about the whole thing.

According to our hypothesis, the ability of language in its full richness, including all of these broader abilities, arises from the pursuit of reward. It is an instance of an agent's ability to produce complex sequences of actions (e.g. uttering sentences) based on complex sequences of observations (e.g. receiving sentences) in order to influence other agents in the environment (cf. discussion of social intelligence above) and accumulate greater reward [7].

If reward is enough, and seeking reward is a singular universal mechanism for the development of intelligence, it would seem that either side of the Koko argument is moot. The "Koko was only responding to rewards" camp is in fact just echoing the claim that Koko was demonstrating general intelligence, so that observation cannot by itself stand as an argument that Koko had not demonstrated it. Conversely, arguing that Koko did demonstrate a high level of intelligence would simply be restating that Koko manifested sophisticated abilities through the maximisation of rewards in complex environments.

 3 years ago 

If reward is enough, and seeking reward is a singular universal mechanism for the development of intelligence, it would seem that either side of the Koko argument is moot.

Very interesting point. And this mirrors the question of free will and whether human intelligence is really anything more than just a biological form of computation -- i.e. Chalmers' hard problem of consciousness.
