DeepMind’s ‘Gato’ is mediocre, so why did they build it?


DeepMind’s “Gato” neural network performs a variety of tasks, including controlling a robotic arm that stacks blocks, playing Atari 2600 games, and captioning images.


The world is used to seeing headlines about the latest breakthrough in deep-learning forms of artificial intelligence. The latest achievement of Google’s DeepMind division, however, can best be summed up as “an AI program that does a lot of things.”

Gato, as DeepMind’s program is called, was announced this week as a so-called multimodal program: one that can play video games, chat, write captions for photos, and control a robotic arm that stacks blocks. It is a single neural network that can work with many kinds of data to perform many kinds of tasks.

“With a single set of weights, Gato can engage in dialogue, caption images, stack blocks with a real robot arm, outperform humans at playing Atari games, navigate in simulated 3D environments, follow instructions, and more,” writes lead author Scott Reed and colleagues in their paper, “A Generalist Agent,” posted on the Arxiv preprint server.

DeepMind co-founder Demis Hassabis cheered the team on, exclaiming in a tweet: “Our best general agent yet!! Great work by the team!”


The catch is that Gato is not actually very good at several of those tasks.

On the one hand, the program can do a better job than a dedicated machine-learning program at controlling the Sawyer robotic arm that stacks blocks. On the other hand, the captions it produces for images are in many cases quite poor. Its ability to hold an ordinary chat with a human interlocutor is similarly mediocre, sometimes eliciting contradictory or nonsensical utterances.

And its Atari 2600 gameplay is below that of most dedicated machine-learning programs designed to compete in the standard benchmark, the Arcade Learning Environment.

Why build a program that does many things only moderately well? According to the authors: precedent and expectation.

There is precedent for more general kinds of programs becoming the state of the art in AI, and the expectation is that increasing amounts of computing power will, in future, make up for the shortfalls.

Generality tends to win in AI. As the authors note, citing AI scholar Richard Sutton, “Historically, generic models that are better at leveraging computation have also tended to eventually overtake more specialized domain-specific approaches.”

As Sutton wrote in his own blog post, “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”

Putting it in formal terms, Reed and team write that “we here test the hypothesis that training an agent which is generally capable on a large number of tasks is possible; and that this general agent can be adapted with little additional data to succeed at an even larger number of tasks.”


The model, in this case, is indeed very generic. It is a version of the Transformer, the dominant kind of attention-based model that has become the basis of many programs, including GPT-3. A Transformer models the probability of some element given the elements that surround it, such as words in a sentence.
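That conditional-probability idea can be sketched in miniature: the network assigns each candidate next token a score given the context, and a softmax turns those scores into a probability distribution over what comes next. This is an illustrative toy, not DeepMind's code, and the scores below are invented.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution summing to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a network might produce for the next token
# after the context "the robot stacks the ...":
scores = {"block": 3.1, "sentence": 0.4, "game": 1.2}
probs = softmax(scores)

print(max(probs, key=probs.get))  # most probable next token: "block"
```

A language model repeats this step token by token, each time conditioning on everything generated so far.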

In Gato’s case, the DeepMind scientists are able to apply the same search for conditional probabilities across many data types.

As Reed and colleagues describe the task of training Gato:

“During the training phase of Gato, data from different tasks and modalities are serialized into a flat sequence of tokens, batched, and processed by a transformer neural network similar to a large language model. The loss is masked so that Gato only predicts action and text targets.”

In other words, Gato does not treat tokens differently depending on whether they are words in a chat or movement vectors in a block-stacking exercise: to the program, it is all the same.
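The serialize-and-mask idea described in the quote above can be sketched as follows. This is a toy illustration, not DeepMind's code; the modality labels and token values are invented, but the key point is faithful to the paper: every modality becomes one flat token stream, and a mask selects which positions (text and action targets) contribute to the training loss.

```python
TEXT, IMAGE, ACTION = "text", "image", "action"

def serialize(episode):
    """Flatten (modality, token) pairs into one token list plus a loss mask."""
    tokens, loss_mask = [], []
    for modality, tok in episode:
        tokens.append(tok)
        # Only text and action tokens are prediction targets;
        # observation tokens (e.g. image patches) are masked out of the loss.
        loss_mask.append(modality in (TEXT, ACTION))
    return tokens, loss_mask

episode = [(IMAGE, 101), (IMAGE, 102),   # image-patch observation tokens
           (TEXT, 7),                    # a text token
           (ACTION, 900)]                # a discretized action token

tokens, mask = serialize(episode)
print(tokens)  # [101, 102, 7, 900]
print(mask)    # [False, False, True, True]
```

The transformer itself then sees only the flat token list; the mask matters only when computing the loss.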


Gato training scenario.

Reed et al. 2022

Buried in Reed and team’s hypothesis is a corollary: that, in the end, more computing power will win out. Right now, Gato is limited by the response time of the Sawyer robotic arm that does the block stacking. At 1.18 billion network parameters, Gato is much smaller than very large AI models such as GPT-3. As deep-learning models get bigger, performing inference incurs latency that can cause failures in the non-deterministic world of a real-world robot.

However, Reed and colleagues expect that limit to be overcome as AI hardware gets faster.

“We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2B parameters in the case of Gato,” they write. “As hardware and model architectures improve, this operating point will naturally increase the feasible model size, pushing generalist models higher up the scaling curve.”

Hence, Gato is really a model for how scale of compute will remain the main vector of machine-learning development, by making general models bigger and bigger. Bigger is better, in other words.


Gato gets better as the size of the neural network, measured in parameters, increases.

Reed et al. 2022

And the authors have some evidence for this: Gato does seem to get better as it gets bigger. They compare average scores across all the benchmark tasks for three sizes of model by parameter count: 79 million, 364 million, and the main model at 1.18 billion. “We can see that for an equivalent token count, there is a significant performance improvement with increased scale,” the authors write.
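One way such a cross-task comparison can be computed is to normalize each task's raw score against an expert reference and then average across tasks for each model size. The sketch below illustrates the arithmetic only; the per-task numbers are invented, not the paper's data.

```python
def mean_normalized_score(raw_scores, expert_scores):
    """Normalize each task's score by its expert reference, then average."""
    normalized = [raw / expert for raw, expert in zip(raw_scores, expert_scores)]
    return sum(normalized) / len(normalized)

# Per-task expert reference scores (invented for illustration).
expert = [100.0, 200.0, 50.0]

# Invented per-task raw scores for the three model sizes in the paper.
models = {
    "79M":   [40.0, 90.0, 20.0],
    "364M":  [60.0, 130.0, 30.0],
    "1.18B": [80.0, 170.0, 42.0],
}

for name, raw in models.items():
    print(name, round(mean_normalized_score(raw, expert), 3))
```

With numbers shaped like these, the mean normalized score rises monotonically with parameter count, which is the pattern the authors report.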

An interesting question for the future is whether a generalist program is more dangerous than other kinds of AI programs. The authors spend some time in the paper discussing the fact that the potential dangers are not yet well understood.

The idea of a program that handles multiple tasks suggests to people a kind of human adaptability, and that can be a dangerous misconception. “For example, physical embodiment could lead to users anthropomorphizing the agent, resulting in misplaced trust in the case of a malfunctioning system, or exploitation by bad actors,” write Reed and team.

“Additionally, while cross-domain knowledge transfer is often a goal in ML research, it could create unexpected and undesired outcomes if certain behaviors (e.g., arcade-game fighting) are transferred to the wrong context.”

For those reasons, they write, “ethics and safety considerations of knowledge transfer may require substantial new research as generalist systems advance.”

(As an interesting aside, the Gato paper employs a scheme for describing risk devised by former Google AI researcher Margaret Mitchell and colleagues, called Model Cards. Model Cards give a concise summary of what an AI program is, what it does, and what factors affect how it operates. Mitchell was forced out of Google for supporting her former colleague Timnit Gebru, whose ethical concerns about AI ran afoul of Google’s AI leadership.)

Gato is not unique in its tendency toward generalization. It is part of a broad trend toward generality and toward larger models that consume buckets of horsepower. The world first got a taste of Google’s tilt in this direction last summer, with Google’s “Perceiver” neural network, which combined text Transformer tasks with images, sound, and LiDAR spatial coordinates.


Among its peers is PaLM, the Pathways Language Model, introduced this year by Google scientists: a 540-billion-parameter model that uses a new technology for coordinating thousands of chips, known as Pathways, also invented at Google. A neural network released in January by Meta, called “data2vec,” uses Transformers for image data, speech audio waveforms, and text language representations, all in one.

What seems new about Gato is the intention to take AI built for non-robotics tasks and push it into the robotics realm.

Gato’s creators, noting the achievements of Pathways and other generalist approaches, see the ultimate achievement as an AI that can operate in the real world, on any kind of task.

“Future work should consider how to unify these text capabilities into one fully generalist agent that can also act in real time in the real world, in diverse environments and embodiments.”

Hence, you can think of Gato as a step on the path toward solving AI’s most difficult problem: robotics.


