
Train Smarter Bots for the Real World


With the IQ-Learn method, robots can learn how to behave simply by observing people.

Image credit: Pxhere, Public Domain CC0

In the fall of 2021, dozens of computer scientists submitted their best work to an AI bot challenge organized by the Conference on Neural Information Processing Systems (NeurIPS), an annual machine learning event for leading researchers. Participants spent months preparing their agents to outdo the competition on four “almost-lifelike” tasks in the Minecraft virtual world:

  1. Find a cave
  2. Create a waterfall
  3. Create a village animal pen
  4. Build a village house

To mimic the complexity of real-world situations, the organizers required each agent in the competition to learn the tasks by watching human demonstrations, without the rewards that usually reinforce a robot’s desired behavior. This was a significant change from previous competition rules, and it meant most teams would have to contend with a slower and more complex bot-training process.

For Divyansh Garg and Edmund Mills, who entered the competition as Team Obsidian just weeks before the deadline, the new rules offered a chance to shine. With less time and fewer resources than other teams, they climbed the leaderboard and won the Imitation Learning category (designated for agents that interact with their environment to learn rewards or policies). To their surprise, Team Obsidian also placed second overall, a remarkable achievement because their agent did not use human feedback to boost its performance while playing the game, whereas many competitors’ agents did.

The key to Team Obsidian’s remarkable success was a groundbreaking approach to imitation learning known as IQ-Learn. In the months leading up to the competition, officially known as the MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (BASALT), Garg had collaborated on the development of this new method with Stefano Ermon, an associate professor in the Department of Computer Science at Stanford. IQ-Learn could already play classic Atari games better than a human expert, and it has quickly become a state-of-the-art technique for training AI agents in dynamic environments.

A passion for deep learning

Today’s industrial robots are adept at learning to repeat a task precisely through a process known as behavioral cloning. But when something changes in the environment that the machine has not encountered before, it cannot adjust quickly: its mistakes compound, and the machine never recovers. If we expect to one day have AI agents that can drive cars, wash dishes, or do laundry as well as or better than humans, we need different ways of teaching them.
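Behavioral cloning treats imitation as supervised learning: record what the expert did in each situation and copy it. The minimal sketch below assumes a tiny discrete world whose state and action names are hypothetical labels, and it shows why the approach is brittle: a state the demonstrations never covered has no learned action, so the cloned policy can only fall back to a fixed default.

```python
from collections import Counter, defaultdict

def behavioral_cloning(demonstrations):
    """Fit a tabular policy by copying the expert's most frequent action per state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each state seen in the demonstrations, keep the most common expert action.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

def act(policy, state, fallback="stop"):
    # States never seen in the demonstrations have no learned action,
    # so the cloned policy can only return a fixed default.
    return policy.get(state, fallback)

# Hypothetical demonstrations: (state, action) pairs recorded from a person.
demos = [("on_path", "forward"), ("on_path", "forward"),
         ("wall_ahead", "turn_left"), ("on_path", "forward")]
policy = behavioral_cloning(demos)
print(act(policy, "on_path"))       # forward
print(act(policy, "wall_ahead"))    # turn_left
print(act(policy, "puddle_ahead"))  # stop (never demonstrated)
```

Real systems replace the lookup table with a neural network that generalizes somewhat, but the failure mode is the same in spirit: once a small error leads into unfamiliar states, there is no training signal telling the agent how to get back.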

As a computer science student at Stanford with experience in robot learning and generative modeling, Garg realized that the next frontier for intelligent machines would involve building versatile agents that can learn to perform complex tasks in an ever-changing environment.

“What a human can learn in an hour would take a robot 10 years,” he said. “I wanted to design an algorithm that could learn and transfer behavior as efficiently as a human.”

Imitate an expert

During his internship with machine learning researcher Ian Goodfellow at Apple, Garg came to understand some of the key concepts that inform how scientists train smarter agents:

  • Reinforcement learning (RL) methods allow an agent to interact with its environment, but researchers must supply a reward signal for the robot to learn the desired policy, or course of action.
  • A subfield of RL called Q-learning allows an agent to start with a known reward and then learn what the deep learning community calls an energy-based model, or Q function. Borrowed from the field of statistical physics, a Q function can find relationships in a small data set and then generalize to a larger data set that follows similar patterns. In this way, the Q function can represent the desired policy for the robot to follow.
  • A related approach, called imitation learning, holds promise because it enables an agent to learn a policy by watching visual demonstrations of a (human) expert at work.
  • Inverse reinforcement learning has been considered cutting-edge for the past five years because, in theory, it takes imitation learning a step further. Instead of trying to learn a policy, the agent’s goal is to find the reward that explains the human example. The catch is that inverse RL requires an adversarial training process, which means the model has to solve for two unknown variables: reward and policy. According to Garg, this process is difficult to stabilize and does not work well in more complex situations.
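For contrast with the reward-free setting of the competition, here is a minimal sketch of the ordinary (forward) RL setup the first two points describe: textbook tabular Q-learning, where the designer supplies a reward signal and the agent learns a Q function whose greedy choice in each state is its policy. The corridor environment, its reward, and the hyperparameters are illustrative assumptions, not anything taken from IQ-Learn or MineRL.

```python
import random

# Hypothetical 1-D corridor: states 0..4, the goal is state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Toy environment: the reward signal is 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Textbook tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Mostly exploit the current Q estimates, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Bellman target: immediate reward plus discounted value of the best next action.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# The learned policy is simply the greedy action under Q in each non-goal state.
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

Everything here hinges on the hand-written reward in `step`; imitation learning and inverse RL exist precisely because, for tasks like building a village house, nobody knows how to write that function.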

With these concepts as context, Garg began thinking about how to achieve better results with a simpler approach to imitation learning. A nagging question began to keep him up at night: “What if you could solve for just one unknown variable instead of two?” He reasoned that if both reward and policy could be represented by a single hidden Q function, and if the agent learned this Q function from watching human demonstrations, it could avoid the need for troublesome adversarial training.
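The “one unknown instead of two” idea can be illustrated with the soft Q-learning relationships that IQ-Learn builds on. Once a Q function is fixed, the soft-optimal policy is a softmax over its values, π(a|s) = exp(Q(s, a) − V(s)) with V(s) = log Σₐ exp Q(s, a), and a reward consistent with it can be read back through the inverse soft Bellman relation r(s, a) = Q(s, a) − γV(s′). The sketch below assumes a made-up two-state world with deterministic transitions and γ = 0.9, and shows only those two read-outs; it is not IQ-Learn’s training objective, which is what fits Q to the expert demonstrations in the first place.

```python
import math

GAMMA = 0.9  # assumed discount factor

def soft_value(q_row):
    """Soft state value V(s) = log sum_a exp Q(s, a), computed stably."""
    m = max(q_row.values())
    return m + math.log(sum(math.exp(v - m) for v in q_row.values()))

def policy_from_q(q, state):
    """Soft-optimal policy pi(a|s) = exp(Q(s, a) - V(s)), i.e. a softmax over Q."""
    v = soft_value(q[state])
    return {a: math.exp(qa - v) for a, qa in q[state].items()}

def reward_from_q(q, state, action, next_state, terminal=False):
    """Implied reward from the inverse soft Bellman relation:
    r(s, a) = Q(s, a) - gamma * V(s'), with V(s') taken as 0 at a terminal state."""
    future = 0.0 if terminal else GAMMA * soft_value(q[next_state])
    return q[state][action] - future

# Hypothetical Q table: taking "right" in s0 leads deterministically to s1.
q = {
    "s0": {"left": 0.0, "right": 2.0},
    "s1": {"left": 1.0, "right": 1.0},
}
pi = policy_from_q(q, "s0")
print(round(pi["right"], 3))                              # 0.881
print(round(reward_from_q(q, "s0", "right", "s1"), 3))    # 0.476
```

Both the policy and the reward fall out of the same table, which is the sense in which learning Q alone can stand in for learning reward and policy separately.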

Garg spent his winter break working out an algorithm and coding it. He was amazed when it worked the first time. After a month of development, the algorithm had beaten every other existing method on simple tasks and proved exceptionally stable.

He recalls, “Professor Ermon looked at the results and said, ‘This is great, but why does it work?’ We didn’t know of any theory that could explain it, so I challenged myself to write a mathematical framework that could prove the algorithm was optimal.”

Expert-grade performance

Fast forward to summer 2021, and this new inverse soft-Q learning method (IQ-Learn for short) had achieved three to seven times better performance than previous methods of learning from humans. Garg and his collaborators first tested the agent’s abilities on several control-based video games: Acrobot, CartPole, and LunarLander. In each game, the agent reached expert-level performance faster than any other method.

Next, they tested the model on several classic Atari games (Pong, Breakout, and Space Invaders) and found that their innovation also scales well to more complex game environments. “We outperformed the previous state of the art by five times while requiring three times fewer environment steps, getting close to expert performance,” recalls Garg. (An environment step is a single change in state that the agent causes by taking an action; needing fewer of them means the agent reached this level of performance with less interaction.)
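To make the “environment steps” measure concrete, here is a minimal sketch with a hypothetical toy environment (`CountingEnv` and `run_episodes` are illustrative names, not part of any benchmark or library) that counts every action the agent takes. Comparing methods by the total number of steps they need to reach a given level of play is the kind of comparison Garg’s quote describes.

```python
import random

class CountingEnv:
    """Hypothetical 1-D toy environment that counts its own environment steps.

    The agent starts at position 0 and must reach `goal`. Each call to step()
    is one environment step, and the counter is NOT reset between episodes,
    because sample efficiency is measured over all interaction."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0
        self.total_steps = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        self.total_steps += 1                            # one action -> one environment step
        self.position = max(0, self.position + action)   # cannot move below 0
        done = self.position == self.goal
        return self.position, done

def run_episodes(env, choose_action, episodes):
    """Run several episodes and return the total number of environment steps used."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            state, done = env.step(choose_action(state))
    return env.total_steps

rng = random.Random(0)
expert_like = run_episodes(CountingEnv(), lambda s: +1, episodes=10)
random_agent = run_episodes(CountingEnv(), lambda s: rng.choice((-1, +1)), episodes=10)
print(expert_like)   # 30: three steps per episode
print(random_agent)  # more than 30; the exact count depends on the seed
```

Keeping the counter across episodes is a deliberate choice: an agent that needs many failed episodes before it succeeds pays for every one of them in environment steps.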

The resulting paper was selected as a spotlight presentation at the 2021 NeurIPS conference. With this level of confidence and motivation, Garg suggested trying IQ-Learn in the MineRL challenge.

Success without humans in the loop

Admittedly, some of the “almost-lifelike” tasks in Minecraft proved difficult for Team Obsidian. At one point during the challenge, their AI bot accidentally built a skyscraper while roofing a fence. It also managed to pen a villager instead of an animal. But Garg was pleased with the results: their AI bot had successfully learned to build walls, erect poles, and mount torches. The first-place team used a total of 82,000 human-labeled images to help recognize in-game scenes and spent about five months coding domain expertise for each task. By comparison, Garg and Mills earned their spot without adding any domain knowledge to the model and had just three weeks to prepare.

“IQ-Learn is working beyond our expectations,” says Garg. “It’s a new paradigm for scaling smart machines that can do everything from autonomous driving to aiding healthcare.”

Garg imagines that one day we will be able to teach robots how to grasp objects in any situation simply by showing them videos of people picking up objects, or perhaps even through voice commands. If we want to train agents to perceive and act in a multidimensional world, we need models that reach better performance faster, with limited data and time. Efficiency, it seems, is what determines how useful a robot is in real life.

Source: Stanford University
