Tech

Meta’s Data2vec 2.0: Second Time Faster


[Image: a large black dog overlapped with the words "I drink black tea"]

Meta’s Data2vec is an example of a generalized neural network that can use the exact same code to process different kinds of data, in this case speech, text, and images, and make predictions about that data.

Baevski et al.

What do you do once you have proven your point in a neural network?

Doing it faster is an answer.

On Tuesday, Meta Platforms, the owner of Facebook, Instagram, and WhatsApp, unveiled Data2vec 2.0, a revamp of a neural network introduced earlier this year that acts as a kind of generalist, performing tasks related to text, image, and speech data with the same basic approach for all three.

This second time around, Meta’s scientists have made the program faster and, in some cases, more accurate on benchmark machine learning tasks.

“Data2vec 2.0 shows that the training speed of self-supervised learning can be significantly improved without reducing downstream task accuracy,” write Alexei Baevski, Arun Babu, Wei-Ning Hsu, and Michael Auli, four of the authors of the original Data2vec paper, in the new work, “Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language,” posted on arXiv.

Also: What is ChatGPT and why is it important?

The chief achievement of this second Data2vec is in cutting Data2vec’s training time. Neural network training is usually measured in “epochs,” meaning the number of times the network is shown the training examples. It can also be measured in wall-clock time, the literal hours, minutes, and days counted from start to finish.
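To make those two measures concrete, here is a minimal Python sketch of how a training run tallies both; `model`, `loader`, and `train_step` are hypothetical placeholders standing in for real training code:

```python
import time

# A minimal sketch of tallying training cost; `model`, `loader`, and
# `train_step` are hypothetical placeholders for real training code.
def train(model, loader, train_step, num_epochs=100):
    start = time.time()
    for epoch in range(num_epochs):          # one epoch = one full pass
        for batch in loader:                 # over the training examples
            train_step(model, batch)
    hours = (time.time() - start) / 3600.0   # wall-clock time, in hours
    print(f"{num_epochs} epochs took {hours:.1f} hours")
```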

“Experiments show that Data2vec 2.0 can achieve the same accuracy as many popular existing algorithms at 2-16 times the training speed,” they write.

The name Data2vec is a play on the name of a program for language “embeddings” developed at Google in 2013 called Word2vec. That program predicted how words cluster together, so Word2vec is representative of a neural network designed for one specific type of data, in that case text.

In the case of Data2vec, however, Baevski and colleagues are taking a neural network called a Transformer, developed by Ashish Vaswani and colleagues at Google in 2017, and extending it to be used for multiple data types. The same neural network structure can serve to train on all three, images, speech, and text, without being altered to suit the particulars of any one of them, making it a genuinely general program.

Baevski and colleagues extended the Transformer to what is known as “self-supervised” learning. In a self-supervised setting, a neural network is trained by passing through multiple stages whose results are compared with one another.

First, the network compresses a sample of data, which is known as constructing a representation of the input data. Then a second instance of the network has some of those input data items “masked,” left unrevealed. It has to reconstruct the representation that the first instance of the network built, which forces the second network to build a better model of how the data fits together, essentially by filling in the blanks.

Also: The real goal of AI may no longer be intelligence

The two networks, the one with the compressed representation of the full, unmasked input and the one with the incomplete version it is trying to perfect, are called the teacher and the student, respectively. The student network tries to develop a sense of the data, if you will, by recreating what the teacher has achieved despite the masking.
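As an illustration only, not Meta’s actual code, here is a rough Python sketch of one such teacher-student step: the student predicts the teacher’s representations at masked positions, and the teacher trails the student as an exponential moving average. The function names, the simple zero-masking, and the mean-squared-error loss are all assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

# Rough sketch of one teacher-student step. `student` is any encoder
# module; the teacher starts as a copy (e.g., copy.deepcopy(student)).
# x: (batch, time, dim) feature tensor; mask: (batch, time) boolean.
def teacher_student_step(student, teacher, x, mask, tau=0.999):
    with torch.no_grad():
        target = teacher(x)                    # teacher sees the full input
    masked_x = x.masked_fill(mask.unsqueeze(-1), 0.0)  # hide some positions
    pred = student(masked_x)                   # student fills in the blanks
    loss = F.mse_loss(pred[mask], target[mask])
    loss.backward()
    with torch.no_grad():                      # teacher trails the student as
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(tau).add_(p_s, alpha=1 - tau)  # an exponential moving average
    return loss
```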

This time around, the authors made two main changes to Data2vec to make it faster: using “convolutions” in the student’s decoder, and “amortizing” the teacher network’s compressed representations.

On the first score, the student network that has to predict the teacher’s representations no longer uses the part of the Transformer known as the decoder to do so.

That decoder is the standard approach to decompressing, in a sense, the compressed representations of the teacher network. Instead, the authors use what is known as a convolutional neural network, a foundational tool for representing data samples in compressed form, and one much older than the Transformer. It is a good example of how older technology can survive in programming.

As they write, “Instead of using a Transformer-based decoder, we use a smaller convolutional decoder, which we find easier and faster to train.”
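What such a lightweight convolutional decoder might look like, as a sketch only, with layer counts and sizes invented for illustration rather than taken from the paper:

```python
import torch.nn as nn

# Illustrative stand-in for a small convolutional decoder: a few 1-D
# convolutions map the student's encoder output back to the target dimension.
class ConvDecoder(nn.Module):
    def __init__(self, dim=768, hidden=384, kernel=5, layers=3):
        super().__init__()
        blocks = []
        for i in range(layers):
            blocks += [
                nn.Conv1d(dim if i == 0 else hidden, hidden,
                          kernel, padding=kernel // 2),
                nn.GELU(),
            ]
        self.net = nn.Sequential(*blocks)
        self.proj = nn.Conv1d(hidden, dim, 1)    # project back to target size

    def forward(self, x):                        # x: (batch, time, dim)
        y = self.net(x.transpose(1, 2))          # Conv1d expects (batch, dim, time)
        return self.proj(y).transpose(1, 2)
```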

For the second change, instead of repeatedly generating a compressed representation in the teacher network, Data2vec generates the representation only once. It then reuses it as the target, the thing to be guessed, for each of the masked versions of the data.

As the authors put it, “To amortize the computational cost of the teacher model, we reuse the teacher representation for multiple masked versions of the training sample.

“Specifically, we consider M different masked versions of the training sample and compute the loss against the same target representation.”
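In sketch form, the amortization amounts to running the expensive teacher pass once per sample and looping the cheaper student passes over M maskings; the names, the `make_mask` helper, and the loss are hypothetical stand-ins:

```python
import torch
import torch.nn.functional as F

# Sketch of amortization: one teacher pass per sample, M student passes
# against that same target. `make_mask` is a hypothetical masking function
# returning a boolean (batch, time) mask.
def amortized_step(student, teacher, x, make_mask, M=8):
    with torch.no_grad():
        target = teacher(x)                    # computed once, reused M times
    total = 0.0
    for _ in range(M):                         # M different masked versions
        mask = make_mask(x)
        masked_x = x.masked_fill(mask.unsqueeze(-1), 0.0)
        pred = student(masked_x)
        total = total + F.mse_loss(pred[mask], target[mask])
    return total / M                           # loss averaged over maskings
```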

Diagram of Data2vec 2.0

The architecture of Data2vec 2.0. This time, Meta replaced the second part of the program, the Transformer-based decoder, with a decoder based on a convolutional neural network, an older technology. It also reuses the compressed representations of the “teacher” network as the single target for multiple masked versions of the “student” network’s data.

Baevski et al. 2022

In the results section of the paper, Baevski and team relate how they cut training time and improved accuracy across all three areas of image recognition, speech recognition, and natural language processing.

For image processing, the authors used Data2vec as the basis for pre-training what is known as ViT, the “vision Transformer,” a neural network specifically designed for vision tasks that was introduced last year (PDF) by Alexey Dosovitskiy and colleagues at Google. The Data2vec program serves as the pre-training, on top of which the ViT is a fine-tuning, in the parlance of the paper.

Compared to the results from January, the Data2vec-based ViT once again topped the other neural networks used as the basis of ViT for accuracy on ImageNet, the classic test of labeling images, and it also topped the previous version of Data2vec.

But besides accuracy, the new Data2vec takes much less time to train. The earlier Data2vec took 800 epochs; this time, that has been cut to 150 epochs. And against a competing self-supervised network, the masked autoencoder, or MAE, another Meta creation (PDF), training is cut from 1,600 epochs to 100, even as Data2vec’s accuracy tops MAE’s. The faster training regimen sharply reduces the absolute training time: just 66 hours for Data2vec 2.0 versus 113.6 hours for MAE.

Also: Artificial Intelligence: 5 Innovative Apps That Can Change Everything

In speech recognition, the task is to fill in the missing parts of an excerpt of an audio file of a spoken phrase. The new Data2vec went up against numerous competing neural networks for speech, including the original Data2vec and programs called Wav2vec, HuBERT, and WavLM. Data2vec 2.0 did not beat those networks outright in every case; rather, it “achieves higher accuracy than other models with faster training times.” For example, with 43 hours of training, Data2vec 2.0 reaches the accuracy that required 57 hours of training for the original Data2vec.

In the third area, natural language processing, Data2vec 2.0 was tested on a series of challenges from the General Language Understanding Evaluation framework, known as GLUE, developed in 2019 at NYU’s Courant Institute of Mathematical Sciences.

In one test, the network has to predict whether one sentence follows from another, logical entailment, while another representative task challenges the network to label whether a sentence is grammatically acceptable.
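Such benchmark tasks are typically framed as small classification heads on top of the pretrained network. A hypothetical sketch of that framing, in which the encoder, dimensions, and pooling choice are all assumptions:

```python
import torch.nn as nn

# Hypothetical framing of a GLUE-style task: a linear classifier on top of
# a pretrained encoder. Entailment is a choice over a sentence pair;
# grammatical acceptability is a binary choice over a single sentence.
class GlueHead(nn.Module):
    def __init__(self, encoder, dim=768, num_classes=2):
        super().__init__()
        self.encoder = encoder                 # the pretrained backbone
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        h = self.encoder(tokens)               # (batch, time, dim)
        return self.classifier(h[:, 0])        # classify from the first token
```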

Pitted against the original Data2vec, plus two Transformer-based programs, Google’s BERT and a modified version called RoBERTa, introduced in 2019 by the Paul G. Allen School of Computer Science & Engineering at the University of Washington and Meta, version 2.0 of Data2vec scores highly on the GLUE results while training faster.

The average accuracy score across all the GLUE tasks for this new version is 82.6, just a hair below the original Data2vec’s 82.7, but higher than BERT’s 81.2 and higher than RoBERTa’s 82.5. However, Data2vec 2.0 gets there in just 28.2 hours, less than half the original Data2vec’s 69 hours, and much less than RoBERTa’s 50.5 hours.

Also: The people who build artificial intelligence are the ones who need AI the most

Baevski and team write that they will expand Data2vec in the future to other data forms beyond speech, images, and text, raising the prospect that it may become more general.

One limitation appears to remain in place. As with the original Data2vec, version 2.0 still treats each data type differently when it first enters the network during training. That means Data2vec has not yet developed a completely generic way of handling data types.

Images, speech, and text are each prepared by pre-processing the data. In that way, the multimodal ability of the network still relies on clues about the data, what the team refers to as “small modality-specific input encoders.”

Moreover, each compressed encoding from the teacher network is created separately for each of the three data types. There is not yet a way to create one combined encoding that fuses all the data types at once into a single representation.
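Schematically, and with module names and layer parameters invented for illustration, the layout the team describes looks something like this: small per-modality input encoders feeding one shared Transformer backbone.

```python
import torch.nn as nn

# Schematic of the multimodal layout described above: each modality gets a
# small input encoder of its own, but one shared Transformer does the work.
class GeneralistSketch(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.input_encoders = nn.ModuleDict({
            "text":   nn.Embedding(50000, dim),           # token IDs -> vectors
            "speech": nn.Conv1d(1, dim, 10, stride=5),    # waveform -> frames
            "image":  nn.Conv2d(3, dim, 16, stride=16),   # pixels -> patches
        })
        layer = nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=12)

    def forward(self, x, modality):
        h = self.input_encoders[modality](x)
        if modality == "speech":
            h = h.transpose(1, 2)                         # (batch, time, dim)
        elif modality == "image":
            h = h.flatten(2).transpose(1, 2)              # (batch, patches, dim)
        return self.backbone(h)                           # same backbone for all
```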

And so, as with Data2vec 1.0, a neural network that could truly be one network for all data types remains a technology of the future.

As with the original Data2vec, Meta has posted the code on GitHub.
