PaLM: spectacular results and efficient training
The new 540-billion-parameter model from Google surpassed other large language models in a few-shot setting on 28 of the 29 benchmarks tested.
This success was achieved thanks to Pathways, a new training approach built on asynchronous dataflow. Besides high model quality, the approach ensures efficient use of computing resources: utilization of almost 60%.
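The utilization figure refers to how much of the hardware's peak compute the training run actually achieves. A common back-of-envelope way to estimate it is model FLOPs utilization (MFU); the cluster size and throughput numbers below are invented for illustration, not values reported for PaLM:

```python
# Back-of-envelope model FLOPs utilization (MFU).
# Rule of thumb: a forward+backward pass costs ~6 FLOPs per parameter per token.

def mfu(tokens_per_sec: float, params: float, peak_flops: float) -> float:
    """Achieved training FLOP/s divided by the hardware's peak FLOP/s."""
    achieved = 6 * params * tokens_per_sec
    return achieved / peak_flops

# Hypothetical cluster: 1e18 peak FLOP/s, a 540e9-parameter model,
# 180,000 tokens/s of training throughput.
print(round(mfu(180_000, 540e9, 1e18), 3))  # → 0.583, i.e. ~58% utilization
```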
Among the tasks the model handles well are not only traditional QA and translation, but also guessing a movie from emoji and explaining jokes.
Large language models have been a real engine of AI progress in recent years. The "steroid race" of transformers has given a lot to both science and business, but it cannot go on indefinitely. Google's approach is aimed precisely at improving training efficiency, not at simply throwing more hardware at the problem.
NVIDIA introduced a new GPU architecture
During the GTC 2022 conference, the company announced the new Hopper architecture and the first products based on it.
The H100 compute accelerator delivers roughly a threefold increase in FP32 performance over the previous-generation A100.
The new accelerator features the Transformer Engine, designed to make training of large models more efficient, and 80 GB of HBM3 memory with up to 3 TB/s of bandwidth.
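The bandwidth figure matters because large-model workloads are often memory-bound. A quick calculation from the announced numbers shows how long a single full sweep over the accelerator's memory takes:

```python
# Back-of-envelope: time for one full read of H100's memory,
# using the 80 GB / 3 TB/s figures from the announcement.

memory_gb = 80
bandwidth_gb_s = 3000  # ~3 TB/s expressed in GB/s

sweep_ms = memory_gb / bandwidth_gb_s * 1000
print(f"{sweep_ms:.1f} ms")  # → 26.7 ms per full sweep of 80 GB
```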
A new generation of DGX systems and DGX SuperPOD clusters will be built on H100 accelerators.
It matters that GPU architectures are designed with an eye to the architecture of the models that will be trained on them. The Transformer Engine and other innovations should partly offset the fact that modern models are growing ten times faster than the power of a single accelerator.
Evolution of image generation with DALL-E 2
A year after the announcement of DALL-E, OpenAI researchers presented the second version of their model for generating images from text prompts.
DALL-E 2 not only creates images at four times the resolution, but can also edit them, adding or removing elements according to a text request.
For the new version's decoder, a diffusion model was chosen over an autoregressive one for reasons of generation quality and computational efficiency.
Another step forward in the development of multimodal models, and the generation results are genuinely impressive. Beyond progress toward AGI, such work has quite practical applications even in our business, for example generating advertising creatives.
MTS AI and Skoltech have developed a language detoxifier
The AI editor created by the researchers detects profanity in text and replaces it with more acceptable wording.
The developers note that the solution is unique for the Russian market, since existing services are geared primarily toward English.
The solution is based on two transformer models: BERT and T5. The first performs local edits, while the second rewrites the entire text in a more neutral way.
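The two-stage idea can be sketched with a toy example. This is plain Python with an invented word list, standing in for the real BERT local editor and T5 rewriter, purely for illustration:

```python
# Toy sketch of the two-stage detoxification pipeline described above.
# Stage 1 stands in for the BERT-style local editor (replace flagged words);
# stage 2 stands in for the T5-style full rewrite fallback.
# The vocabulary and replacements are invented for illustration only.

TOXIC_REPLACEMENTS = {"idiot": "person", "stupid": "questionable"}

def local_edit(text: str) -> str:
    """Stage 1: replace individually flagged words in place."""
    return " ".join(
        TOXIC_REPLACEMENTS.get(w.lower(), w) for w in text.split()
    )

def detoxify(text: str) -> str:
    """Run the local editor; if any flagged word survives, a seq2seq
    rewriter would regenerate the whole sentence instead."""
    edited = local_edit(text)
    still_toxic = any(w.lower() in TOXIC_REPLACEMENTS for w in edited.split())
    if still_toxic:
        edited = "[neutral rewrite of the whole sentence]"  # T5's job
    return edited

print(detoxify("that idea is stupid"))  # → "that idea is questionable"
```

The design point illustrated here is why two models are useful: cheap local substitution handles most cases, and the heavier full rewrite is reserved for text the local editor cannot fix.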
These models are needed to make online communication comfortable and safe. And it is not just about chatbots, which can say the wrong thing after learning from forum texts. We can also offer users less toxic wording while preserving the essence of the message.
The MTS Big Data platform became a "digital breakthrough" for power grids
MTS's solution for detecting commercial losses in power grids won the Digital Breakthrough competition held within the international "Electric Networks" forum.
The model works on data from automated electricity metering systems. Trained on previously discovered cases of theft, it can find the characteristic signatures of such events.
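One simple signature of under-reporting is a consumer whose metered consumption suddenly drops far below their own history. The sketch below illustrates that idea with a basic statistical rule; the data and threshold are invented, and the production model described above is certainly more sophisticated:

```python
# Toy illustration: flag consumers whose recent metered consumption falls
# far below their historical baseline, one possible signature of theft.
# Data and the z-score threshold are invented for illustration only.

from statistics import mean, stdev

def is_suspicious(history: list[float], recent: list[float],
                  z: float = 3.0) -> bool:
    """Flag if the recent average is more than `z` standard deviations
    below the historical average."""
    mu, sigma = mean(history), stdev(history)
    return mean(recent) < mu - z * sigma

normal = [100, 98, 103, 101, 99, 102]     # kWh/day, a stable consumer
print(is_suspicious(normal, [100, 101]))  # → False: consumption unchanged
print(is_suspicious(normal, [40, 35]))    # → True: sharp sustained drop
```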
The suspicious consumers it identifies are displayed on a map, in an interface convenient for planning inspection-team visits.
It is important to us that we can find applications for our expertise even outside the core business. Once again the MTS Big Data team has shown that machine learning is not somewhere out there but right here, in applied and very important tasks: helping, for example, to reduce electricity losses and make power cheaper.