Amazon has made the world's largest text-to-speech AI model with the "rudiments of intelligence"

Researchers at Amazon have trained the largest text-to-speech model in history, which they say demonstrates "an incipient ability to pronounce even complex sentences in a way that sounds quite natural."

Models like this were always going to keep improving, but the researchers specifically hoped to see the kind of leap observed in language models: once they pass a certain size, they become much more reliable and versatile, and are able to perform tasks they were never trained to do.

That doesn't mean the models gain intelligence. It means that past a certain point, their performance on some tasks changes sharply. The Amazon AGI team thought the same could happen as text-to-speech models grew, and their research shows that this is indeed the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, or BASE TTS. The largest version of the model was trained on 100,000 hours of audio from open sources, 90% of which is in English and the rest in German, Dutch and Spanish.

With 980 million parameters, BASE-large is the largest model in this category. The researchers also trained models with 400 million and 150 million parameters on 10,000 and 1,000 hours of audio recordings, respectively. The idea is that if one of these models exhibits new skills and another does not, then the range in which these behaviors begin to emerge lies somewhere between them.

As it turned out, the medium-sized model showed the desired leap in capabilities, not only in the quality of speech (it was better, but only by a couple of points) but also in the set of emergent abilities that the researchers observed and measured.

"The model is capable of performing a number of complex tasks, such as parsing complex sentences, placing phrasal stress on compound nouns, creating emotional speech, or whispered speech, or correctly pronouncing foreign words, or naming signs such as @. Moreover, BASE TTS is not trained to perform any of them," the authors write.


Such speech features usually trip up text-to-speech systems, which mispronounce or skip words, use unnatural intonation, or make other mistakes. BASE TTS still had problems, but it handled these cases much better than its contemporaries, such as Tortoise and VALL-E.

Notably, this model is "streamable," as the name suggests. It doesn't need to generate entire sentences at once. Instead, it produces speech piece by piece, and at a relatively low bitrate. The model's website has plenty of examples of how it pronounces even complex texts quite naturally. Of course, they were carefully selected by the researchers, but the results are still impressive.
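
To make the idea of streaming concrete, here is a minimal sketch in Python of producing audio in pieces instead of waiting for a whole sentence. It is not how BASE TTS works internally: the function names, the chunk size, and the fake "audio" (one placeholder sample per character) are all assumptions made purely for illustration.

```python
from typing import Iterator, List


def synthesize_chunk(text_piece: str) -> List[float]:
    """Hypothetical stand-in for a TTS model.

    Returns one silent placeholder sample per character; a real
    model would return actual audio samples for this piece of text.
    """
    return [0.0] * len(text_piece)


def stream_speech(text: str, words_per_chunk: int = 3) -> Iterator[List[float]]:
    """Yield audio for a few words at a time instead of the full sentence."""
    words = text.split()
    for start in range(0, len(words), words_per_chunk):
        piece = " ".join(words[start:start + words_per_chunk])
        # Each chunk is handed to the caller as soon as it is ready.
        yield synthesize_chunk(piece)


chunks = list(stream_speech("The quick brown fox jumps over the lazy dog"))
chunk_sizes = [len(chunk) for chunk in chunks]
```

A player consuming `stream_speech` could begin playback after the first chunk arrives, which is the practical benefit of a streamable design.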

Because the three BASE TTS models share a common architecture, it seems clear that the size of the model and the amount of training data determine its ability to handle complex tasks. It should be remembered that this is still an experimental model, not a commercial product. Further research will have to determine the point at which new abilities emerge, as well as how to train the resulting model efficiently.

It looks like text-to-speech models could be the next technological breakthrough in 2024. There is no denying the benefits of this technology, in particular for providing access to information for users with disabilities.

Earlier, Mark Zuckerberg was accused of taking an irresponsible approach to artificial intelligence after he promised to create a powerful AI system that is not inferior to a human in terms of intelligence. The head of Meta said that the company will try to create an open-source artificial general intelligence (AGI) system, that is, one that will be available to developers outside the company.
