NVIDIA’s latest tech makes AI voices more expressive and realistic

▲ Image (source: NVIDIA)

The voices on Amazon’s Alexa, Google Assistant and other AI assistants are far ahead of old-school GPS devices, but they still lack the rhythms, intonation and other qualities that make speech sound, well, human. NVIDIA has unveiled new research and tools that can capture those natural speech qualities by letting you train the AI system with your own voice, the company announced at the Interspeech 2021 conference.

To improve its AI voice synthesis, NVIDIA’s text-to-speech research team developed a model called RAD-TTS, a winning entry at an NAB broadcast convention competition to develop the most realistic avatar. The system allows an individual to train a text-to-speech model with their own voice, including the pacing, tonality, timbre and more.

Another RAD-TTS feature is voice conversion, which lets a user deliver one speaker’s words using another person’s voice. The model’s interface gives fine, frame-level control over a synthesized voice’s pitch, duration and energy.
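To give a rough sense of what frame-level control means in practice, here is a minimal Python sketch. It is not NVIDIA's code and does not use the RAD-TTS or NeMo API. It assumes a synthesized utterance can be described as a list of short frames, each with a pitch and an energy value, and the `Frame` class and `emphasize` function are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """One short slice of speech (hypothetical representation)."""
    pitch_hz: float  # fundamental frequency; 0.0 marks an unvoiced frame
    energy: float    # relative loudness


def emphasize(frames: List[Frame], start: int, end: int,
              pitch_factor: float = 1.0,
              energy_factor: float = 1.0,
              repeat: int = 1) -> List[Frame]:
    """Return a new frame list with frames[start:end] emphasized.

    Pitch and energy are scaled per frame, and each frame in the range
    is repeated `repeat` times to lengthen that stretch of speech.
    Frames outside the range are copied through unchanged.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    out: List[Frame] = []
    for i, frame in enumerate(frames):
        if start <= i < end:
            edited = Frame(frame.pitch_hz * pitch_factor,
                           frame.energy * energy_factor)
            out.extend([edited] * repeat)
        else:
            out.append(frame)
    return out


# Four frames; the third is unvoiced (pitch 0.0).
utterance = [Frame(100.0, 1.0), Frame(110.0, 1.0),
             Frame(0.0, 0.2), Frame(120.0, 1.0)]

# Raise pitch, boost loudness and double the length of frames 1 and 2.
stressed = emphasize(utterance, 1, 3,
                     pitch_factor=1.2, energy_factor=1.5, repeat=2)
```

A real system would apply edits like these to the model's predicted pitch, duration and energy values before generating audio, rather than to a plain list, but the idea of adjusting a chosen range of frames is the same.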

Using this technology, NVIDIA’s researchers created more conversational-sounding narration for the company’s own “I Am AI” video series, using synthesized rather than human voices. The aim was to get the narration to match the tone and style of the videos, something that hasn’t been done well in many AI-narrated videos to date. The results are still a bit robotic, but better than any AI narration I’ve ever heard.

“With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator’s voice. Using this baseline narration, the producer could then direct the AI like a voice actor — tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video’s tone,” NVIDIA wrote.

NVIDIA is releasing some of this research as open source, optimized to run efficiently on NVIDIA GPUs, of course. It is available to anyone who wants to try it through the NVIDIA NeMo Python toolkit for GPU-accelerated conversational AI, which can be found on the company’s NGC hub of containers and other software.

“Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs,” the company wrote.

Reposted from: TechCrunch
