Baidu’s creepy new AI can accurately mimic your voice

The Chinese answer to Google can now clone your voice using AI after hearing you talk for just one minute.

Baidu, which created this creepy technology, says it could also be used to create personalised digital assistants and automatic speech translation services.

Deep Voice learns which sounds go with a text, as well as the quirks of how someone communicates, creating a robot-self that is indistinguishable from how you really talk.

Although this voice-copying technology might be amusing, it also has serious implications, as users can essentially poach part of someone else’s identity.

Researchers from the Beijing-based technology firm trained their text-to-speech synthesis system on more than 800 hours of audio, taken from around 2,400 different speakers.

To work at its best, Deep Voice requires 100 five-second sections of sound, but it can trick a voice recognition system 95 per cent of the time with just ten five-second samples.
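To put those figures side by side, the arithmetic can be sketched in a few lines of Python. The function and variable names are illustrative and are not part of Deep Voice. The only inputs are the numbers quoted in this article, taken at face value even though the training figures are approximate.

```python
# Illustrative arithmetic only; these names are not part of Deep Voice.
SAMPLE_SECONDS = 5      # length of each audio section quoted in the article
TRAINING_HOURS = 800    # "more than 800 hours", treated here as exactly 800
TRAINING_SPEAKERS = 2400  # "around 2,400 speakers", treated as exactly 2,400


def total_audio_seconds(num_samples: int, seconds_per_sample: int = SAMPLE_SECONDS) -> int:
    """Total audio, in seconds, across a number of equal-length samples."""
    return num_samples * seconds_per_sample


best_case = total_audio_seconds(100)  # 100 five-second sections
minimal = total_audio_seconds(10)     # ten five-second samples
avg_minutes = TRAINING_HOURS * 60 / TRAINING_SPEAKERS  # training audio per speaker

print(f"Best quality: {best_case} seconds ({best_case / 60:.1f} minutes)")
print(f"Minimal cloning: {minimal} seconds")
print(f"Average training audio per speaker: about {avg_minutes:.0f} minutes")
```

Ten samples come to 50 seconds, in line with the "one minute" figure, while the 100-sample case works out to a little over eight minutes. By comparison, the training set averages roughly 20 minutes of audio per speaker.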

The technology could duplicate the voices of people who have lost the ability to use their voice, developers say. 

Children could also be read to in a parent’s voice, even when the parent is far away.

‘From a technical perspective, this is an important breakthrough showing that a complicated generative modelling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,’ Leo Zou, a member of Baidu’s communications team, told Digital Trends.

‘Previously, it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.’

Deep Voice can also change the output of your voice to reflect a different gender or accent.

[Audio samples: a recording processed by Deep Voice and used to generate new speech in the same voice; a male voice; and the same voice converted by the software to sound female.]

Researchers say there are many applications of this technology, including helping people who have lost their voices to communicate.

‘Hundreds of characters in a video game would be able to have unique voices because of this technology’, Mr Zou said.

‘Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language.’

Baidu’s researchers are not the first to build voice-replicating AI.

Last year, a project called Lyrebird used neural networks to imitate people, including Donald Trump and Barack Obama.

HOW DOES VOICE IMITATION WORK?

Baidu’s AI 

Baidu’s AI learns what sounds go with what texts as well as the quirks of how someone communicates.

It can clone your voice using neural networks after hearing you talk for just one minute. 

The text-to-speech synthesis system Deep Voice was trained on more than 800 hours of audio taken from around 2,400 speakers. 

To work at its best, the Deep Voice technology requires 100 five-second sections of sound. 

However, it can trick a voice recognition system 95 per cent of the time with just ten five-second samples. 

The technology could duplicate the voices of people who have lost the ability to use their voice, developers say.

Children could also be read to in a parent’s voice, even when the parent is far away.

The Lyrebird

The Lyrebird service allows users to compress the individual characteristics of a voice into a single key, which means users can generate 1,000 sentences in less than half a second.

Not only can users create voices but they can control the generated voice too – for example making it sound angry, sympathetic or stressed.

The creators say the technology will have a wide range of applications, including personal assistants, audiobooks read in famous voices and speech synthesis for people with disabilities.

The team also believe it will be used by animation and video game studios.

Developers acknowledge that this could have dangerous implications such as ‘misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else’.

By releasing the technology publicly, the team believe audio recordings will no longer be used as evidence or for identification in the future.

The Lyrebird service uses a voice-imitation algorithm, powered by artificial intelligence (AI), to mimic a person’s voice and have it read any text with a given emotion.

The Canadian AI startup relies on deep learning models developed by PhD students at the University of Montreal.

The technology is named after the Australian lyrebird, which can mimic 20 species at the same time.

The service uses AI to compress the individual characteristics of a voice into a unique code.

Developers say this code can be fed through algorithms to generate 1,000 sentences in less than half a second.
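This article does not say how Lyrebird computes its code, so the following is only a toy illustration of the general idea of reducing a recording to one fixed-length list of numbers. It assumes each clip has already been turned into made-up feature frames and simply averages them. Real systems learn such codes with neural networks, and every name and value below is hypothetical.

```python
import math


def voice_code(frames):
    """Average per-frame feature vectors into one fixed-length code.

    A stand-in for a learned voice code; this is NOT Lyrebird's method.
    """
    count = len(frames)
    dims = len(frames[0])
    return [sum(frame[i] for frame in frames) / count for i in range(dims)]


def cosine_similarity(a, b):
    """Similarity of two codes: near 1.0 for the same direction, 0.0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up feature frames for three short clips (hypothetical values).
speaker_a_clip1 = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
speaker_a_clip2 = [[2.0, 0.0, 2.0], [2.0, 0.0, 2.0]]
speaker_b_clip = [[0.0, 4.0, 0.0], [0.0, 2.0, 0.0]]

code_a1 = voice_code(speaker_a_clip1)
code_a2 = voice_code(speaker_a_clip2)
code_b = voice_code(speaker_b_clip)

print("Same speaker:", round(cosine_similarity(code_a1, code_a2), 3))
print("Different speakers:", round(cosine_similarity(code_a1, code_b), 3))
```

In this toy data the two clips from the same speaker reduce to the same code, while the other speaker's code points in an unrelated direction. That is the property a compact voice code needs in order to stand in for a speaker.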

Not only can algorithms synthesise voices but they can control the generated voice too – for example making it sound angry, sympathetic or stressed.

On the Lyrebird.ai website, samples using the voices of Donald Trump, Barack Obama and Hillary Clinton illustrate how accurate the technology is.

The creators say the AI will have a wide range of applications, including personal assistants, reading for audio books with famous voices and speech synthesis for people with disabilities.

The name of the technology is inspired by the Australian lyrebird (pictured, stock image). It has a highly sophisticated larynx and can mimic 20 species at the same time

The name of the technology is inspired by the lyrebird – one of Australia’s most well-known and loved birds.

It easily mimics sounds, from chainsaws to car horns and all the birds of the rainforest.

Mixed in with its own unique clicks and song, it can also be heard mimicking other birds and even mammals.

The company says its speech technology is still ‘in beta’, with no mention of when it will be released or how much it will cost.
