Voice mimicry technology causes concern for privacy, security and accuracy of news

Will the days of reliable, trustworthy news sources soon be over? At the rate that artificial intelligence (AI) is currently advancing, that seems highly likely. Companies like China’s Baidu are now using AI to create advanced voice mimicry technology that’s slowly but surely becoming more than a bit scary.

As of this writing, the company has AI technology that can mimic your voice from a sample less than one minute long. It's this kind of technology that could put the future of all news reports into question. With such a tool, it could be possible to make anyone sound like anyone else, or to make anyone appear to say anything at any time.

According to Leo Zou, a member of the Baidu communications team, their new AI voice tech is streets ahead of anything that has ever existed before. “From a technical perspective, this is an important breakthrough showing that a complicated generative modeling problem, namely speech synthesis, can be adapted to new cases by efficiently learning only from a few examples,” he said. “Previously, it would take numerous examples for a model to learn. Now, it takes a fraction of what it used to.”

To build this voice mimicry technology, Baidu based its work on its own text-to-speech synthesis system, Deep Voice, which is said to have been trained on upwards of 800 hours of audio from a total of 2,400 speakers. According to the official figures, the system needs about 100 five-second samples of a voice to sound its best, but just 10 five-second samples were enough for it to trick a voice recognition system with almost perfect accuracy.
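To put those figures in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch based on the numbers quoted above; the variable and function names are made up for illustration and do not come from Baidu, and the per-speaker average assumes the 800 hours were spread evenly across all 2,400 speakers, which the reported figures do not confirm.

```python
# Figures as quoted in the article; all names here are illustrative only.
SAMPLE_SECONDS = 5          # length of one voice sample
RECOMMENDED_SAMPLES = 100   # samples said to be needed for best quality
MINIMUM_SAMPLES = 10        # samples that were enough to fool voice recognition
TRAINING_HOURS = 800        # "upwards of" this much audio, so a lower bound
TRAINING_SPEAKERS = 2400


def total_seconds(sample_count, seconds_per_sample=SAMPLE_SECONDS):
    """Total audio length, in seconds, for a given number of samples."""
    return sample_count * seconds_per_sample


recommended_seconds = total_seconds(RECOMMENDED_SAMPLES)
minimum_seconds = total_seconds(MINIMUM_SAMPLES)

# Assumption: audio is evenly divided among speakers (not stated in the source).
avg_minutes_per_speaker = TRAINING_HOURS * 60 / TRAINING_SPEAKERS

print(f"Recommended: {recommended_seconds} s ({recommended_seconds / 60:.1f} min)")
print(f"Minimum: {minimum_seconds} s")
print(f"Average training audio per speaker: {avg_minutes_per_speaker:.0f} min")
```

In other words, the smaller sample set adds up to 50 seconds of speech, which matches the claim of a sample under one minute, while the recommended set is still only a little over eight minutes.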

Zou remarked that there may be many great use cases or applications of this technology. “For example, voice cloning could help patients who lost their voices. This is also an important breakthrough in the direction of personalized human-machine interfaces,” he explained, adding that a mom could use it to easily configure an audiobook reader by speaking to it. “The method [additionally] allows creation of original digital content. Hundreds of characters in a video game would be able to have unique voices because of this technology. Another interesting application is speech-to-speech language translation, as the synthesizer can learn to mimic the speaker identity in another language.”

While it may offer a number of positives, there’s no denying that it can be abused for nefarious purposes as well. It has the potential to blow the door to scandalous fake news stories wide open. And if paired with deep learning technology that allows faces to be superimposed onto other people’s bodies, it could be used to fabricate entire news stories with fake news anchors, fake speakers, and fake interview guests, all talking about fake news subjects. It spells out a very scary future that may be hard to avoid.

In any case, the technologies being developed now will likely become even more advanced in the future, so it would be best to understand the risks involved in their use before they begin to cause any real-world harm.

Read more about the future of AI in Robots.news.
