What's New November 24, 2021

The new era of AI synthetic voices

Ever since the space race made the future feel a little more real, we mere mortals have sometimes grown apprehensive about what that future will look like as technology catches up with us.

Robots and talking AI appeared in prophetic pop culture with jerky, mechanical voices, like KITT in Knight Rider, as we tried to grapple with the inevitable reality of coexisting with artificial intelligence.

In reality, KITT was voiced by a human voice-over artist, made to sound as if he were speaking through a synthesiser, like the effect in the Foo Fighters song "Generator".

We now live firmly in the age of technology, where we talk to our cars as if we were all driving Knight Rider's KITT.

We talk to our TVs, we ask smart speakers in the kitchen what the weather will be, and they will even tell a joke if we ask.

Alexa and Siri are our artificial servants. They speak to us with synthesised voices, but, like a child finally moving out of home, they are about to grow up.

You may have read recently that Adthos is taking things to the next level with AI synthetic voice technology and text-to-speech software.

This month Adthos launched the biggest audio ad campaign in history for Covid-19 vaccinations, covering 6,500 cities in 40 countries in 70 languages and dialects, all using AI-generated voice-overs.

What does this mean for voice-over artists, and what are the ethics involved in such a transition?

Raoul Wedel, founder of Wedelsoft Software, the organisation behind the Adthos brand, recently spoke with radioinfo about the next step in the evolution of advertising voice-overs.

Peter asked Raoul whether Adthos had received any negative feedback about the ethics involved.

“There has, and to be honest, it greatly varies by the country, the ad itself, and also by the voiceover artist, of course, because they’re still human beings who have different opinions and different views,” said Raoul.

“One voiceover that we recorded is actually a major iHeart talent in the US. And he did many shows on KISS-FM Boston while located in Los Angeles, and he thought it was the coolest thing ever. He said, ‘You know, this is going to happen anyway. There’s no way that I’m going to change that. So, I might as well just take it and run with it.’”

There are myriad applications that can sample a human voice and generate audio from a script, but the Adthos system is a little different.

“Our system comes with a set of default voices. If you’re a brand, let’s say McDonald’s and you have 400 locations in your country and you want to send every local station a different version of an ad with the local address of the local franchise, then we can take their regular corporate voice that does all your ads on TV and radio. We can put that voice into the system and create 600 versions. And the franchisees could potentially change that too… that’s the highest end on the spectrum of what we’re doing,” said Raoul.

The voice-over artist is still paid when their voice is used to generate AI audio, even though they are not physically present to read each version.

“It’s different from market to market. There are great differences because the smaller the market is, the more afraid the talent is of competing with an AI version of themselves in their own market… potentially they’re going to hear their voice on every station and then nobody’s going to book them in real life anymore, especially if the talent pool is small,” said Raoul.

“We have two fee models for voice artists. We have a buyout model, where we give them a one-time fee and we can use their voice for it in smaller markets. We also have an option where the talent is paid royalties.”

Raoul compares the development of AI voice audio to that of the music synthesiser, suggesting we are on the brink of a technological transition we have anticipated for decades.

“You can compare this with the first music synthesisers. It’s kind of the same thing. When those first synthesisers came along and you heard a string orchestra, it didn’t sound very much like a string orchestra. But these days, most professional musicians wouldn’t be able to tell the difference between a synthesised violin and a real violin because the quality has improved so fast,” said Raoul.

“This is growing extremely fast. The technology provider we’re working with has updates coming all the time on a day-by-day basis.”

With technology such as Elon Musk’s Tesla Bot in development, we could soon be swapping our Siris and Google Home speakers for autonomous, conversational AI robots.

“I personally think that in five years’ time, there’s not going to be a Starbucks or a McDonald’s that has a real person taking an order on the speaker of the Drive-Thru. The technology is there and it’s going to be very quick.”

Like a train on tracks, we are speeding towards a new destination.