New ‘Voice Engine’ from OpenAI Needs Only 15 Seconds to Clone Speech

March 30, 2024

OpenAI, the company behind the dominant generative AI tool ChatGPT, has unveiled a new voice cloning technology it calls “Voice Engine.” The audio model can replicate a person’s voice, intonation, and other distinctly human speech patterns from a relatively small sample of original audio.

“It is notable that a small model with a single 15-second sample can create emotive and realistic voices,” the company said in a blog post on Friday. For comparison, the instant voice cloning tool from AI voice platform ElevenLabs requires samples of at least one minute, and its professional service level needs nearly 10 minutes of continuous speech for best results.

The company shared several examples of what the technology can do. In one, the voice of a young patient who lost much of her ability to speak because of a vascular brain tumor was recreated from an older recording she had made for a school presentation. For this case, OpenAI worked with Lifespan, a nonprofit health system affiliated with Brown University’s medical school. Working from that recording, Voice Engine provided text-to-speech capability that allows the patient to effectively speak in her own voice. OpenAI also worked with Livox, an “alternative communication app” built for people with disabilities.

OpenAI also showcased how HeyGen is using its technology to produce natural-sounding translations of uploaded speech from one language into another.

The company says Voice Engine was first developed in late 2022 and already powers the preset voices available in OpenAI’s text-to-speech API, as well as ChatGPT’s Voice and Read Aloud features. Given the model’s latest advances, the company says it is proceeding cautiously before any broader release.
“We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities,” OpenAI wrote, acknowledging the widely condemned practice of “deepfakes.” The voices of celebrities, government officials, and, increasingly, private citizens are being impersonated for nefarious purposes ranging from political campaigns to fake ads to outright criminal activity. U.S. President Joe Biden has been pushing for more safeguards against the malicious use of AI voice impersonation. OpenAI is not alone in its caution: Meta disclosed last summer that it was holding back its own AI voice tool specifically because of the “potential risks of misuse.”

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” OpenAI explained.

Even before a public release, OpenAI is placing restrictions on Voice Engine, including a list of prominent people whose voices it will not emulate. “We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” OpenAI wrote.

The partners testing Voice Engine today have agreed to OpenAI’s usage policies, which prohibit impersonating another individual or organization without consent. The company also requires explicit and informed consent from the original speaker, and it does not allow developers to build ways for individual users to clone their own voices.

“Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” the blog post reads.

Voice Engine could outperform other voice cloning tools, including offerings from Meta, ElevenLabs, WellSaid Labs, and open-source models like RVC.

Beyond Voice Engine, OpenAI is working on multiple projects in parallel. CEO Sam Altman has said the company is working on releasing GPT-5 this year. The company has also shown off its generative video tool Sora, which it claims will be the most advanced video generator on the market, surpassing models like Pika, Stable Video Diffusion, and Runway ML. Sora is currently available only to “red teamers” enlisted by OpenAI to make sure it cannot be abused.

OpenAI is also working on a secret project named Q*, about which little more than the name has leaked. Altman has declined to give any details, but said the research team was heavily focused on finding techniques and approaches that make AI reason better.