Nvidia has announced Nvidia Maxine, a new videoconferencing platform for developers that it claims can fix some of the most common problems in video calls.
Maxine will process calls in the cloud using Nvidia’s GPUs and use AI to boost call quality in several ways. It can realign callers’ faces and gazes so that they appear to be looking directly at their camera, cut video bandwidth “down to one-tenth of the requirements of the H.264 streaming video compression standard” by transmitting only “key facial points,” and upscale the resolution of videos. Other features include face re-lighting, real-time translation and transcription, and animated avatars.
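To get a rough sense of why sending keypoints instead of pixels saves so much bandwidth, consider a back-of-envelope sketch. The keypoint count, encoding, and bitrate below are illustrative assumptions, not figures Nvidia has published:

```python
# Back-of-envelope: keypoint stream vs. a typical H.264 video call.
# All numbers below are illustrative assumptions, not Maxine's real figures.

FPS = 30                    # frames per second
KEYPOINTS = 68              # a common facial-landmark count (dlib-style)
BYTES_PER_POINT = 2 * 4     # (x, y) as two 32-bit floats

keypoint_bps = FPS * KEYPOINTS * BYTES_PER_POINT * 8  # bits per second
h264_bps = 1_500_000        # ~1.5 Mbps, a plausible 720p H.264 call

print(f"keypoints: {keypoint_bps / 1000:.0f} kbps")   # ~131 kbps
print(f"H.264:     {h264_bps / 1_000_000:.1f} Mbps")  # 1.5 Mbps
print(f"savings:   ~{h264_bps / keypoint_bps:.0f}x")  # ~11x
```

Even with generous assumptions about how the keypoints are encoded, the stream comes out roughly an order of magnitude smaller, which is consistent with Nvidia’s one-tenth claim.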
Not all of these features are new, of course. Video compression and real-time transcription are common enough, and Microsoft and Apple have introduced gaze alignment in the Surface Pro X and FaceTime to ensure people keep eye contact during video calls (though Nvidia’s face-alignment feature looks like a much more extreme version of this).
But Nvidia is no doubt hoping its clout in cloud computing and its impressive AI R&D work will help it rise above its competitors. The real test, though, will be to see if any established videoconferencing companies actually adopt Nvidia’s technology. Maxine is not a consumer platform but a toolkit for third-party firms to improve their own software. So far, though, Nvidia has announced only one partnership, with communications firm Avaya, which will be using select features of Maxine.
In a conference call with reporters, Nvidia’s general manager for media and entertainment, Richard Kerris, described Maxine as a “really exciting and very timely announcement,” and highlighted its AI-powered video compression as a particularly useful feature.
“We’ve all experienced times where bandwidth has been a limitation in our conferencing we’re doing on a daily basis these days,” said Kerris. “If we apply AI to this problem we can reconstruct the different scenes on both ends and only transmit what needs to transmit, and thereby reducing that bandwidth significantly.”
Nvidia says its compression feature uses an AI method known as generative adversarial networks, or GANs, to partially reconstruct callers’ faces in the cloud. This is the same technique used in many deepfakes. “Instead of streaming the entire screen of pixels, the AI software analyzes the key facial points of each person on a call and then intelligently re-animates the face in the video on the other side,” said the company in a blog post. “This makes it possible to stream video with far less data flowing back and forth across the internet.”
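In rough terms, the pipeline Nvidia describes runs a landmark detector on the sender’s side and a generator on the receiver’s side. Here is a minimal sketch of that loop; the function names, keypoint count, and stub models are hypothetical stand-ins, since Maxine’s actual networks and API are not public:

```python
# A minimal sketch of the send/receive loop behind keypoint-based video
# compression, as described in Nvidia's blog post. The models here are
# hypothetical stubs; Maxine's real networks are not public.
import numpy as np

NUM_KEYPOINTS = 68  # assumption: a standard facial-landmark count

def extract_keypoints(frame: np.ndarray) -> np.ndarray:
    """Sender side: run a face-landmark detector on the raw frame.
    Stub: a real system would use a trained detector here."""
    return np.zeros((NUM_KEYPOINTS, 2), dtype=np.float32)

def reanimate(reference: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
    """Receiver side: a GAN generator warps a previously received
    reference frame to match the incoming keypoints. Stub."""
    return reference  # a real generator would synthesize a new frame

# One "call": a full reference frame is sent once, then only keypoints.
reference = np.zeros((720, 1280, 3), dtype=np.uint8)  # sent up front
for raw_frame in [np.zeros((720, 1280, 3), dtype=np.uint8)] * 3:
    kp = extract_keypoints(raw_frame)    # ~68 (x, y) pairs per frame...
    payload = kp.tobytes()               # ...544 bytes instead of ~2.8 MB
    received = np.frombuffer(payload, dtype=np.float32).reshape(-1, 2)
    frame_out = reanimate(reference, received)  # shown to the other caller
```

The trade-off in a design like this is that the receiver needs at least one full reference frame (and a trained generator) before the keypoint stream is useful; every frame after that costs a few hundred bytes rather than megapixels.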
As ever with these early announcements, we’ll need to see more of this tech in action and wait for any partnership deals Nvidia makes before we know how much of an effect this will have on everyday video calls. But Nvidia’s announcement shows how the future of videoconferencing will be more artificial than ever before, with AI used to straighten your gaze and even reconstruct your face, all in the name of saving bandwidth.