American News Group

OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model that features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and handling in-image text generation (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates novel images based on written descriptions called prompts. Although OpenAI released no technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of them licensed from stock websites like Shutterstock. It’s likely DALL-E 3 follows this same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available at following prompts. While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and convincingly render objects with minimal deformations. Compared to DALL-E 2, OpenAI says that DALL-E 3 refines small details like hands more effectively, creating engaging images by default with “no hacks or prompt engineering required.”

In comparison, Midjourney, a competing AI image synthesis model from another vendor, renders photorealistic details well, but it still requires a great deal of counter-intuitive tinkering with prompts to gain any control over the image output.

DALL-E 3 also appears to handle text within images in a way that its predecessor couldn’t (some competing models like Stable Diffusion XL and DeepFloyd are getting better at it). For example, a prompt that included the words, “An illustration of an avocado sitting in a therapist’s chair, saying ‘I feel so empty inside’ with a pit-sized hole in its center,” created a cartoon avocado with the character quote perfectly encapsulated in a speech bubble.

Notably, OpenAI says that DALL-E 3 has been “built natively” on ChatGPT and will arrive as an integrated feature of ChatGPT Plus, allowing conversational refinements to images in a way that will use the AI assistant as a brainstorming partner. It also means that ChatGPT will be able to generate images based on the context of the current conversation, which may lead to novel capabilities. Microsoft’s Bing Chat AI assistant, also built on technology from OpenAI, has been able to generate images in conversation since March.

The teapot that created a tempest

The original version of DALL-E emerged in January 2021, and OpenAI debuted its dramatically more capable sequel in April 2022, launching a new era of AI-generated imagery with a startling bang that captivated its initial closed-beta testers. The DALL-E models use a technique called latent diffusion that refines noise into images it “recognizes” from knowledge it gained from training on a data set and guidance from a prompt. The same tech allowed the emergence of the open-weight model Stable Diffusion in August last year.
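The “refining noise into images” process described above can be caricatured in a few lines of Python. This is a toy sketch, not OpenAI’s actual architecture: the denoiser here is a hand-written blend toward a known target vector, standing in for the learned neural network, and the target itself stands in for the guidance a text prompt provides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend "image" that the prompt describes; in a real diffusion model
# this is unknown, and a trained network estimates it at every step.
target = np.array([1.0, -1.0, 0.5, 0.0])

def denoise_step(x, t, num_steps):
    """One reverse-diffusion step: blend the noisy sample toward the
    denoiser's estimate of the clean image (here, simply the target)."""
    alpha = 1.0 / (num_steps - t)  # step size grows as t approaches the end
    return x + alpha * (target - x)

x = rng.standard_normal(4)  # start from pure Gaussian noise
num_steps = 50
for t in range(num_steps):
    x = denoise_step(x, t, num_steps)

# After all steps, the noise has been refined into the "image."
assert np.allclose(x, target)
```

Real models like Stable Diffusion run this loop over high-dimensional latent tensors, with a neural network predicting the noise at each step and the text prompt steering that prediction; the structure of the loop, however, is essentially this.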

Due to how DALL-E learned concepts about images in training by scraping a massive data set of human-produced artwork, AI image generation technology has been wildly controversial since its mainstream introduction last year. The technology has spawned protests from artists who fear it will replace them or unethically replicate their styles, lawsuits around copyright infringement based on scraped images used as training data without consultation of copyright holders, and new rulings about copyright from the US Copyright Office and a US district court judge.

As a nod to these controversies, OpenAI says that DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist. OpenAI also provides a form where creators can opt out of having their images used to train future models. It seems unlikely that these measures will satisfy artists, many of whom argue that AI training should be opt-in only rather than including image data sets by default.

Right now, US copyright policy says that purely AI-generated artwork cannot receive copyright protection, so technically any image created with DALL-E 3 will fall within the public domain. While OpenAI doesn’t acknowledge that explicitly, it does say that “the images you create with DALL-E 3 are yours to use and you don’t need our permission to reprint, sell or merchandise them.” That’s a marked change from last year when OpenAI restricted DALL-E 2 image use based on a license that said OpenAI “owns all generations.”

Regarding safety, OpenAI says that, like DALL-E 2, it has implemented keyword and image detection filters in DALL-E 3 to limit its ability to produce violent, sexual, or hateful content. The system is also programmed to decline requests that generate images of public figures by name—which has caused issues with competing AI image generator Midjourney when it generated fake arrest images of Donald Trump.

OpenAI says it has worked with experts known as “red teamers” to identify and mitigate potential risks, such as harmful biases or the generation of propaganda and misinformation. OpenAI has given no word about its tool’s potential to bend the historical record with convincing fabrications, although it says it is experimenting with a “provenance classifier” tool that can help identify whether or not an image was generated by DALL-E 3.

As it stands, we do not have access to DALL-E 3 to test it yet, but OpenAI says the AI image generator is now undergoing closed testing. It will come to ChatGPT Plus and Enterprise customers in early October, with OpenAI saying it will arrive “via the API and in Labs later this fall.”
