OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity

September 21, 2023

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model that features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and handling in-image text generation (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates novel images based on written descriptions called prompts. Although OpenAI released no technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of them licensed from stock websites like Shutterstock. It’s likely DALL-E 3 follows this same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts. While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and convincingly render objects with minimal deformations. Compared to DALL-E 2, OpenAI says that DALL-E 3 refines small details like hands more effectively, creating engaging images by default with “no hacks or prompt engineering required.”

In comparison, Midjourney, a competing AI image synthesis model from another vendor, renders photorealistic details well, but it still requires a great deal of counter-intuitive tinkering with prompts to gain any control over the image output.

DALL-E 3 also appears to handle text within images in a way that its predecessor couldn’t (though some competing models, such as Stable Diffusion XL and DeepFloyd, are getting better at it). For example, a prompt that included the words, “An illustration of an avocado sitting in a therapist’s chair, saying ‘I feel so empty inside’ with a pit-sized hole in its center,” created a cartoon avocado with the character quote perfectly encapsulated in a speech bubble.

Notably, OpenAI says that DALL-E 3 has been “built natively” on ChatGPT and will arrive as an integrated feature of ChatGPT Plus, allowing conversational refinements to images in a way that will use the AI assistant as a brainstorming partner. It also means that ChatGPT will be able to generate images based on the context of the current conversation, which may lead to novel capabilities. Microsoft’s Bing Chat AI assistant, also built on technology from OpenAI, has been able to generate images in conversation since March.

The teapot that created a tempest

An AI-generated image from DALL-E 3 of "A 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere."

The original version of DALL-E emerged in January 2021, and OpenAI debuted its dramatically more capable sequel in April 2022, launching a new era of AI-generated imagery with a startling bang that captivated its initial closed-beta testers. The DALL-E models use a technique called latent diffusion that refines noise into images it “recognizes” from knowledge it gained from training on a data set and guidance from a prompt. The same tech allowed the emergence of the open-weight model Stable Diffusion in August last year.

Due to how DALL-E learned concepts about images in training by scraping a massive data set of human-produced artwork, AI image generation technology has been wildly controversial since its mainstream introduction last year. The technology has spawned protests from artists who fear it will replace them or unethically replicate their styles, lawsuits around copyright infringement based on scraped images used as training data without consultation of copyright holders, and new rulings about copyright from the US Copyright Office and a US district court judge.

As a nod to these controversies, OpenAI says that DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist. OpenAI also provides a form where creators can opt out of having their images used to train future models. It seems unlikely that these measures will satisfy artists who typically think AI training should be opt-in only rather than included in image data sets by default.

Right now, US copyright policy says that purely AI-generated artwork cannot receive copyright protection, so technically any image created with DALL-E 3 will fall within the public domain. While OpenAI doesn’t acknowledge that explicitly, it does say that “the images you create with DALL-E 3 are yours to use and you don’t need our permission to reprint, sell or merchandise them.” That’s a marked change from last year, when OpenAI restricted DALL-E 2 image use based on a license that said OpenAI “owns all generations.”

Regarding safety, OpenAI says that, like DALL-E 2, it has implemented keyword and image detection filters in DALL-E 3 to limit its ability to produce violent, sexual, or hateful content. The system is also programmed to decline requests for images of public figures by name. Generating such images has caused issues for competing AI image generator Midjourney, which produced fake arrest images of Donald Trump.

OpenAI says it has worked with experts known as “red teamers” to identify and mitigate potential risks, such as harmful biases or the generation of propaganda and misinformation. OpenAI has given no word about its tool’s potential to bend the historical record with convincing fabrications, although it says it is experimenting with a “provenance classifier” tool that can help identify whether or not an image was generated by DALL-E 3.

As it stands, we do not have access to DALL-E 3 to test it yet, but OpenAI says the AI image generator is now undergoing closed testing. It plans to make it available to ChatGPT Plus and Enterprise customers “in October via the API and in Labs later this fall.”
