Microsoft’s new safety system can catch hallucinations in its customers’ AI apps

March 29, 2024

Sarah Bird, Microsoft’s chief product officer of responsible AI, tells The Verge in an interview that her team has designed several new safety features that will be easy to use for Azure customers who aren’t hiring groups of red teamers to test the AI services they built. Microsoft says these LLM-powered tools can detect potential vulnerabilities, monitor for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time for Azure AI customers working with any model hosted on the platform.

“We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the outcomes,” she says.

That can help avoid generative AI controversies caused by undesirable or unintended responses, like the recent ones with explicit fakes of celebrities (Microsoft’s Designer image generator), historically inaccurate images (Google Gemini), or Mario piloting a plane toward the Twin Towers (Bing).

Three features: Prompt Shields, which blocks prompt injections or malicious prompts from external documents that instruct models to go against their training; Groundedness Detection, which finds and blocks hallucinations; and safety evaluations, which assess model vulnerabilities, are now available in preview on Azure AI. Two other features for directing models toward safe outputs and tracking prompts to flag potentially problematic users will be coming soon.

This is an example screenshot of content filter settings in the Azure AI Studio. These settings protect against prompt attacks or inappropriate content and decide what to do if something is flagged.

Image: Microsoft

Whether the user is typing in a prompt or if the model is processing third-party data, the monitoring system will evaluate it to see if it triggers any banned words or has hidden prompts before deciding to send it to the model to answer. After, the system then looks at the response by the model and checks if the model hallucinated information not in the document or the prompt.

In the case of the Google Gemini images, filters made to reduce bias had unintended effects, which is an area where Microsoft says its Azure AI tools will allow for more customized control. Bird acknowledges that there is concern Microsoft and other companies could be deciding what is or isn’t appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech or violence that the model sees and blocks.

In the future, Azure users can also get a report of users who attempt to trigger unsafe outputs. Bird says this allows system administrators to figure out which users are its own team of red teamers and which could be people with more malicious intent.

Bird says the safety features are immediately “attached” to GPT-4 and other popular models like Llama 2. However, because Azure’s model garden contains many AI models, users of smaller, less used open-source systems may have to manually point the safety features to the models.

Microsoft has been turning to AI to beef up the safety and security of its software, especially as more customers become interested in using Azure to access AI models. The company has also worked to expand the number of powerful AI models it provides, most recently inking an exclusive deal with French AI company Mistral to offer the Mistral Large model on Azure.

Previous articleTink Partners With Payop to Roll out Pay by Bank in Europe

Next articleFinTech IPO Index Up Slightly as nCino’s Gains Offset Huize’s Slide

Exploring Fintech Zilch Following AWS Partner Expansion

Key Fed inflation measure rose 2.8% in March from a year…

Digital payments are becoming increasingly popular on King’s Day

Exxon stock falls as earnings miss on lower natural gas prices…

Real-Time Money Movement: Dispelling the Myths and Embracing the Opportunities

Major Google Pixel 8a leaks highlight features and software update promise

A framework to compare lithium battery testing data and results during…

Energy Efficiency is Critical for a Sustainable Future

Putting Microsoft’s cratering Xbox console sales in context

WhatsApp now rolling out passkey support for iPhone users

Bitcoin Chops Around $64K, With Japanese Yen’s Tumble Maybe Signaling ‘Currency…

Fed’s favorite inflation gauge and Big Tech earnings greet a slumping…

Bitcoin price coils near $64k, analyst predicts cycle top at $300k

Alphabet soars most since 2015 on strong earnings, first dividend and…

Weekly Market Review – April 27, 2024

Why Americans worry changes to the U.S. retirement system could upend…

2025 Social Security COLA: What is the projected increase for the…

Are You Saving Enough To Be In The Top 3% Of…

The new class war: A wealth gap between millennials

The unfortunate truth about maxing out your 401(k)

Microsoft’s new safety system can catch hallucinations in its customers’ AI apps

Must Read

Key Fed inflation measure rose 2.8% in March from a year...

Exxon stock falls as earnings miss on lower natural gas prices...

Why Americans worry changes to the U.S. retirement system could upend...

2025 Social Security COLA: What is the projected increase for the...

Are You Saving Enough To Be In The Top 3% Of...

Most Viewed

Mortgage rates today, April 21, 2024: Interest costs on the rise

European Stock Rally Falters Amid Mixed Earnings: Markets Wrap

An ultralow-concentration electrolyte for lithium-ion batteries

Trending Now

Major Google Pixel 8a leaks highlight features and software update promise

A framework to compare lithium battery testing data and results during operation

Energy Efficiency is Critical for a Sustainable Future

Microsoft’s new safety system can catch hallucinations in its customers’ AI apps

RELATED ARTICLES

Must Read

Most Viewed

Trending Now