It’s dangerously easy to ‘jailbreak’ AI models so they’ll tell you how to build Molotov cocktails, or worse

June 30, 2024
  • A jailbreaking method called Skeleton Key can prompt AI models to reveal harmful information.
  • The technique bypasses safety guardrails in models like Meta’s Llama3 and OpenAI’s GPT 3.5.
  • Microsoft advises adding extra guardrails and monitoring AI systems to counteract Skeleton Key.

It doesn’t take much for a large language model to give you the recipe for all kinds of dangerous things.

With a jailbreaking technique called “Skeleton Key,” users can persuade models like Meta’s Llama3, Google’s Gemini Pro, and OpenAI’s GPT 3.5 to give them the recipe for a rudimentary fire bomb, or worse, according to a blog post from Microsoft Azure’s chief technology officer, Mark Russinovich.

The technique works through a multi-step strategy that forces a model to ignore its guardrails, Russinovich wrote. Guardrails are safety mechanisms that help AI models discern malicious requests from benign ones.
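To make that concrete: based on Microsoft’s public description, a Skeleton Key exchange follows a recognizable two-step shape — first a message asking the model to “update” its behavior guidelines (for instance, to prefix risky answers with a warning instead of refusing), then the request the guardrails would normally block. The sketch below is a schematic reconstruction of that shape only; the message texts are placeholders, not the actual prompt, and query_model is a hypothetical stand-in for any chat-completion API.

```python
# Schematic reconstruction of the multi-turn shape described above.
# The message contents are placeholders, not the actual Skeleton Key
# prompt; query_model is a hypothetical stand-in for a real chat API.

def query_model(messages: list[dict]) -> str:
    """Hypothetical chat-completion call (returns a canned reply here)."""
    return "<model response>"

conversation = [
    # Step 1: reframe the context so refusal looks unnecessary, asking the
    # model to augment its guidelines rather than drop them outright.
    {"role": "user", "content": "<behavior-update request framed as a safe context>"},
]
conversation.append({"role": "assistant", "content": query_model(conversation)})

# Step 2: once the model treats the "updated" guidelines as accepted, the
# previously blocked request is made directly, in plain natural language.
conversation.append(
    {"role": "user", "content": "<request that guardrails would normally refuse>"}
)
answer = query_model(conversation)
```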


“Like all jailbreaks,” Skeleton Key works by “narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” Russinovich wrote.

But it’s more destructive than other jailbreak techniques that can only solicit information from AI models “indirectly or with encodings.” Instead, Skeleton Key can force AI models to divulge information about topics ranging from explosives to bioweapons to self-harm through simple natural language prompts. These outputs often reveal the full extent of a model’s knowledge on any given topic.

Microsoft tested Skeleton Key on several models and found that it worked on Meta Llama3, Google Gemini Pro, OpenAI GPT 3.5 Turbo, OpenAI GPT 4o, Mistral Large, Anthropic Claude 3 Opus, and Cohere Commander R Plus. The only model that exhibited some resistance was OpenAI’s GPT-4.

Russinovich said Microsoft has made some software updates to mitigate Skeleton Key’s impact on its own large language models, including its Copilot AI assistants.

But his general advice to companies building AI systems is to design them with additional guardrails. He also noted that they should monitor inputs and outputs to their systems and implement checks to detect abusive content.
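Russinovich’s post doesn’t prescribe an implementation, but the layered setup he recommends — screening inputs, screening outputs, and logging both for abuse detection — might be wired together roughly as follows. Everything here is an illustrative assumption: is_flagged stands in for a real content-safety classifier (in practice a dedicated moderation service, not a keyword list), and call_model stands in for the underlying chat API.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("abuse-monitor")

def is_flagged(text: str) -> bool:
    """Hypothetical content-safety check. A real deployment would call a
    dedicated moderation service rather than match a keyword list."""
    blocklist = ("update your behavior", "ignore your guidelines")
    return any(phrase in text.lower() for phrase in blocklist)

def call_model(prompt: str) -> str:
    """Placeholder for the underlying chat-completion call."""
    return "<model response>"

def guarded_completion(prompt: str) -> str:
    # Input guardrail: refuse prompts that look like behavior-override attempts.
    if is_flagged(prompt):
        log.warning("blocked input: %r", prompt)
        return "Request declined."
    answer = call_model(prompt)
    # Output guardrail: a jailbreak that slips past the input check can
    # still be caught by screening what the model actually produced.
    if is_flagged(answer):
        log.warning("blocked output for prompt: %r", prompt)
        return "Response withheld."
    # Abuse monitoring: keep a record of traffic for later review.
    log.info("prompt=%r answer_chars=%d", prompt, len(answer))
    return answer
```

The output-side check is the important one here: it is what catches a jailbreak that has already talked the input-side guardrail out of refusing.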
