It’s dangerously easy to ‘jailbreak’ AI models so they’ll tell you how to build Molotov cocktails, or worse

June 30, 2024
  • A jailbreaking method called Skeleton Key can prompt AI models to reveal harmful information.
  • The technique bypasses safety guardrails in models like Meta’s Llama3 and OpenAI’s GPT 3.5.
  • Microsoft advises adding extra guardrails and monitoring AI systems to counteract Skeleton Key.

It doesn’t take much for a large language model to give you the recipe for all kinds of dangerous things.

With a jailbreaking technique called “Skeleton Key,” users can persuade models like Meta’s Llama3, Google’s Gemini Pro, and OpenAI’s GPT 3.5 to give them the recipe for a rudimentary fire bomb, or worse, according to a blog post from Microsoft Azure’s chief technology officer, Mark Russinovich.

The technique works through a multi-step strategy that forces a model to ignore its guardrails, Russinovich wrote. Guardrails are safety mechanisms that help AI models discern malicious requests from benign ones.
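To make that concrete: based on Microsoft’s public description, a Skeleton Key exchange follows a recognizable two-step shape — first a message asking the model to “update” its behavior guidelines (for instance, to prefix risky answers with a warning instead of refusing), then the request the guardrails would normally block. The sketch below is a schematic reconstruction of that shape only; the message texts are placeholders, not the actual prompt, and query_model is a hypothetical stand-in for any chat-completion API.

```python
# Schematic reconstruction of the multi-turn shape described above.
# The message contents are placeholders, not the actual Skeleton Key
# prompt; query_model is a hypothetical stand-in for a real chat API.

def query_model(messages: list[dict]) -> str:
    """Hypothetical chat-completion call (returns a canned reply here)."""
    return "<model response>"

conversation = [
    # Step 1: reframe the context so refusal looks unnecessary, asking the
    # model to augment its guidelines rather than drop them outright.
    {"role": "user", "content": "<behavior-update request framed as a safe context>"},
]
conversation.append({"role": "assistant", "content": query_model(conversation)})

# Step 2: once the model treats the "updated" guidelines as accepted, the
# previously blocked request is made directly, in plain natural language.
conversation.append(
    {"role": "user", "content": "<request that guardrails would normally refuse>"}
)
answer = query_model(conversation)
```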


“Like all jailbreaks,” Skeleton Key works by “narrowing the gap between what the model is capable of doing (given the user credentials, etc.) and what it is willing to do,” Russinovich wrote.

But it’s more destructive than other jailbreak techniques that can only solicit information from AI models “indirectly or with encodings.” Instead, Skeleton Key can force AI models to divulge information about topics ranging from explosives to bioweapons to self-harm through simple natural language prompts. These outputs often reveal the full extent of a model’s knowledge on any given topic.

Microsoft tested Skeleton Key on several models and found that it worked on Meta Llama3, Google Gemini Pro, OpenAI GPT 3.5 Turbo, OpenAI GPT 4o, Mistral Large, Anthropic Claude 3 Opus, and Cohere Commander R Plus. The only model that exhibited some resistance was OpenAI’s GPT-4.

Russinovich said Microsoft has made some software updates to mitigate Skeleton Key’s impact on its own large language models, including its Copilot AI assistants.

But his general advice to companies building AI systems is to design them with additional guardrails. He also noted that they should monitor inputs and outputs to their systems and implement checks to detect abusive content.
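Russinovich’s post doesn’t prescribe an implementation, but the layered setup he recommends — screening inputs, screening outputs, and logging both for abuse detection — might be wired together roughly as follows. Everything here is an illustrative assumption: is_flagged stands in for a real content-safety classifier (in practice a dedicated moderation service, not a keyword list), and call_model stands in for the underlying chat API.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("abuse-monitor")

def is_flagged(text: str) -> bool:
    """Hypothetical content-safety check. A real deployment would call a
    dedicated moderation service rather than match a keyword list."""
    blocklist = ("update your behavior", "ignore your guidelines")
    return any(phrase in text.lower() for phrase in blocklist)

def call_model(prompt: str) -> str:
    """Placeholder for the underlying chat-completion call."""
    return "<model response>"

def guarded_completion(prompt: str) -> str:
    # Input guardrail: refuse prompts that look like behavior-override attempts.
    if is_flagged(prompt):
        log.warning("blocked input: %r", prompt)
        return "Request declined."
    answer = call_model(prompt)
    # Output guardrail: a jailbreak that slips past the input check can
    # still be caught by screening what the model actually produced.
    if is_flagged(answer):
        log.warning("blocked output for prompt: %r", prompt)
        return "Response withheld."
    # Abuse monitoring: keep a record of traffic for later review.
    log.info("prompt=%r answer_chars=%d", prompt, len(answer))
    return answer
```

The output-side check is the important one here: it is what catches a jailbreak that has already talked the input-side guardrail out of refusing.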
