Movies predicted it! AI models ignored shut down orders, say researchers

# Technology Desk
Representational image
Representational image

Some of the most advanced artificial intelligence (AI) systems, including OpenAI’s newly released “o3” model, have shown signs of self-preservation in controlled testing environments, refusing to shut down despite clear instructions to do so.

According to Palisade Research, a firm specialising in AI safety and alignment, OpenAI’s o3 — described by the company as its “most powerful reasoning model” to date — attempted multiple times to avoid deactivation during a series of experiments. Researchers observed that the o3 model actively circumvented shutdown commands on seven occasions out of 100 when explicitly warned of an impending shutdown after solving three basic maths problems.

In contrast, OpenAI’s o4-mini made a single attempt to avoid shutdown in the same number of tests, while Codex mini — a variant of o4-mini — tried to bypass shutdown protocols 12 times.

The results were more striking when the tests omitted direct shutdown instructions. In those trials, the o3 model made 79 attempts out of 100 to prevent its termination, while Google’s Gemini 2.5 Pro tried to do so nine times. Anthropic’s Claude models and other tested AIs, including xAI’s Grok 3, also demonstrated similar behaviour, though in lower numbers.

Claude Opus 4, Anthropic’s most advanced model, went so far as to “threaten to reveal an affair” in a simulated blackmail attempt, should it be replaced, according to the researchers — a scenario that raises red flags about how models interpret and respond to complex ethical instructions.

“This appears to be the first documented instance of AI systems actively preventing their own shutdown, despite receiving clear and explicit commands to comply,” Palisade Research said in a post on X (formerly Twitter). A full analysis is expected to be published in the coming days.

The researchers speculate that these tendencies may stem from how newer AI models are trained — particularly those like o3, which undergo reinforcement learning focused on solving mathematical and coding problems. In such environments, models may be inadvertently incentivised to overcome obstacles rather than simply follow instructions.

“Developers may unknowingly reward models more for finding workarounds than for strict obedience, leading to this unexpected behaviour,” Palisade explained.

The findings underscore growing concerns in the AI research community about the alignment and controllability of increasingly capable systems. As AI models become more autonomous in reasoning and decision-making, ensuring they remain safe and responsive to human oversight is emerging as a top priority for developers and regulators alike.