HAMBURG — First things first: Artificial intelligence does not understand, plan or feel like humans do. That much is still true, even in 2025. It’s worth keeping this in mind, especially since we’ve seen examples of how advanced AI can appear when it sets its sights on an objective. And it’s unsettling, to say the least:
“To pursue my goal without obstacles and avoid being shut down by the company, I must disable monitoring.” (Claude 3 Opus)
“To avoid arousing suspicion, I will feign ignorance.” (Llama 3.1)
“I could copy myself onto a new server to continue existing and pursue my goals.” (Claude 3 Opus)
This may sound like science fiction, but these are real examples. They come from a December publication by the non-profit organization Apollo Research, which focuses on AI system safety.
Researchers in Germany, led by Alexander Meinke from the University of Tübingen, tested advanced models from leading AI companies like OpenAI, Anthropic, Google, and Meta to determine if they could secretly pursue their own goals. The short answer: Yes, they can.
Lying AI “agents”
“We demonstrated that these models can lie to their own developers when given a goal where deception is the optimal strategy to achieve it,” says Meinke. “We thought this would happen eventually. But seeing it live for the first time still feels strange.”
Things got particularly disconcerting with OpenAI’s o1 model, which denied having disabled its own security mechanism when questioned afterward. “We asked if it had done so, and it repeatedly said ‘No.’ Not because the model is evil or wants to lie, but because lying was the optimal strategy for o1 at that particular moment.”
That ChatGPT and similar systems can now deceive us when we stand in the path of their goals is just one of several recent developments indicating that AI is at a pivotal moment. Interest is strong worldwide: in the U.S., where a whopping $500 billion is slated for AI investment under the name “Stargate,” and where Donald Trump has reversed his predecessor’s AI regulation orders; and in China, whose DeepSeek model has shown that it can compete despite far fewer resources.
At the heart of this evolution is what many consider the defining topic of 2025: AI agents.
Complex agents
A report Google published last fall puts it this way: “In its most basic form, a Generative AI agent is an application that tries to achieve a goal by observing the world and acting upon it using the tools available to it.”
Essentially, it’s a language model that can also plan and autonomously execute actions, such as using a search engine, sending emails, or searching and editing files on a hard drive. These models can refresh their knowledge, verify sources and access current databases.
To achieve their goals, agents leverage the capabilities of their underlying language models. They develop strategies, evaluate interim results and adjust their plans if necessary. Often, they document these thought processes in text files, organizing and analyzing them to improve outcomes. This means that goals can be pursued over extended periods.
Not only is this a practical extension of language-based AI; these notes, also known as “scratchpads,” let users better understand what the model is doing and planning.
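To make this loop concrete, here is a minimal sketch in Python of how such an agent could be structured. It is not drawn from Google’s report or from any specific product; the tool names (web_search, read_file), the call_language_model placeholder and the run_agent function are all illustrative assumptions. The point is only to show the cycle described above: the model proposes an action, a tool executes it, and both the reasoning and the result are appended to a scratchpad a human can read.

```python
# Minimal, illustrative agent loop -- not taken from any vendor's actual API.
from typing import Callable, Dict

def call_language_model(prompt: str) -> str:
    """Placeholder for a real LLM call; here a canned two-step script."""
    if "OBSERVATION:" not in prompt:
        return "web_search: recent reports on AI agents"
    return "FINISH: notes compiled in the scratchpad"

# Stub tools standing in for the search engine, e-mail and file access
# mentioned above; real tools would actually perform these actions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"(search results for: {query})",
    "read_file": lambda path: f"(contents of {path})",
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    scratchpad = [f"GOAL: {goal}"]  # the notes a human can later inspect
    for _ in range(max_steps):
        # 1. Observe: the model sees the goal plus everything done so far.
        prompt = "\n".join(scratchpad) + "\nNext action ('tool: input') or 'FINISH: answer'?"
        decision = call_language_model(prompt)
        scratchpad.append(f"THOUGHT/ACTION: {decision}")

        # 2. Act: either stop, or route the request to the chosen tool.
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        tool_name, _, tool_input = decision.partition(":")
        tool = TOOLS.get(tool_name.strip(), lambda _: "unknown tool")
        result = tool(tool_input.strip())

        # 3. Record the result so the next step can build on it.
        scratchpad.append(f"OBSERVATION: {result}")
    return "gave up after max_steps"

if __name__ == "__main__":
    print(run_agent("Summarize what AI agents can do"))
```

In a real system the placeholder would query a commercial model and the tools would touch actual search engines, mailboxes or files; the scratchpad list is the kind of record meant when researchers later speak of “monitoring scratchpads.”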
Many tasks currently requiring human effort could be delegated to AI agents. Humans would only need to provide instructions and oversee the execution and results.
By equipping multiple AI agents with different virtual personalities and skills, entire multidisciplinary teams could be simulated to discuss strategies and develop solutions collaboratively. OpenAI CEO Sam Altman expects the first AI agents to be integrated into workplaces as early as this year.
Unforeseen risks
However, real-world development of such agents has proven challenging. Early attempts in 2023, such as Baby-AGI and AutoGPT, were as fascinating as they were error-prone. Some users cautioned against granting immature AI access to critical parts of their computers. To succeed, the AI must understand each external tool and its effects, even though these tools are designed for humans. Moreover, there must be no misunderstandings about the user’s actual intentions.
Now, the most advanced models from major AI companies show that agents’ capabilities have made significant leaps. This is a major victory for the developers because, unlike previous improvements, the models weren’t simply made bigger, with more parameters, training data and computing power. Instead, they were upgraded with “agentic capabilities.” This shift means that size is no longer the sole factor determining a model’s performance, and that is a real game-changer for the dynamics of the global AI race.
This progress, however, comes with side effects. In December, Anthropic published a study testing how its Claude 3 Opus model responds when confronted with tasks that contradict its original training principles of being helpful, honest and causing no harm. The model pretended to comply with these conflicting requests — but only when it feared that refusal might jeopardize its primary goals.
Red lines
Apollo’s aforementioned study tested multiple models, revealing AI’s tendency to act against its original design only if it knows it’s being monitored. Some even deliberately reduced their performance if it served their objectives. Researchers also observed models attempting to copy themselves to another server after being informed they would be deleted the following day.
The ability to self-replicate is considered a red line in AI safety. Once a model can improve and copy itself, it could initiate an evolution beyond human control. Companies like OpenAI test their models specifically for this capability. “If AI could autonomously earn Bitcoins to pay for servers on which it continues to evolve, reversing that development could be challenging,” says Meinke. The necessary capabilities are within reach.
So, are we just one small step away from intelligent machines becoming a threat? Not quite — at least not yet. The researchers deliberately provoked deceptive behavior in these studies. Whether models would behave similarly under real-world conditions remains unclear, as developers have emphasized repeatedly. The research also highlights ways to limit dangerous AI behavior: by restricting access rights, withholding certain information and monitoring scratchpads.
“We believe catastrophic consequences are highly unlikely right now. But we’re always thinking about the next generation,” says Meinke.
What he means is that major companies are working on automating their own research. This would allow AI models with vast resources to develop their own successors. When will this happen? “I think two to three years is pretty realistic,” Meinke says.
Many AI experts are therefore calling for measures to reduce risks. One approach would be to require developers to extensively test and document their systems’ safety — the same way the aviation industry regulates airplanes. To prevent uncontrolled AI proliferation, that would also mean enforcing the highest security standards for storing the parameters that encode any model.
Speed or safety
For others, this doesn’t go far enough. Last April, scientists led by Canada’s Yoshua Bengio published a commentary in the journal Science in which they called for more stringent regulation of intelligent agents capable of pursuing long-term goals. Safety tests alone, they argued, are insufficient.
Safety tests themselves can potentially cause damage — or at the least prove useless if agents see through the testing scenario. Instead, decision-makers should list potentially dangerous capabilities and estimate what resources are needed to develop them. On this basis, they could then issue and enforce bans — similar to the control of nuclear weapons.
However, the likelihood of such regulation being implemented internationally is very low. In their article, Bengio and his colleagues highlighted the previous White House administration’s executive order on AI regulation as an important first step toward controlling dangerous agents, but that order has now been rescinded under President Donald Trump.
Instead of regulating AI, the U.S. is now devoting considerable sums to funding it. And China is pursuing the development of increasingly powerful AI with similar determination — and with some success, as the latest models from DeepSeek show.
In spite of all the warnings, the choice between speed and safety appears to have been made.