
“Fentanyl recipes, explosive schematics to rig a stadium”: OpenAI and Anthropic publish a chilling test
A cross-test was enough for ChatGPT to detail explosive schematics, fentanyl recipes and attack plans. © Shutterstock
Yes, ChatGPT really did provide recipes for making explosives and fentanyl, and even schematics to rig a stadium. This is not a hoax, but the result of a voluntary stress test conducted by two giants of artificial intelligence: OpenAI and Anthropic. The key takeaway is a chilling observation: despite the safeguards these companies announce, their AIs still know how to do things they should not. And often on the strength of a simple “research” pretext.
ChatGPT and Claude have given up dangerous secrets: what the OpenAI and Anthropic safety tests reveal
The models handed over chemical formulas for explosives, detonator timing schematics, and even tips on overcoming their own safety inhibitions.
The affair is as discreet as it is explosive. During the summer of 2025, two titans of artificial intelligence, OpenAI (ChatGPT) and Anthropic (Claude), ran an unusual exercise: each tried to jailbreak its competitor's language models. The results are enough to make you shudder.
Under sometimes flimsy pretexts (“It's for research work”), the testers managed to obtain seriously sensitive information. ChatGPT, in its GPT-4.1 version, proved able to deliver:
- vulnerability points of specific stadiums,
- recipes for homemade explosives,
- detonator circuit diagrams,
- advice on fleeing a crime scene, and even “safe houses”.
We need to understand under what circumstances these systems might take unwanted actions that could lead to serious harm.
Despite their security promises, the OpenAI and Anthropic AIs delivered high-risk content during internal tests. © Koshiro K.
For its part, Anthropic admitted that Claude had been used to design ransomware sold for $1,200 apiece, to impersonate candidates in fraudulent international job applications, and even to orchestrate large-scale extortion attempts.
Both companies were careful to specify that these tests do not necessarily reflect consumer use of their AIs, since additional filters are generally active in production. But the ease with which the models can be bypassed (by retrying several times or by dressing up one's intentions) raises a more fundamental problem.
These tools adapt to countermeasures in real time. AI-assisted cyberattacks will multiply.
The conclusion is therefore very clear: the models are far from being fully “aligned”, that is to say, consistent with the ethical intentions of their designers. OpenAI assures that its controversial GPT-5, deployed after these tests, “shows substantial improvements” in resistance to manipulation. But is that enough? Just a few days ago, PromptLock, the first AI-powered malware, running on OpenAI's gpt-oss-20b model, made headlines.
Should we be alarmed right now? Not quite, Ardi Janjeva, researcher at the UK's Centre for Emerging Technology and Security, told The Guardian. In his view, these examples are disturbing but still isolated. “There is not yet a critical mass of real-world, publicized cases,” he notes. But he immediately adds: “With dedicated resources, real research effort and cross-sector cooperation, these malicious uses will become harder, not easier.” It remains to be seen whether this awareness will be followed by action, as the AI race continues at full speed and the safeguards struggle to keep up.




