
AI is stealing your data, and this company is putting a stop to it with a trap
For years, websites have relied on simple conventions like robots.txt to tell indexing bots what they are allowed to crawl. The problem is that many AI crawlers flout this social contract, which is what prompted Cloudflare to respond with a strategy designed to protect your data.
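As a reminder, robots.txt is just a plain text file placed at the root of a site. The snippet below is a generic illustration of how a site can ask specific crawlers to stay away (GPTBot and CCBot are given here only as examples of well-known AI crawler user agents); it is a convention, not an enforcement mechanism, which is exactly the weakness Cloudflare is addressing:

# Example robots.txt served at the site root (illustrative only)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else may crawl normally
User-agent: *
Allow: /

Nothing technically stops a crawler from ignoring these rules, since robots.txt relies entirely on the crawler's goodwill.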
Fake pages to trap AI
In a blog post, Cloudflare explains how it "traps misbehaving bots in an AI Labyrinth". The principle is simple but very effective: bots that break the rules are lured into a network of decoy pages, wasting their time and resources.
"AI-generated content has exploded … At the same time, we have also seen a proliferation of new crawlers used by AI companies to scrape data for training their models," Cloudflare explains in its post. "AI crawlers generate more than 50 billion requests to the Cloudflare network every day, or just under 1% of all the web requests we observe."
Until now, Cloudflare simply blocked AI crawlers, but that approach let AI companies adjust their tactics and keep extracting data. Hence the move to these fake web pages stuffed with AI-generated content, a bottomless pit for crawlers.
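To picture the idea, here is a minimal sketch in Python, assuming a toy bot-detection rule based on the user agent. It is not Cloudflare's implementation, whose detection signals and page generation are proprietary; it only shows the general shape of a labyrinth: each decoy page links to yet another decoy page.

# Toy "AI labyrinth": suspected crawlers get an endless chain of decoy pages.
# Illustrative sketch only; the detection heuristic below is a placeholder.
from http.server import BaseHTTPRequestHandler, HTTPServer
import uuid

SUSPECT_AGENTS = ("GPTBot", "CCBot")  # example crawler names, for illustration

def looks_like_ai_crawler(user_agent: str) -> bool:
    # Placeholder heuristic: real systems rely on far richer signals.
    return any(name in user_agent for name in SUSPECT_AGENTS)

class LabyrinthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if looks_like_ai_crawler(agent):
            # Serve a throwaway page whose only purpose is to waste the
            # crawler's time, with a link leading deeper into the maze.
            next_page = f"/maze/{uuid.uuid4().hex}"
            body = (f"<html><body><p>Filler text nobody will read.</p>"
                    f"<a href='{next_page}'>more</a></body></html>")
        else:
            body = "<html><body><p>The real site content.</p></body></html>"
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), LabyrinthHandler).serve_forever()

A human visitor never sees the maze, while a rule-breaking crawler keeps following links that lead nowhere.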
The situation is therefore ironic but also strategic: when an AI trains on content generated by another AI, the quality of the model deteriorates. The phenomenon is well documented and known as "model collapse": a formidable punishment for AI companies.
Human visitors, of course, will never stumble onto these trap pages. The AI crawlers, meanwhile, wander through this bottomless burrow and burn through precious resources. Cloudflare customers can now enable this protection to guard themselves against unauthorized AI scraping.