Apertus: an open-source AI built on transparency and data sovereignty

Apertus was developed by the École Polytechnique Fédérale de Lausanne (EPFL), ETH Zurich and the Swiss National Supercomputing Centre (CSCS), public institutions that see AI as public infrastructure. Their goal is to make it as open as possible. Hence its name: Apertus means “open” in Latin.

An AI that complies with regulations

Apertus therefore does not aim to compete directly with ChatGPT. Instead, it is meant to be a slightly less advanced alternative that is more reliable, more transparent and, above all, more accessible. It is reportedly comparable to Meta's Llama 3 model, released in 2024. The researchers said the model was trained on 15 trillion tokens covering more than 1,000 languages, with 40% of the data coming from non-English sources.

Apertus is designed for the common good. It is one of the few LLMs of this scale to be fully open source, and it is the first to build in, from the design stage, fundamental principles such as multilingualism, transparency and regulatory compliance.

Imanol Schlag, technical lead of the LLM project and senior researcher at ETH Zurich

Transparency at the heart of its design

The researchers used only data publicly available on the internet, and they honored the wishes of websites that prohibit the use of their content. The project takes into account data protection laws as well as the European Union's AI Act. The scientists also took care to remove personal data and unwanted content before training.

The openness of Apertus goes much further than that of most so-called “open source” models, which very often share only their weights. The researchers have published the weights along with intermediate checkpoints, as well as the source code of the training process and the documentation. Everything is available under the Apache 2.0 license on Hugging Face. The model comes in two sizes: 8 billion and 70 billion parameters.

Keep control over your data

Even if open-source models are often a little less capable than ChatGPT, they have a major advantage. Because they run locally, they can be adapted to specific needs and, above all, the data never leaves the server they run on. In other words, this kind of model can be adopted by organizations that require the highest level of data security, such as banks.
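
For an organization weighing a local deployment, a first practical question is how much memory the model weights alone would occupy. The short Python sketch below is a back-of-the-envelope estimate, not a figure from the Apertus team: it takes the article's rounded parameter counts (8 billion and 70 billion) and multiplies them by the standard storage size of common numeric formats. The function name and the list of formats are illustrative assumptions, and real deployments need extra memory for activations and caches.

```python
# Rough estimate of the memory needed just to store model weights locally.
# Assumptions: parameter counts are the article's round figures (8e9, 70e9);
# bytes per parameter are the standard sizes of common numeric formats.
# Activations, caches and runtime overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
        for fmt in BYTES_PER_PARAM:
            gb = weight_memory_gb(n_params, fmt)
            print(f"{name} in {fmt}: about {gb:.0f} GB")
```

Under these assumptions, the smaller model fits on a single high-end workstation GPU in 16-bit precision, while the larger one generally calls for several accelerators or a compressed format.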

The researchers are already working on the next, more capable models, which may be specialized in fields such as law, health or education.
