
After ChatGPT, DeepSeek takes on DALL-E 3 and image generation with Janus-Pro-7B
For those who have not closely followed AI news in recent weeks, DeepSeek is a Chinese startup that has caused quite a stir in Silicon Valley and on Wall Street. How? By offering V3 and R1, high-performance large language models (LLMs) that are, above all, far cheaper to develop and operate than those of their American competitors. The latter are taking DeepSeek very seriously, as evidenced by the "war rooms" set up by Meta.
DeepSeek, the Chinese response to American AI.
The Chinese startup clearly does not intend to stop there: it has just unveiled a new family of multimodal models called Janus Pro. Among them, Janus-Pro-7B stands out with promising performance on paper, even surpassing OpenAI's DALL-E 3 on several benchmarks. Janus-Pro-7B is the largest model in the Janus Pro range, which spans from 1 to 7 billion parameters. The parameter count, which reflects a model's problem-solving capacity, is often correlated with overall performance. Although relatively compact compared to models such as GPT-4 or DALL-E 3, Janus-Pro-7B delivers interesting results, though not game-changing ones. We will come back to this later.
Promising performance on paper
According to DeepSeek, the model is based on a unified autoregressive architecture, meaning it can both analyze existing images and generate new ones. Unlike models that specialize in a single task, Janus-Pro-7B is meant to combine, in theory, flexibility and efficiency.
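To give a rough idea of what "autoregressive" means for image generation, here is a minimal, purely illustrative sketch: the model predicts discrete image tokens one at a time, each conditioned on the prompt and on everything generated so far. The model, vocabulary and token count below are placeholders, not DeepSeek's actual Janus-Pro implementation.

```python
import torch

def generate_image_tokens(model, prompt_tokens, num_image_tokens=576):
    """Toy autoregressive loop: sample image tokens one by one,
    each conditioned on the prompt and on previously generated tokens."""
    tokens = prompt_tokens.clone()           # shape: (1, prompt_len)
    for _ in range(num_image_tokens):
        logits = model(tokens)               # (1, seq_len, vocab_size), placeholder model call
        next_logits = logits[:, -1, :]       # distribution over the next token
        probs = torch.softmax(next_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)
    # The generated token ids would then be decoded back into pixels by an
    # image tokenizer/decoder (omitted here).
    return tokens[:, prompt_tokens.shape[1]:]
```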
Janus-Pro-7B was evaluated on several benchmarks recognized in the AI industry. On GenEval, which measures a model's ability to follow complex instructions when generating images, the DeepSeek model scores higher than DALL-E 3. On DPG-Bench, it again outperforms competitors such as Stable Diffusion XL and PixArt-alpha at generating detailed, coherent images. These results are all the more striking given that some of the competing models are older or require more resources to achieve similar performance. Of course, these are results reported by DeepSeek, and it remains to be seen how the model behaves in real-world use.
The Janus-Pro-7B interface on Hugging Face. © Numériques
Open source
Like the V3 and R1 language models, Janus-Pro-7B is released under an open-source license, which allows companies and developers around the world to use it freely for commercial or personal applications. All Janus Pro models can also be downloaded from the Hugging Face platform, which should ease their adoption. This strategy mirrors Meta's approach with its Llama models, and is the opposite of OpenAI's, whose models such as DALL-E 3 remain proprietary and require paid access via APIs.
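In practice, fetching the weights from Hugging Face can be done in a few lines with the huggingface_hub library. This is a minimal sketch; the repository id "deepseek-ai/Janus-Pro-7B" is assumed here to be the official model repo.

```python
from huggingface_hub import snapshot_download

# Download all model files into the local Hugging Face cache.
# Repository id assumed: "deepseek-ai/Janus-Pro-7B".
local_dir = snapshot_download(repo_id="deepseek-ai/Janus-Pro-7B")
print("Model downloaded to:", local_dir)
```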
And in practice?
To take full advantage of Janus-Pro-7B, it must be installed locally on a computer. But you can also simply try it on the dedicated Hugging Face demo page, which is what we did. And we have to say that, for now, the results have not exactly thrilled us. Our trials do not amount to a proper test, since we simply ran a few generations to try out the interface and get a rough idea of output quality. While the instructions are generally followed well, the images tend to lack detail. We repeatedly submitted the same prompt to both DALL-E 3 (via ChatGPT) and Janus-Pro-7B, and the results were invariably more convincing with the OpenAI tool.
On the left, the image generated by Janus-Pro-7B; on the right, the one from DALL-E 3. © Numériques
And just for fun, here is what the same prompt produces with Midjourney. The image is more attractive, but the dog's helmet was evidently deemed superfluous by the AI…
The same prompt, submitted to Midjourney this time. © Image generated by AI