Europe's path to a more transparent application of foundation models

Four teams develop AI benchmarks and models for safety-critical applications

Dr. Daniel Gille, Head of Artificial Intelligence at the Cyberagentur, emphasizes the importance of transparent and resilient evaluation mechanisms for multimodal foundation models as part of the new HEGEMON research program.

HEGEMON marks the start of a research competition that is unique in Europe. For the first time, four teams will compete against each other to adapt generative foundation models for security-critical contexts in a systematic, neutral and comprehensible way. The focus is on challenging tasks from the field of geoinformation, and on the question of which internationally pre-trained models German security authorities can use reliably, and how. The Agentur für Innovation in der Cybersicherheit GmbH (Cyberagentur) deliberately focuses on benchmarking, transparency and strict evaluation cycles.

The Cyberagentur has launched HEGEMON – a new three-year research program to develop holistic benchmarks and tailor-made AI models for safety-critical applications. The call for proposals was opened in June 2025. Following an intensive evaluation process, which also involved experts from the Bundeswehr, ZITiS and BSI, the four contractors who will take part in the competition have now been selected.

The selected teams are the German Research Center for Artificial Intelligence (DFKI) in cooperation with GAF AG, dida Datenschmiede GmbH, the Fraunhofer Institute for Integrated Circuits (IIS), and the Institute for Applied Computer Science (InfAI) e. V. All of them address one of the central challenges of the current AI landscape: there is currently no way to evaluate internationally developed foundation models, which come predominantly from the USA or China, systematically, comparatively and reliably in a safety-critical European context.

This is exactly where HEGEMON comes in. In an “everyone against everyone” competition format, the four teams adapt their models for complex tasks from the geoinformation sector. These include:

the creation of comprehensible text summaries on country-specific topics,
the conversion of remote sensing data into vector data,
and a map chatbot that answers questions about maps in text form (e.g. “Are there medical facilities on this map? Please share the coordinates if they exist.”).

Alongside the models, the teams will create domain-specific, holistic benchmark sets, consisting of tasks, metrics and test data sets, so that users can evaluate the performance of AI systems transparently and comprehensibly in the future.

A core feature of the program is the neutral test environment: all participants submit their models and benchmarks to a separate platform, which is operated in cooperation with GISA GmbH. There, inference runs under identical conditions for every team, and the results are summarized transparently on a leaderboard.

The competition is divided into several evaluation phases. The first major test takes place after nine months; based on its results, the research enters its next phase. A second interaction period follows after 20 months, and the final test after 36 months.

Dr. Daniel Gille, Head of Artificial Intelligence at the Cyberagentur and head of the program, emphasizes the added value of this approach: “We hope that the mutual competition and regular interactions will not only lead to more powerful models, but also to much more meaningful benchmarks. After all, in order to stay ahead, all participants must continuously learn from each other and develop their approaches further. This ensures dynamism, quality and real progress.”

He particularly emphasizes the role of the users: “The assessments of BSI, ZITiS and the Bundeswehr are highly application-relevant, very well-founded and significantly expand our professional bandwidth once again. This ensures that our programs are heading towards highly relevant and usable results right from the start.”

With HEGEMON, the Cyberagentur is creating a new standard for the evaluation of safety-critical AI in Germany and Europe. The competition lays the foundation for robust, transparent and comparable testing procedures – a decisive step towards trustworthy foundation models in the security sector.

Further information and registration:

https://www.cyberagentur.de/programme/hegemon/
