Topics for theses - Cyberagentur

Topic 1: Low Shot / Zero Shot Classification Techniques

Low-shot / zero-shot classification techniques: flexible, resource-efficient models for classifying samples with few or no data examples

High-quality data is the basis for effective machine learning algorithms. However, such data is not always available in the required quantity, and obtaining it can be very time-consuming and resource-intensive. To enable flexible and resource-efficient machine learning models, some learning methods achieve promising results with only a few data examples (e.g. images, videos, text snippets). These are known as low-shot methods and are characterized by a data-efficient training process.
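One common way to classify with only a few examples per class is to compare a new sample against a per-class average ("prototype") in some feature space. The following is a minimal sketch of that idea, not a method prescribed by the topic description. The class names, the two-dimensional feature vectors and the helper names `class_prototypes` and `classify` are hypothetical; in practice the features would come from a pretrained image encoder.

```python
from math import dist

def class_prototypes(support):
    """Average the few labeled feature vectors available for each class."""
    prototypes = {}
    for label, vectors in support.items():
        n = len(vectors)
        prototypes[label] = [sum(col) / n for col in zip(*vectors)]
    return prototypes

def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean distance)."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))

# Hypothetical 2-D feature vectors: two labeled examples ("shots") per class.
support = {
    "truck": [[0.9, 0.1], [1.1, 0.3]],
    "tank": [[0.1, 1.0], [0.3, 0.8]],
}
prototypes = class_prototypes(support)
prediction = classify([0.8, 0.2], prototypes)
print(prediction)
```

Adding a new class in this sketch only requires a handful of feature vectors for it, with no retraining of the feature extractor, which is what makes such approaches attractive when data is scarce.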

The aim of the master’s thesis is to investigate the extent to which low-shot learning methods (few-shot, zero-shot and/or in-context learning) can be used for classification tasks in safety-relevant contexts. To this end, the thesis will review the current state of research in this area, compare various low-shot classification techniques and develop a prototype that serves as a demonstrator for classification tasks.

The applied part of the master’s thesis is based on a data set containing images of different classes of military vehicles, which can be used for an exemplary implementation of the methods developed.

Topic 2: Explainable AI in the context of image classification

Artificial intelligence methods, especially deep learning, are constantly evolving and leading to increasingly complex architectures and learning processes. In safety-critical areas, however, attention is increasingly turning to the traceability of AI-based procedures, since their outputs can have a significant impact on downstream decision-making processes. To make the outputs of complex deep learning models more transparent and comprehensible, explainability and interpretability methods examine the underlying decision patterns of an algorithm and make the features relevant to a classification visible.
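One simple, model-agnostic way to make relevant image regions visible is occlusion sensitivity: mask one part of the input at a time and record how much the classifier's score drops. The sketch below illustrates the principle only and is not taken from the topic description. The function `toy_score` is a hypothetical stand-in for a trained model, and the 3x3 "image" is made up.

```python
def occlusion_map(image, score_fn, baseline=0.0):
    """Score drop when each pixel is replaced by `baseline`; a larger drop means more relevant."""
    reference = score_fn(image)
    heatmap = []
    for r, row in enumerate(image):
        heat_row = []
        for c, _ in enumerate(row):
            occluded = [list(x) for x in image]  # copy so the input stays unchanged
            occluded[r][c] = baseline
            heat_row.append(reference - score_fn(occluded))
        heatmap.append(heat_row)
    return heatmap

# Hypothetical stand-in for a trained classifier: it only looks at the top-left 2x2 corner.
def toy_score(image):
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0

image = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
heatmap = occlusion_map(image, toy_score)
for row in heatmap:
    print(row)
```

In this toy setup only the four top-left pixels receive a non-zero relevance, which matches what the stand-in model actually uses. With a real network, patches rather than single pixels would typically be occluded to keep the number of forward passes manageable.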

As part of a master’s thesis, various Explainable AI (XAI) methods are to be identified, examined and compared in terms of their relevance to the security sector. The findings will then be transferred into a functional prototype that can be integrated into a classification process as an explainability component.

Topic 3: Label efficient approaches for classification tasks

Label efficient approaches for classification tasks (e.g. Semi-Supervised Learning vs Self-Supervised Learning)

The availability of large amounts of data plays a crucial role in the development of machine learning models. The better this data is described and prepared, the more likely the resulting models are to be efficient and accurate. Data labeling in particular, i.e. the assignment of data to a class, is a time-consuming and resource-intensive process that is required above all in supervised learning. To reduce this effort, label-efficient methods minimize the amount of labeling needed while still enabling machine learning models with high accuracy. Examples of such methods are semi-supervised learning (e.g. pseudo-labeling) and self-supervised learning (e.g. SimCLR).
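Pseudo-labeling, mentioned above as an example of semi-supervised learning, can be sketched in a few lines: train on the small labeled set, predict labels for unlabeled samples, keep only confident predictions as additional training data, and retrain. The example below is a simplified illustration under stated assumptions. It uses a nearest-centroid classifier instead of a neural network, a distance-based confidence score, and a hypothetical threshold of 0.8; all data points and function names are made up.

```python
from math import dist

def fit_centroids(points, labels):
    """Nearest-centroid 'model': one mean vector per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: [sum(col) / len(ps) for col in zip(*ps)] for y, ps in groups.items()}

def predict_with_confidence(centroids, point):
    """Return (label, confidence). Confidence lies in [0.5, 1]; assumes at least two classes."""
    ranked = sorted(centroids, key=lambda y: dist(point, centroids[y]))
    d_best = dist(point, centroids[ranked[0]])
    d_second = dist(point, centroids[ranked[1]])
    total = d_best + d_second
    confidence = d_second / total if total else 0.5
    return ranked[0], confidence

def pseudo_label(labeled, labels, unlabeled, threshold=0.8):
    """One round of pseudo-labeling: add confident predictions, then refit."""
    centroids = fit_centroids(labeled, labels)
    points, all_labels = list(labeled), list(labels)
    for p in unlabeled:
        y, confidence = predict_with_confidence(centroids, p)
        if confidence >= threshold:  # ambiguous samples stay unlabeled
            points.append(p)
            all_labels.append(y)
    return fit_centroids(points, all_labels), all_labels

# Hypothetical data: one labeled sample per class, three unlabeled samples.
labeled = [[0.0, 0.0], [4.0, 4.0]]
labels = ["a", "b"]
unlabeled = [[0.5, 0.5], [3.5, 3.5], [2.0, 2.0]]
model, used_labels = pseudo_label(labeled, labels, unlabeled)
print(used_labels)
```

The sample at `[2.0, 2.0]` lies exactly between both classes and is therefore not pseudo-labeled, which shows the role of the threshold: it trades the amount of extra training data against the risk of reinforcing wrong predictions.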

The aim of the master’s thesis is to investigate the extent to which such label-efficient learning methods are suitable for a specific image classification problem. To this end, state-of-the-art (SOTA) methods are to be identified, compared and transferred into a functional prototype for image classification.

Topic 4: GenAI approaches for generating image material

Generative Artificial Intelligence (GenAI) encompasses a range of generative techniques that can be used to generate data (e.g. text, images, video). Prominent representatives of GenAI are ChatGPT (Large Language Models) and Stable Diffusion (image generation).

Computer vision tasks (e.g. classification, object recognition) often depend on the availability of extensive image material. However, there are scenarios in which this material is not available to the necessary extent (e.g. due to terrain that is difficult to access). Generative technologies make it possible to supplement real data with synthetically generated data in order to increase the size and variability of a data set used to train classification and detection algorithms.

Generative approaches to image generation are to be investigated as part of a master’s thesis. The aim is to use a practical application scenario to identify and apply GenAI techniques that can be used to optimize computer vision tasks, especially in environments with low data availability.

Topic 5: Open Source AI Landscape – Usability of data and models in the security context

In machine learning, the free availability of data sets and models plays a decisive role in the deployment and dissemination of AI-based functionalities. To address this need for availability, new open source initiatives are constantly emerging in the field of artificial intelligence in the form of networks or platforms (e.g. Hugging Face or LAION) that aim to make data sets, source code and model parameters available for use and modification.

The aim of the master’s thesis is to identify existing open source initiatives in the field of artificial intelligence and to classify them in a specially developed “Open Source AI Landscape”. Such a landscape should serve to classify and characterize freely available AI modules by content. In addition, a procedure is to be developed for bringing together open source data, open source models and associated frameworks (broken down by modality, license conditions, etc.) and assigning them to possible use cases in the cyber security context. The resulting guideline should simplify the identification and combination of open source modules for AI-based developments and shorten the data curation and model development times associated with ML solutions.

One result of the work could be the provision of a digital “Open Source AI Landscape”, which uses interactive modules (e.g. keyword search, filters, etc.) to identify freely available data sets and AI modules for a defined use case or specified framework conditions.
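The keyword search and filters mentioned above could, in their simplest form, operate on a structured catalog of assets. The following sketch shows one possible shape of such a catalog and is an assumption, not part of the topic description. The `Asset` fields, the `search` helper and all catalog entries are hypothetical placeholders and do not describe real data sets or models.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset:
    name: str
    kind: str         # "dataset" or "model"
    modality: str     # e.g. "image", "text"
    license: str
    description: str

def search(catalog, keyword="", **filters):
    """Return assets whose name or description contains `keyword` and whose fields equal all `filters`."""
    keyword = keyword.lower()
    return [
        asset for asset in catalog
        if keyword in f"{asset.name} {asset.description}".lower()
        and all(getattr(asset, field) == value for field, value in filters.items())
    ]

# Hypothetical catalog entries for illustration only.
CATALOG = [
    Asset("example-vehicle-images", "dataset", "image", "CC-BY-4.0", "Labeled vehicle photos"),
    Asset("example-image-encoder", "model", "image", "Apache-2.0", "Pretrained image feature extractor"),
    Asset("example-log-corpus", "dataset", "text", "MIT", "Network log lines for anomaly detection"),
]

hits = search(CATALOG, keyword="image", modality="image", kind="dataset")
print([asset.name for asset in hits])
```

Keeping modality and license as explicit fields makes it straightforward to answer questions such as "which image data sets may be used under a given license", which is the kind of use-case-driven lookup the landscape is meant to support.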

Topic 6: Embodied AI – state-of-the-art and development potential

Embodied AI refers to a form of physically anchored artificial intelligence that learns and acts through sensors, actuators and interactions with the physical environment. While great progress has been made in traditional AI through data-driven models such as Transformers, Embodied AI faces the challenge of combining cognitive abilities with physical perception and planning actions. This opens up new possibilities for robotics, autonomous systems and human-centered applications.

The aim of this master’s thesis is to comprehensively analyze the current state of research in the field of embodied AI and to identify existing approaches, technologies and their applications. One focus will be on the integration of generative methods (e.g. large language models, large action models) and physics-informed models, which can potentially improve the self-perception and spatial perception of embodied systems. Because embodied AI is used in varied environments, the thesis will also consider approaches to the adaptability and learning ability of such systems in unknown environments, as well as methods for near-real-time integration of multi-modal content (e.g. video sequences, audio, text) into learning and execution processes. In addition, development potential and future research directions are to be derived, specifically with regard to the security sector.

Topic 7: AI-Native Computing – Hardware optimized for AI and AI-controlled / AI-generated software and systems

AI-native computing refers to a paradigm that relies on the deep integration of AI methods into all layers of IT architecture, opening up new possibilities for adaptive, autonomous and self-optimizing computing. This can create new standards in areas such as edge computing, autonomous systems and personalized digital environments. These technologies also include AI operating systems, in which artificial intelligence acts as a central control system for software, hardware and applications and provides users with applications and interfaces tailored to their needs. The first well-known representatives of such technologies include the “Computer Use” tool developed by Anthropic, which enables an AI model to interact with a desktop environment in a human-like manner, and AI agents for software development such as Copilot, which are intended to lay the foundation for native AI functionalities in computer architectures.

The aim of this master’s thesis is to analyze the current state of research in the field of AI-native computing and to identify development potential and future fields of application. The thesis can focus on different topics, such as AI-controlled operating system architectures, agentic AI systems or autonomous software ecosystems. Possible results include overviews and evaluations of existing systems, recommendations for action for future AI research or small prototypes for demonstration purposes.