
Generative AI for the Physical World

Principal Investigator: Longlai Qiu

Call: PRELUDIUM 23

Project Description: The advent of robotics promises to transform everyday life by automating mundane tasks, thus freeing humans to engage more deeply with their creative and intellectual pursuits. These systems are found in applications such as household robots, autonomous vehicles, and smart infrastructure. As advances in computing and manufacturing expand the capabilities of these systems, there is a growing need for design tools that can navigate the complex possibilities these advancements present. Current engineering approaches are time-intensive and rely heavily on human input for prototyping and iteration. This difficulty is further compounded by the vast robot design space that one needs to explore. Our research proposal addresses some of these challenges by focusing on the development of design tools for robots that integrate physical form and virtual cognition. In this work, we propose an approach that is data-driven, generative, and modular. This approach not only streamlines the design process but also enhances design efficiency, potentially reducing the complexity and costs of robot production.

AI-powered computational design tools. By ensuring accessibility for users with varying levels of expertise, AI-powered design tools will enhance human creativity and boost design performance. By leveraging tools such as LLMs, we can automate the generation of robot designs based on natural language commands, significantly speeding up exploration of the design space, as shown in Figure 1. To ensure practical and implementable designs, we will employ prompt engineering techniques, including providing task descriptions and constraints depicted in graph structures. This approach will enhance the diversity and feasibility of generated designs, marking a significant departure from traditional manual design methods.

Digital twin of the robot model. Design is iterative, with key bottlenecks being the number and duration of iterations. Digital twins, which are fast, accurate, and informative, can significantly accelerate this process. To bridge the gap between simulation and reality, we propose adapting simulators to new domains using sparse experimental data. Coarse simulators will provide physical priors refined by data-driven models, which, combined with differentiable simulators, will apply corrective forces to capture discrepancies.

Co-optimization of brain and body. Developing robots that mirror the co-evolution of the brain and body, inspired by natural systems, is essential for effective interaction with complex environments. This approach will result in robots with human-like adaptability, capable of performing a wide range of tasks in dynamic settings.

Robots collaborating with robots. Developing robots that can autonomously construct other robots in collaborative real-world settings will lead to multi-robot systems that perform complex tasks beyond the capabilities of individual robots. This involves using a bimanual manipulation system with two robot arms equipped with sensors for precise handling and collision avoidance. The system employs imitation learning, where robots learn the assembly process by observing expert demonstrations.

Biomimetics and biological intelligence. Studying nature-inspired strategies will guide the design of robotic systems capable of autonomous learning and adaptation in dynamic environments. By incorporating biomimetic and biologically inspired methods, we aim to develop intelligent robots that can navigate and function in diverse, unpredictable contexts. Learning from evolutionary processes and biological principles will advance intelligent behavior in robots, leading to more advanced and robust systems.
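The corrective-force idea behind the digital twin can be illustrated with a toy sketch: a coarse simulator supplies a physical prior, and a data-driven model fitted on sparse residuals supplies the correction. The dynamics, the linear correction model, and all constants below are illustrative assumptions, not the project's actual simulator.

```python
# Minimal sketch: a coarse simulator refined by a data-driven corrective force.

def true_step(v, dt=0.1, g=9.8, drag=0.4):
    """'Reality': gravity plus a drag force the coarse model ignores."""
    return v + dt * (-g - drag * v)

def coarse_step(v, dt=0.1, g=9.8):
    """Coarse simulator: gravity only (the physical prior)."""
    return v + dt * (-g)

# Collect sparse 'experimental' data: residuals between reality and the prior.
data = [(v, true_step(v) - coarse_step(v)) for v in [-5.0, -2.0, 0.0, 2.0, 5.0]]

# Fit a linear corrective-force model, residual ~ k * v, by least squares.
k = sum(v * r for v, r in data) / sum(v * v for v, r in data if v != 0)

def corrected_step(v):
    """Coarse prior plus the learned correction."""
    return coarse_step(v) + k * v

v = 3.0
print(abs(true_step(v) - coarse_step(v)))     # error of the prior alone
print(abs(true_step(v) - corrected_step(v)))  # error after correction
```

In this toy case the true discrepancy is exactly linear in velocity, so the learned correction closes the sim-to-real gap entirely; real discrepancies would need a richer model and a differentiable simulator to propagate gradients through.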

Continual learning with conditional computation networks

Principal Investigator: Filip Szatkowski

Call: PRELUDIUM 23

Over the past decade, deep neural networks have seen significant advancements driven by increasing model size and training data. Research on scaling laws suggests that further progress will require even bigger models and more training data. However, training and hosting these large networks demand substantial computational resources, which makes them costly and restricts research on state-of-the-art models to a few of the best-funded companies. The environmental costs of large models are also quickly becoming concerning. Machine learning-related carbon emissions and energy consumption already account for around 1% of the global total in both metrics, which is comparable to a medium-sized country. Daily water consumption for cooling servers for systems like ChatGPT is estimated at 0.5 to 2.5 megaliters. Therefore, resource-saving machine learning techniques for training and inference that effectively utilize the available data are crucial to ensure the sustainable growth of the field. Various techniques that leverage the redundancy and over-parametrization of neural networks have emerged to reduce inference costs. Such techniques include quantization, pruning, knowledge distillation, and conditional computation networks. Conditional computation methods adapt the processing path in the model to the input data, intending to save computation on easier data samples by using only a subset of the network. While such adaptivity is desirable, standard neural networks lack this ability and always perform the same operations regardless of input. Popular conditional computation solutions include Mixture-of-Experts, which scales up the capacity of the model by splitting it into expert modules, and early-exits, which add auxiliary classifiers to different layers of the network and allow it to dynamically skip unnecessary computation during inference. Techniques focusing on training efficiency have been explored in parallel.
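The Mixture-of-Experts idea described above can be sketched with top-1 routing: a gate scores the experts and only the winner runs, so most of the network stays idle. The toy experts and gating rule are illustrative assumptions, not a real MoE layer.

```python
# Minimal sketch of Mixture-of-Experts routing: a gate picks one expert per
# input, so only a subset of the network runs.
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

# Two tiny "experts", each just a scalar function here.
experts = [lambda x: 2 * x,        # expert 0: handles positive inputs
           lambda x: -3 * x]       # expert 1: handles negative inputs

def gate(x):
    """Gating scores; in a real MoE this is a learned router."""
    return softmax([x, -x])

def moe_forward(x):
    scores = gate(x)
    k = scores.index(max(scores))  # top-1 routing: run only one expert
    return experts[k](x), k

print(moe_forward(2.0))    # routed to expert 0
print(moe_forward(-1.5))   # routed to expert 1
```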
Continual learning explores learning from incremental data streams, allowing the incorporation of new data into models without forgetting previously learned knowledge. This is especially important when retraining from scratch is costly or the original training data is no longer accessible. While most work on continual learning mentions efficiency as a desirable property of their methods, in practice they prioritize state-of-the-art results, disregarding the computational costs. We believe that computational efficiency is vital for continual learning in real-world scenarios, and continual learning methods should be designed to be efficient. We propose to leverage the adaptability and compute-saving capabilities of conditional computation methods in continual learning and use early-exit architectures to obtain novel, well-performing, and efficient solutions for continual learning. To understand the performance of early-exit networks in continual learning, we will investigate representations in those networks and measure the specialization of the individual early-exit classifiers. We will tailor early-exit networks to continual learning scenarios by mitigating task-recency bias and using customized knowledge distillation. Furthermore, we will explore architectural modifications and alternative exit mechanisms for continual learning scenarios. We will combine our findings to obtain an effective and computationally lightweight continual learning method.
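The early-exit mechanism can be sketched as follows: an auxiliary classifier attached to an early layer answers immediately when it is confident, and only hard inputs fall through to the full network. The toy two-stage "network" and the confidence threshold are illustrative assumptions.

```python
# Minimal sketch of early-exit inference: auxiliary classifiers let easy
# inputs skip later layers.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_classifier(x):
    """Cheap auxiliary head attached after the first block."""
    return softmax([x, -x])          # confident for large |x| (easy inputs)

def full_classifier(x):
    """Expensive final head; pretend it runs many more layers."""
    return softmax([3 * x, -3 * x])

def predict(x, threshold=0.9):
    """Exit early when the auxiliary head is confident enough."""
    probs = early_classifier(x)
    if max(probs) >= threshold:
        return probs.index(max(probs)), "early"
    probs = full_classifier(x)
    return probs.index(max(probs)), "full"

print(predict(4.0))   # easy sample: exits at the auxiliary head
print(predict(0.3))   # hard sample: falls through to the full network
```

The compute saving comes from the fraction of inputs that exit early; in a continual learning setting, the specialization of each auxiliary classifier across tasks is exactly what the project proposes to measure.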

Overcoming forgetting in continual learning with an ensemble of experts

Principal Investigator: Grzegorz Rypeść

Call: PRELUDIUM 23

Continual learning and exemplar-free class incremental learning. Continual learning (CL) is an important field in machine learning aimed at developing systems capable of learning from a continuous data stream, similar to human learning. One critical aspect of CL is exemplar-free class incremental learning (EFCIL), which focuses on teaching models new concepts without retaining past data samples. This approach is especially vital in situations with privacy constraints or limited storage capacity. However, EFCIL faces a significant challenge known as catastrophic forgetting, where models forget previously learned information upon encountering new tasks.

Scientific goal of the project. This project aims to address catastrophic forgetting in EFCIL by developing an advanced ensemble method that leverages the strengths of multiple neural networks, referred to as experts. Similar to humans, experts can specialize in different tasks and cooperate to solve them better. Additionally, they can help each other reduce forgetting or explain complex tasks, thus improving the learning process. Traditional single-model approaches to EFCIL often suffer from semantic drift, where the model’s internal representation of past classes changes unfavorably during new learning tasks, leading to forgetting. In this project, we seek to utilize the knowledge of other experts to help the currently trained expert alleviate its semantic drift and forgetting.

Research questions. Alleviating semantic drift: How can we mitigate semantic drift within an ensemble of experts? Our hypothesis is that when training a single expert, leveraging knowledge from other experts in the ensemble can better maintain the integrity of its past class representations. This will reduce the expert’s semantic drift and decrease its forgetting. Ensembled knowledge distillation: How can we improve knowledge distillation techniques for ensembles? Knowledge distillation is one of the most effective ways to reduce forgetting. The project proposes a novel method to distill knowledge across experts, ensuring that the model retains information about previous tasks more robustly, potentially outperforming existing logit and feature distillation techniques.
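The ensembled distillation idea can be sketched as a loss that pulls the currently trained expert toward the averaged soft targets of the frozen experts; drifting away from those targets is penalized. The logits, temperature, and averaging scheme below are illustrative assumptions, not the project's proposed method.

```python
# Minimal sketch of distilling ensemble knowledge into the currently trained
# expert to counteract semantic drift.
import math

def softmax(logits, T=2.0):
    """Temperature-softened softmax, as is standard in distillation."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_targets(expert_logits, T=2.0):
    """Average the frozen experts' softened predictions."""
    probs = [softmax(l, T) for l in expert_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]

def distill_loss(student_logits, targets, T=2.0):
    """Cross-entropy to the ensemble targets (KL divergence up to a constant)."""
    q = softmax(student_logits, T)
    return -sum(t * math.log(qi) for t, qi in zip(targets, q))

experts = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]   # frozen past-task experts
targets = ensemble_targets(experts)
print(distill_loss([2.0, 0.5, -1.0], targets))   # student agrees with experts: low loss
print(distill_loss([-2.0, 0.0, 2.0], targets))   # student drifted away: high loss
```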

Significance and impact of the project. Current CL approaches often rely on storing past exemplars, creating challenges in memory usage, privacy, and scalability. This project aims to pioneer more efficient, scalable, and privacy-preserving incremental learning techniques by developing novel ensemble methods that eliminate the need for stored exemplars, potentially setting new benchmarks in the field. By integrating ensemble techniques
with CL, the project expects to enhance model robustness and performance in dynamic environments, crucial for applications like autonomous systems and real-time analytics. The findings will have significant implications for industries such as healthcare and finance, enabling continual learning without compromising data privacy. By addressing core challenges like semantic drift and catastrophic forgetting through innovative ensemble methods, the project aspires to significantly advance the state-of-the-art in continual learning. By reducing the need for data storage and ensuring compliance with privacy regulations, this project aligns with both practical and ethical considerations in modern AI applications, contributing valuable insights and inspiring further research in the academic community.

Beyond Worst-Case Analysis: Online Problems with Delays and Stochastic Arrival Times

Principal Investigator: Michał Pawłowski

Call: PRELUDIUM 23

In our project, we aim to enhance how online algorithms are evaluated. Traditional methods focus
on the worst-case scenarios and often fail to represent the typical, more predictable data behaviours
observed in the real world. Here, we intend to make our models more reflective of actual conditions.
For this reason, we assume that the data we work with follows some stochastic distribution (in other
words, is randomly generated). Our study focuses on deciding the best time to address service requests
that accumulate over time, weighing the costs of immediate versus delayed responses. We are developing strategies for complex scenarios like online matching, facility location, and online service, for
which the worst-case analysis fails to provide reasonable performance guarantees. Below, we provide
straightforward explanations of the problems we are considering.

Online Matching. To understand this problem better, imagine an online chess gaming platform that wants to maximize the overall satisfaction of its users. Two main factors contribute to this — for each pair of players, we need to look at the experience difference (the smaller, the better), and at the waiting time to start a game (the shorter, the better). The problem here is that each of these parameters needs to be minimized at the expense of the other. For example, finding an opponent with a similar experience level results in a longer waiting time. Thus, the question arises of how to balance these costs when matching the players.
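The trade-off can be sketched with a simple rule: keep a pair waiting while the rating gap is the larger cost, and match it once the accumulated waiting cost catches up. The players, the costs, and the rule itself are illustrative assumptions, not a competitive algorithm from the project.

```python
# Minimal sketch of balancing match quality against waiting time: a pending
# pair is matched once its accumulated waiting cost reaches the
# experience-gap cost of pairing them.

def maybe_match(players, now):
    """players: list of (arrival_time, rating).
    Return the cheapest pair whose waiting has caught up with its gap."""
    best = None
    for i in range(len(players)):
        for j in range(i + 1, len(players)):
            (t1, r1), (t2, r2) = players[i], players[j]
            gap = abs(r1 - r2)                 # quality cost of this pair
            waited = (now - t1) + (now - t2)   # delay cost both have paid
            if waited >= gap and (best is None or gap < best[0]):
                best = (gap, i, j)
    return best  # (gap, i, j), or None if everyone should keep waiting

players = [(0.0, 1500), (1.0, 1520), (2.0, 1800)]
print(maybe_match(players, 2.0))    # too early: waiting is still cheaper than any gap
print(maybe_match(players, 20.0))   # the 1500 vs 1520 pair is now worth matching
```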

Facility Location. The problem of online facility location with delay involves dynamically determining the most effective locations for deploying multiple temporary service units, such as food trucks, across a city. As requests for services arrive over time from various locations in the city, these units can be temporarily established at any chosen point to meet the needs. Each request carries with it a unique delay function, which provides a measure of urgency or priority based on how long the request has been pending.
The objective is to strategically decide which locations should be activated throughout the day to
efficiently manage the varying intensities and timings of requests. Opening a facility incurs a fixed
cost, and connecting a request to a facility costs an amount proportional to the distance between
the request’s location and the facility. The facilities themselves are momentary — they appear to meet the demand at a specific location and, once the needs are addressed, they disappear to relocate elsewhere, eliminating the need for a detailed schedule for each unit. This approach corresponds to a food truck leaving its current location after serving all customers there.
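The core tension between the fixed opening cost and the growing delay penalties has a ski-rental flavor, which can be sketched as: open a facility at a location once the pending requests there have accumulated delay comparable to the opening cost. The costs and the linear delay function are illustrative assumptions.

```python
# Minimal sketch of a ski-rental-style rule for facility location with delays:
# open a temporary facility once the accumulated delay penalty of the pending
# requests reaches the fixed opening cost.

OPEN_COST = 10.0

def accumulated_delay(arrivals, now):
    """Linear delay: each pending request has paid (now - arrival) so far."""
    return sum(now - t for t in arrivals)

def should_open(arrivals, now, open_cost=OPEN_COST):
    """Open when waiting longer would cost more than opening."""
    return accumulated_delay(arrivals, now) >= open_cost

arrivals = [0.0, 1.0, 1.5]          # three pending requests at one location
print(should_open(arrivals, 2.0))   # delay so far: 2 + 1 + 0.5 = 3.5 -> keep waiting
print(should_open(arrivals, 5.0))   # delay so far: 5 + 4 + 3.5 = 12.5 -> open
```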

Online Service. The problem of online service with delay can be seen as managing a mobile repairman who travels across the city to address incoming repair requests at various locations. Each request arrives over time with an associated delay function, which represents the urgency of the repair based on factors like the severity of the issue or how long it has been waiting to be addressed. The goal is to strategically determine the most efficient locations for the repairman to visit in real time, ensuring that urgent requests are prioritized and travel is minimized. The repairman appears at a designated spot to meet the specific demand, repairing everything from appliances to electrical issues, and once the work at that location is completed, he moves on to the next request. This strategy requires dynamic decision-making based on the evolving landscape of requests.
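A toy dispatch rule for the repairman can make the urgency-versus-travel tension concrete: among pending requests, visit the one whose accrued delay penalty most outweighs the travel cost to reach it. The 1-D city, the weights, and the scoring rule are illustrative assumptions only.

```python
# Minimal sketch of a dispatch rule for online service with delay: score each
# pending request by accrued urgency minus travel cost, and serve the best.

def next_stop(pos, requests, now):
    """requests: list of (location, arrival_time, urgency_weight)."""
    def score(req):
        loc, t, w = req
        urgency = w * (now - t)        # delay penalty accrued so far
        travel = abs(loc - pos)        # cost to drive there (1-D city)
        return urgency - travel
    return max(requests, key=score)

pending = [(2.0, 0.0, 1.0),    # nearby, waiting for a while, low urgency
           (9.0, 4.0, 3.0)]    # far, recent, but highly urgent
print(next_stop(0.0, pending, 5.0))    # early on, the nearby request wins
print(next_stop(0.0, pending, 10.0))   # later, the urgent one overtakes it
```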

While the descriptions provided above offer a simplified overview of the challenges we plan to address, the actual problems we will work on are considerably more complex. We aim to develop deep insights and innovative solutions that will significantly contribute to the area of online algorithms with delays. We hope our research will provide a robust foundation for improving how these algorithms perform in real-world applications.

Reliable and efficient real-world test-time adaptation

Principal Investigator: Damian Sójka

Call: PRELUDIUM 23

Limitation of Deep Neural Networks. Over the last decade, deep neural networks have revolutionized various technical areas, finding applications in everything from depth estimation to text generation. Despite their advancements, they are not without flaws. One of the most significant drawbacks is poor performance on data that diverges from the training set, i.e., when the test data is out-of-distribution. This issue is especially noticeable in computer vision, where a network trained on clean photos taken in daylight performs poorly on corrupted or nighttime images. This limitation is less problematic in controlled environments, such as robotic applications in warehouses, where training data can be carefully curated. However, in open-world applications like autonomous driving, it is nearly impossible or prohibitively expensive to anticipate every possible data variety and prepare it for model training, given the ever-changing and unpredictable environments.

Test-Time Adaptation. Recently, the paradigm of Test-Time Adaptation (TTA) has emerged as a rapidly growing research area. TTA aims to adapt a pre-trained neural network to unlabeled data on the fly during test time. The goal is to prevent performance degradation by adjusting the network to continually changing data distributions without any prior assumptions about the test conditions. However, current TTA methods are not without significant drawbacks. Firstly, it was recently shown that no existing TTA approach can handle all types of distribution shifts, making them unreliable for real-world deployment. Secondly, many of these methods are developed without considering computational efficiency, which is crucial for time-sensitive applications and resource-constrained platforms. This project aims to address these limitations by developing novel TTA methods that focus on both reliability and efficiency, and to apply the findings to real-world computer vision tasks.
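One of the simplest TTA ideas can be sketched in a few lines: refresh the model's normalization statistics using the incoming (shifted) test batch itself, so features land back in the regime the network was trained on. The features, the shift, and the update rule are illustrative assumptions; real TTA methods (e.g., entropy minimization) also update model parameters.

```python
# Minimal sketch of test-time adaptation via normalization-statistics refresh.
import statistics

def normalize(batch, mean, std):
    return [(x - mean) / std for x in batch]

# Statistics collected on clean training data.
train_mean, train_std = 0.0, 1.0

# Test data has drifted (e.g., a brightness offset on nighttime images).
test_batch = [4.8, 5.2, 5.1, 4.9]

# Without adaptation: normalized features are far from the training regime.
stale = normalize(test_batch, train_mean, train_std)

# Test-time adaptation: re-estimate the statistics on the test batch itself.
m = statistics.mean(test_batch)
s = statistics.stdev(test_batch)
adapted = normalize(test_batch, m, s)

print(sum(stale) / len(stale))       # large offset: the shift is visible
print(sum(adapted) / len(adapted))   # near zero: features back in-distribution
```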

Anchoring Reliability and Stability. Real-world applications require machine learning systems to be reliable. However, it might still be difficult to trust even a fixed "black box" in safety-critical applications, let alone a neural network adapted on the fly. Therefore, we intend to focus on reliability across deployment scenarios. Our research will encompass a broad spectrum of testing conditions to identify trustworthy techniques that consistently perform well. We will draw inspiration from the fields of self-supervised and continual learning.

Enhancing Efficiency. In real-world applications, e.g., those related to mobile robotics, models are deployed on edge devices with limited computational resources. The limitations include computational power, memory capacity, and energy consumption. Model inference alone can be computationally demanding on an edge device. On top of inference, TTA adapts the model, creating a significant additional computational burden. We will analyze the efficiency of current TTA methods, which will let us understand the trade-off between their efficiency and adaptation performance. Given these insights, we will develop techniques that push the boundary of TTA efficiency further.

Applying the Findings to Real-world Tasks. TTA methods are frequently developed using an image classification task, as this simple setup allows for easier development and testing. However, focusing solely on a single task limits the applicability of TTA. Real-world problems faced by embodied AI agents during open-world operation are far more sophisticated. Additionally, such agents do not work with isolated pictures but with sequences of perception data, which include potentially useful information for adaptation, given the time-domain and geometric constraints of the surroundings. We plan to apply our findings on reliable and efficient TTA to tasks related to embodied AI agents, such as depth estimation, and leverage all available contextual clues for effective adaptation.

Expected Results and Implications. The outcomes of this project will advance the understanding of test-time adaptation and drive significant progress across a variety of open-world applications utilizing neural networks. Enabling a neural network to adapt reliably and efficiently will allow for more robust deep learning systems in many industries, ranging from robotics to voice assistants.


