Continual Learning – what is it?
Artificial neural networks are powerful models used in various fields because they can learn to represent useful features. When we train neural networks, we usually start from randomly initialized weights and adjust them based on the data available at that moment. However, this approach is different from how humans learn. Humans continuously build on their knowledge over time; they are lifelong learners.
A problem arises when we try to train neural networks in a similar way, on non-stationary data that changes over time. This leads to a phenomenon called “catastrophic forgetting,” where the network forgets previously learned information when presented with new data.
Continual Learning (CL) aims to solve this issue by finding ways for neural networks to learn incrementally without forgetting what they previously learned.
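The forgetting effect can be reproduced on a toy problem. The sketch below is an illustration under assumed conditions, not a method taken from this text: a two-weight linear model, two hypothetical tasks (task_a and task_b), and plain full-batch gradient descent. It trains on task A, then on task B alone, and compares that with training on B while replaying A's data (rehearsal, chosen here only as one simple continual-learning strategy).

```python
# Toy linear model y = w1*x1 + w2*x2, trained with full-batch gradient descent.
# All names and data here are hypothetical, for illustration only.

def mse(w, data):
    """Mean squared error of weights w on a list of ((x1, x2), y) samples."""
    return sum((w[0] * x1 + w[1] * x2 - y) ** 2 for (x1, x2), y in data) / len(data)

def train(w, data, lr=0.1, steps=2000):
    """Return a new weight list after `steps` gradient-descent updates on `data`."""
    w = list(w)
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for (x1, x2), y in data:
            err = w[0] * x1 + w[1] * x2 - y
            grad[0] += 2 * err * x1 / n
            grad[1] += 2 * err * x2 / n
        w[0] -= lr * grad[0]
        w[1] -= lr * grad[1]
    return w

xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [((x, 0.0), 2.0 * x) for x in xs]  # solved only when w1 == 2
task_b = [((x, x), 3.0 * x) for x in xs]    # solved whenever w1 + w2 == 3

w_a = train([0.0, 0.0], task_a)             # learn task A first
w_seq = train(w_a, task_b)                  # then train on task B alone
w_replay = train(w_a, task_a + task_b)      # or train on B while replaying A

loss_a_before = mse(w_a, task_a)
loss_a_seq = mse(w_seq, task_a)
loss_a_replay = mse(w_replay, task_a)

print("loss on A after training on A:      ", loss_a_before)
print("loss on A after training on B only: ", loss_a_seq)
print("loss on A after training with replay:", loss_a_replay)
```

In this setup a weight vector that solves both tasks exists (w1 = 2, w2 = 1), yet sequential training ends at roughly (2.5, 0.5), and the loss on task A rises from about zero to about 0.156. Replaying task A's samples keeps it near zero. Real networks and data streams are far more complex, but the mechanism is the same: updates driven only by new data overwrite weights that older tasks depended on.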
Those who cannot remember the past are condemned to repeat it. – George Santayana
Change is hard... also for artificial neural networks
Deep learning models trained with standard gradient-based methods are prone to catastrophic forgetting. Neural networks are widely used in applications such as visual search engines, localization services, and medical diagnostics, and many of these models become outdated within days or months of deployment as the data they encounter changes. Updating them naively on the new data alone triggers catastrophic forgetting: the network loses what it learned before.
One of the simplest solutions is to retrain the neural network from scratch on an updated dataset. However, this means starting from day zero every time, from a tabula rasa. Continual Learning addresses this problem by allowing neural networks not merely to fine-tune to new data and forget the old (which is mainly the domain of transfer learning) but to learn continuously from a stream of ever-changing data.
Eco-friendly or up-to-date – that is the question
Continual learning is an energy-efficient approach. The current approach to deep learning is neither sustainable nor green. According to neural scaling laws, making models larger or training them on more data does bring progress, but only a few organizations can afford to do this, such as OpenAI with the release of ChatGPT, and the energy required to train such models remains prohibitive. Continuing this trend without incorporating incremental model adaptation is wasteful.
The way most models are built today has little to do with energy efficiency, and the increasing demand for AI/ML solutions will only raise energy expenditure further. Continual learning methods are designed not to waste what was already learned and to adapt quickly to new conditions, which makes them more energy efficient. That is one reason continual learning is one of the main pillars of Zero-waste Machine Learning in the Computer Vision Group.
Real-world settings require Continual Learning
In real-world settings, rapid adaptation and continual learning of neural network models are of significant practical importance for several reasons. Firstly, many real-life applications are constrained in the energy or computational resources available for model retraining. Retraining a model is impractical on small devices with limited battery power, such as drones, small autonomous robots, or edge devices, and waiting for the results of retraining is not feasible in such scenarios.
Imagine a drone delivering a life-support system in a forest, where the weather abruptly changes due to an unexpected storm of a kind never present in the training data. In such a situation, there is no way to go back and retrain the model on new data. The model needs to adapt quickly and complete its mission despite the challenging conditions.
Secondly, the open-world environment is subject to constant change. These changes range from simple shifts in weather conditions, which lead to domain shift in the data, to complex changes in driving laws, which for self-driving cars result in concept drift or the emergence of new categories. In this context, detecting novel elements and generalizing to new categories become vital to avoid costly errors.
Fast adaptation and continual learning of neural network models are imperative in an open-world context where computational resources are limited and environmental changes are inevitable. Neglecting these aspects may lead to expensive errors with potentially severe repercussions.
Progress of Artificial Intelligence cannot be made without understanding Continual Learning
Continual Learning is considered a fundamental aspect of artificial intelligence. Understanding how a machine learning system can learn and adapt to new information without forgetting what it previously learned is something of a holy grail for intelligent autonomous agents that aim to mimic humans’ lifelong learning skills and to learn and adapt in a flexible and dynamic way.
Current research directions and projects
- Continual Representation Learning
- Generative models in Continual Learning
- Architecture-based methods and ensemble models for CL
- Open-world problems and Generalized Category Discovery
- Test-time adaptation and domain generalization
- Data perspective on forgetting and stability
- Bayesian approaches for Continual Learning