
Explainable algorithms

Algorithms, especially those used in machine learning, offer a chance to make better decisions than humans do. In many cases, even conceptually simple algorithms can provide higher-quality answers than humans, e.g. by making fewer errors.

However, this promise is far from being fulfilled at the current level of algorithm development. The main problem is that algorithms do not explain their decisions, and it is not clear to the user why the answer proposed by an algorithm is correct. Even if we try to reconstruct the reasons why an algorithm made a given decision, these reasons may be incomprehensible to humans. The problem is best illustrated by Deep Blue, the chess-playing computer created by IBM: a human has no way to understand its “reasoning” behind a specific move.

We can argue endlessly about whether Deep Blue actually understands the game of chess. Regardless, this becomes a key issue when deploying algorithmic solutions in real-world applications: we need to ensure that algorithms are understandable to the people who use them, otherwise users will refuse to follow the decisions they make. Surprisingly, there are currently no good tools for explaining even the most basic algorithms. For example, people often doubt whether their GPS navigation has found the fastest route and prefer to make an alternative choice themselves.

This is despite the fact that finding the shortest path is a well-studied problem for which we can prove that the algorithm works correctly. The challenge is to explain to the user why the solution proposed by the algorithm is the best one, rather than how the algorithm itself works. Moreover, the algorithm may be used by someone without algorithmic training, which makes explaining the reasoning even more important.
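To make this concrete, here is a minimal Python sketch of Dijkstra’s algorithm, the textbook method behind such route computations; the toy road network and its travel times are hypothetical, chosen only for illustration.

    import heapq

    def dijkstra(graph, source):
        # graph: dict mapping node -> list of (neighbor, weight) pairs.
        # Returns the shortest distance from source to every reachable node.
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry; a shorter path was already found
            for v, w in graph.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return dist

    # Hypothetical road network, travel times in minutes.
    roads = {
        "home": [("A", 10), ("B", 5)],
        "B": [("A", 3), ("work", 25)],
        "A": [("work", 15)],
    }
    print(dijkstra(roads, "home"))  # {'home': 0, 'A': 8, 'B': 5, 'work': 23}

The distance labels double as a certificate: any alternative route a skeptical user proposes can be checked against them, which is exactly the kind of evidence an explanation could surface.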

The good news is that you do not have to study algorithms for years to be convinced that they work correctly. For example, one of the main reasons Google Maps shows alternative routes is to demonstrate to the user that the proposed route is the best one. Algorithms are becoming ever more common in our lives due to increasing digitalization, whose pace has been further accelerated by the COVID-19 pandemic, so the need to explain their results is becoming more pressing. The work of our research group aims at developing the concepts and tools needed to explain the most basic optimization problems, e.g. the assignment problem. Assignment problems are a key challenge in operations research and describe the matching of jobs with employees, jobs with servers, page views with ads, residents with hospitals, students with universities, or homes with tenants.
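As a taste of how compact the computational core of such problems is, the sketch below solves a small instance with SciPy’s linear_sum_assignment, an optimal solver for the assignment problem; the cost matrix is a made-up example.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical costs: cost[i][j] = cost of assigning worker i to job j.
    cost = np.array([[4, 1, 3],
                     [2, 0, 5],
                     [3, 2, 2]])

    workers, jobs = linear_sum_assignment(cost)  # minimizes the total cost
    for w, j in zip(workers, jobs):
        print(f"worker {w} -> job {j}")          # worker 0 -> job 1, ...
    print("total cost:", cost[workers, jobs].sum())  # total cost: 5

An explanation tool would have to go one step further than such a solver: not just output the matching, but justify why no other matching is cheaper.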

Learned data structures

Artificial intelligence and algorithms are two areas of computer science that developed separately for many years. Only recently have we observed their increasingly frequent interpenetration, which fosters the development of new tools in both fields. On the one hand, we are witnessing the emergence of ML tools that have been fully analyzed from a theoretical point of view, e.g. approximate nearest neighbor search (NNS). On the other hand, ML models are entering basic data structures and state-of-the-art approximation algorithms as internal components, resulting in solutions with better practical properties, e.g. learned indexes. These new hybrid structures are called Learned Data Structures (LDS).
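To give a feel for the idea, here is a minimal, self-contained sketch of a learned index in Python: a linear model predicts where a key sits in a sorted array, and the model’s worst-case error bounds a small window in which a classic search finishes the lookup. This illustrates the general recipe, not any particular published system.

    import numpy as np

    class LearnedIndex:
        def __init__(self, keys):
            self.keys = np.sort(np.asarray(keys, dtype=float))
            pos = np.arange(len(self.keys))
            # Fit pos ~ a * key + b by least squares.
            self.a, self.b = np.polyfit(self.keys, pos, deg=1)
            pred = np.rint(self.a * self.keys + self.b).astype(int)
            self.err = int(np.max(np.abs(pred - pos)))  # worst-case model error

        def lookup(self, key):
            guess = int(round(self.a * key + self.b))
            lo = max(0, guess - self.err)
            hi = min(len(self.keys), guess + self.err + 1)
            # Classic search, but only inside the model's error window.
            i = lo + int(np.searchsorted(self.keys[lo:hi], key))
            return i if i < len(self.keys) and self.keys[i] == key else None

    rng = np.random.default_rng(0)
    idx = LearnedIndex(rng.integers(0, 10**6, size=10_000))
    print(idx.lookup(float(idx.keys[1234])))  # 1234 (first index holding that key)

Published learned indexes replace the single linear model with a hierarchy of models, but the predict-then-search-a-window pattern is the same.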

Since work on these ideas has only just begun, the tools needed to implement cutting-edge solutions are missing, which hampers research on new models and techniques. This problem could be solved by creating new algorithms and data structures together with a software library that allows the different building blocks to be composed easily. In this way we can demonstrate that it is possible to bridge the gap between theory and practice in algorithms, and that new theoretical developments can have real applications.

Algorithmics for data science and ML

Data analysis is becoming the driving force behind technological development. However, to draw meaningful conclusions from ever-growing collections of information, we need to design efficient algorithms. Our project aims to solve fundamental problems arising in the processing of huge data sets. The long-term goal is to develop techniques for designing very fast and, as far as possible, scalable algorithms.

Graphs are the most common way of capturing structures and relationships in data, which is why they are so often used in data science. The best-known examples are social network analysis, e.g. community detection or cascading processes. Graphs are also often used to represent knowledge, e.g. in knowledge graphs; this shift was visible in the transition from a relational to a graph perspective in databases. These and other examples show that graphs and graph algorithms are essential tools for representing and analyzing data. However, the size of current data sets requires parallel computing, because even storing them on a single machine is often impossible. For example, the World Wide Web consists of over 5 billion web pages, Google’s knowledge graph contains 800 billion facts about 8 billion entities, and Facebook has over 2 billion daily active users.
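As a small illustration of graph-based data science, the sketch below runs a standard community-detection baseline on Zachary’s karate club network; it assumes the networkx library is installed.

    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()  # classic 34-node social network

    # Greedy modularity maximization, a standard community-detection baseline.
    for i, c in enumerate(community.greedy_modularity_communities(G)):
        print(f"community {i}: {sorted(c)}")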

Data is thus usually scattered across multiple machines, and computations must be performed in parallel. Real-world systems such as MapReduce, Hadoop, Spark or Flume are captured by the Massively Parallel Computation (MPC) model. Although various models of parallel computing have been studied for years, MPC brings completely new possibilities and requirements. MPC computations are performed in synchronous rounds, but implementing a round in a real system takes a long time: one round lasts orders of magnitude longer than on classic parallel machines such as Cray systems. We therefore want to solve problems, especially graph problems, in as few rounds as possible. We strive to design methods that overcome complexity barriers that classical techniques and models could not, and we will focus on new algorithmic tools that improve the performance of both parallel and non-parallel algorithms used in data science.
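The sketch below is a toy single-machine simulation of this round structure: connected components via label propagation, where in each synchronous round every vertex adopts the smallest label in its neighborhood. It only illustrates why the number of rounds is the resource to minimize; a real MPC system would shard the vertices across many machines.

    def components_by_label_propagation(nodes, edges):
        adj = {v: [] for v in nodes}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        label = {v: v for v in nodes}  # each vertex starts as its own component
        rounds = 0
        while True:
            rounds += 1
            # One synchronous round: all vertices update "in parallel".
            new = {v: min([label[v]] + [label[u] for u in adj[v]]) for v in nodes}
            if new == label:  # nothing changed: converged
                return label, rounds
            label = new

    labels, rounds = components_by_label_propagation(
        nodes=list(range(8)), edges=[(0, 1), (1, 2), (2, 3), (4, 5), (6, 7)]
    )
    print(labels, "rounds:", rounds)  # three components found after 4 rounds

Naive propagation needs a number of rounds proportional to the graph diameter; much of the MPC literature is precisely about getting well below that bound.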

Research Group Leader


Piotr Sankowski

In his research, he focuses on the practical use of algorithms, ranging from economic applications, through learned data structures, to parallel algorithms for data science. He received a doctorate in computer science in 2005 and, in 2009, a habilitation and a doctorate in physics. He held postdoctoral positions at ETH Zurich (the Swiss Federal Institute of Technology) and at Sapienza University of Rome.

ELLIS Society member.


Piotr Sankowski co-created the Highlights of Algorithms conference, organized in 2016 in Paris and in 2017 in Berlin. He chaired PC Track A of the 24th Annual European Symposium on Algorithms (ESA 2016), the largest European conference on algorithms. In 2018, he won the individual Crystal Brussels Sprouts Award and the National Science Center Award. He was twice a visiting professor at the Simons Institute for the Theory of Computing at the University of California, Berkeley. In 2010-2015, he held the “IDEAS for Poland” fellowship awarded by the Foundation for Polish Science (FNP), and in 2011 he received the Best Poster Award at the 4th ACM International Conference on Web Search and Data Mining (WSDM’11).

He is the first Pole to receive four European Research Council (ERC) grants: an ERC Starting Independent Researcher Grant (2010), ERC Proof of Concept Grants (2015, 2023), and an ERC Consolidator Grant (2017). In 2015-2018, he led a research group under the National Science Center grant “Efficient Planar Graph Algorithms” and headed a research project funded by the National Center for Research and Development. He is a co-founder of the spin-off MIM Solutions, where he has been scientific director since 2021. Since March 2021, he has also been president of the management board of IDEAS NCBR.

Other research groups and teams

  • AI for Security: Amongst other things, we develop multi-level management systems to protect critical infrastructure as well as systems for securing key state services against both kinetic and cyber threats.
    Tomasz Michalak
  • Medical Pathology Diagnostics: Our goal is to shape the future of medical diagnostics by developing AI-based solutions. We focus on the analysis of medical imaging data, such as radiological and histopathological data (digital pathology).
    Żaneta Świderska-Chadaj
  • Psychiatry and Computational Phenomenology: Most mental disorders are highly complex, with high phenotypic variability, partly vague diagnostic criteria, and significant overlap between them.
    Marcin Moskalewicz