04

Research

Intelligent Algorithms and Learned Data Structures

Piotr Sankowski

LEADER OF THE RESEARCH GROUP

Polish computer scientist, habilitated doctor of mathematical sciences. Associate professor at the Institute of Computer Science, Faculty of Mathematics, Computer Science and Mechanics, University of Warsaw, Member of Ellis Society.

Explainable Algorithmic Tools

Algorithms, especially the ones used in machine learning, promise to aid people in making decisions. In many scenarios even conceptually simple algorithms can deliver better quality answers than humans, e.g., make less mistakes. Still this promise is far from being realized in its full potential. The main problem is that algorithms are not able to give much insight nor explanation why the returned answer is the correct one. Even if we try to reverse-engineer the reasons why this decision was made by an algorithm, these reasons may still remain incomprehensible to humans. This problem can be best illustrated by the example of Deep Blue, a chess computer developed by IBM. A human has no chance to understand its “reasoning” for a particular move.

In case of games, we might continue the philosophical dispute forever whether Deep Blue understands chess or not. However, this becomes a cornerstone issue when deploying algorithmic solutions in real-world. We need to guarantee that algorithms can be understood by people using them, as otherwise they refuse to follow them. Surprisingly, there exist no good tools to deliver explanations even for the most basic algorithms. For example, some of us often doubt whether GPS found the fastest route, and might be trying alternative ones. This despite the fact that finding the shortest path is a well-studied problem and we can even prove that the algorithm works correctly. The problem is rather about explaining the solution to the user, and not the algorithm itself. Moreover, the algorithm might be used by someone that does not have the right algorithmic background, so explaining its decision becomes even more important. The good news is that one does not need to study algorithms for years to see that they work in the right way. For example, one of the main reasons why Google Maps shows alternative routes, is to explain to the user why the returned solution is the best one. This is because the possible alternatives are worse!

Algorithms are becoming more prevalent in our lives due to ongoing digitalization which is further sped-up by the current pandemic situation. Hence, the need to explain their results becomes more immanent. This research aims to deliver a proof of concept of the tools that would provide explanations for the most basic optimization problems, e.g., assignment problem. Assignments are the cornerstone problem in operational research and are used to describe matching of jobs to workers, jobs to servers, page views to ads, residents to hospitals, students to universities or houses to tenants. Furthermore, all decisions taken by algorithms should be explainable as required by “Ethics guidelines for trustworthy AI’’ that apply to complex ML solutions, but to this case as well.

Learned Data-Structures

Artificial intelligence and algorithms are two areas of computer science that have been developed in separation for many years. Only very recently we have been witnessing more and more interactions between these two fields. These interactions lead to mutual influence and development of new tools on both sides. On one hand, we are witnessing the emergence of machine learning tools and algorithms that have been fully analysed from theoretical point of view, e.g., approximate nearest neighbour search.

On the other hand, ML tools enter as interior components into basic data structures or state-of-the-art approximation algorithms resulting in solutions that have better practical properties, e.g., indices. These new hybrid constructions are called learned data-structures. As the work on these ideas has just started we miss the right framework and tools for implementing state-of-art solutions and thus the research on new tools and models is hampered. This problem could be solved by creating new algorithms and data structure together with a software library that would allow to compose easily different building blocks. This could prove tools to bridge the gap between theory and practice in algorithms and show that new theoretical advances can have practical implications.

THE EDUCATION OF RESEARCHERS WHO CAN FILL THE GAP BETWEEN BASIC AND APPLIED RESEARCH WAS ONE OF THE KEY PREMISES FOR ESTABLISHING OUR RESEARCH CENTER DEDICATED TO ARTIFICIAL INTELLIGENCE AND DIGITAL ECONOMY

Prof. Piotr Sankowski

CEO IDEAS NCBR

Algorithmic Tools for Data Science and ML

Data is constantly but surely becoming “the oil of a new age”. However, in order to be able to draw meaningful conclusions from this ever growing information we need to design efficient algorithms for this. This project aims at solving this fundamental problems arising in the context of processing huge data sets. The long term goal is to develop a techniques for designing very fast and possibly scalable algorithms. Graphs a the most common way to capture structures and relations in data and thus are often used in data science.

The most well-known examples include analysis of social networks, e.g., community detection or cascading processes. Similarly, graphs are often used to represent knowledge, e.g., using knowledge graphs. This approach has been visible in the shift from relational to graph perspective in databases. These and other examples show that graphs and graph algorithms are basic tools for representing and analysing the data. However, sizes of current data sets requires adoption of parallel computation, as even storing them on single machine is often impossible. For example, the World Wide Web consist of more than 5 billion pages, Google Knowledge Graph contains 500 billion facts on 5 billion entities whereas Facebook has 1.88 billion active users daily. In these scenarios, data is usually distributed across many machines, and computation needs to be performed in parallel. Real-world systems are modelled in Massively Parallel Computation (MPC) frameworks, such as MapReduce, Hadoop, Spark, or Flume.

Although, different parallel computation models have be studied for years already. MPC comes with a completely new possibilities as well as requirements. MPC computations are executed in synchronous rounds, but implementing these rounds on real-world systems takes considerable time. One round takes orders of magnitude longer than on classical Cray type system. Thus we would like to solve problems, in particular graph problems, in as few rounds as possible. With this challenge in mind, this project aims to design methods to break barriers that were impossible to overcome using classical techniques and models. More specifically, we are going to work on new algorithmic tools that would improve efficiency of both parallel and non-parallel algorithms used in data science.

Publications of team members as part of research work at IDEAS NCBR

2022

Conference name Data Links Document title Author(s)
UAI (Uncertainty in Artificial Intelligence), Eindhoven, the Netherlands
1-4 August 2022
Improved Feature Importance Computation for Tree ModelsBased on the Banzhaf Value
Tomasz Michalak
STOC (Symposium on Theory of Computing), Rome, Italy
20-24 June 2022
Subquadratic Dynamic Path Reporting in Directed Graphs Against an Adaptive Adversary
Adam Karczmarz, Michał Pawłowski, Anish Mukherjee, Mathieu Mari, Piotr Sankowski
SODA (Symposium on Discrete Algorithms)
A 3-Approximation Algorithm for Maximum Independent Set of Rectangles
Mathieu Mari
NeurIPS, New Orlean, the USA
27 November - 2 December 2022
Subgoal Search For Complex Reasoning Tasks
Tomasz Odrzygóźdź
AlgPiE 2022, Będlewo
20-26 August 2022
Subquadratic Dynamic Path Reporting against an Adaptive Adversary
Anish Mukherjee, Adam Karczmarz, Piotr Sankowski
Online matching with delays & Online busy-time scheduling
Runtian Ren
MLinPL
04.11.2022 - 06.11.2022
Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search
Tomasz Odrzygóźdź
Fine-Grained Conditional Computation in Transformers
Sebastian Jaszczur
Improved Feature Importance Computation for Tree Models Based on the Banzhaf Value
Piotr Sankowski, Adam Karczmarz, Tomasz Michalak

Piotr Sankowski

LEADER OF THE RESEARCH GROUP

Professional experience

His research interest focuses on practical application of algorithms, ranging from economic applications, through learning data structures, to parallel algorithms for data science. In 2005, he received a doctorate in computer science, and in 2009 habilitation and doctorate in physics in the field of solid state theory at the Polish Academy of Sciences. He completed post-doctoral internships at ETH Zurich Institute of Science and Sapienza University of Rome.

Awards and achievements

Piotr Sankowski co-founded the Highlights of Algorithms conference organized in 2016 in Paris and in 2017 in Berlin. He was the chairman of PC Track A 24th Annual European Symposium on Algorithms (ESA 2016), which is the largest European conference on algorithms. In 2018 he received the individual Crystal Brussels Sprout Award and the National Science Center award. He was a visiting professor at the Simons Institute for the Theory of Computing twice. In 2010-2015, he received the “IDEE for Poland” scholarship awarded by the Foundation for Polish Science (FNP), and in 2011, the Best Poster Award at the 4th International Web Search and Data Mining Conference (WSDM’11).

Grants

He is the first Pole to have received three ERC grants (ERC – European Research Council): Starting Grant (2010), Proof of Concept Grant (2015), and Consolidator Grant (2017). In 2015-2018, as the leader of the research group, he conducted research under the National Science Center grant “Efficient Planar Graph Algorithms” and was the leader of the research project under the grant from the National Center for Research and Development. He is also a co-founder of the spin-off company MIM Solutions, in which he has been the scientific director since 2021. From March 2021, he is also the president of the management board at IDEAS NCBR.

OUR RESEARCH

Algorithms, especially the ones used in machine learning, promise to aid people in making decisions.

Blockchain technology was introduced in 2008. Several machine learning applications involve issues where privacy plays a special role.

Today, both science and industry rely heavily on machine learning models, predominantly artificial neural networks.

Skip to content