When solving hard problems, people often try to decompose them into smaller parts that are typically easier to complete. Humans do particularly well in solving complex reasoning tasks when we apply a mental process based on moving from one idea to a related one. The progress is made through milestones, not through long leaps or, on the contrary, random attempts to test all possible intellectual options. As humans can switch from one to the next mode of decision-making almost instantly, these complicated neurological processes are felt as spontaneous.

Can this be replicated and taught to the AI? Many complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. In the paper “Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search” authors present a new algorithm that significantly surpasses traditional hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik’s Cube, and inequality proving benchmark INT, setting new state-of-the-art. It achieves that by applying the mechanism of adaptive selection of the subgoal distance, thanks to which it can tackle complex reasoning tasks and outperforms existing comparable methods.

We set down with Dr. Tomasz Odrzygóźdź, a post-doc researcher at IDEAS NCBR who has recently co-authored his second paper on the subject, and discussed the new method as well as its potential future applications.

**Let’s start with the basics. What is planning – and planning horizon- in the mathematical sense? Is it in any way comparable to the human methods of planning our actions? **

TO: In the formal sense, a planner is every algorithm that, before taking an action, analyses its consequences, both in the near and far future. However, not every algorithm works this way. For example, there are neural networks trained to mimic human behavior (e.g. in car racing) without predicting the consequences. For instance, such a network guesses what would a human do and do not asses his or hers next actions. Planning on the other hand, is focused on searching through the vast space of possible future states and looking for the best one to achieve.

Planning algorithms are inspired by a human’s approach to complex problems. We do a lot of planning every day, so we want the computer to be also good at it. When we face a question like “what is the best chess move in a given position?” it is natural to choose several candidate moves, analyse the corresponding outcomes then maybe speculate on the opponent’s next move, etc. In real life, we are using similar mechanisms all the time, usually quite effortlessly. Humans have a natural tendency to adjust the planning horizon, meaning we can quickly decide how far into the future we must plan. Imagine driving a car. When traversing a crowded, narrow, winding street, it is crucial to focus on the closest events: the next turn to make, the next car to keep distance from or the next pedestrian to avoid. Alternatively, after entering a straight, empty street, it is enough to think about reaching its far end. As humans can switch from one (long-term) to the next (short-term) type of decision-making almost instantly, thus for most of us driving a car or a bike is intuitive and does not require constant hard mental labor. These complicated neurological processes are felt as spontaneous.

This is planning, which we use in real life when, for instance looking for the most optimal path through a city that includes visits to several places. We act similarly when deciding on the best code structure, and solving complex mathematical problems.

**How do subgoal search models differ from the more traditional, older generation algorithms? Why could it be useful? **

TO: More traditional planning algorithms build their plans using single elementary actions. This is the case in single moves in chess or in single control signal in robot steering.. For example, a traditional planning algorithm would construct a plan for a robot in terms of consequent movements of its parts: the first move left feet by 3 cm, then move the right arm by 1 cm, etc. We, as humans rarely think in such a way. Instead, at first we typically mentally construct a more distant goal which we subsequently try to achieve step-by-step, with simpler actions. If the problem is hard, we also create subgoals: we find some intermediary states (or stages) called subgoals that are on the way to the ultimate solution that is easier to achieve from the current situation. When the problem is even harder, it may be necessary to consider several subgoals at each step we take.

Our subgoal search method is based on assumptions about human way of thinking. The main idea behind our algorithm is to allow it to construct plans in terms of subgoals, not elementary actions. This approach requires us to have an efficient mechanism for proposing subgoals. Luckily, it turned out that neural networks are very good in this task which quickly surpassed our initial expectations. A properly trained network can predict a variety of subgoals. It turned out that it is not even necessary for all of them to be correct: we only need some of them to be useful and the algorithm manages to sort out the best subgoals. Again, it actually works in a quite similar manner for us, humans. When solving a problem, we don’t need every single idea that comes to our mind to be true. Frequently, we need just one really good concept at each step we take to drive us forward.

**What types of problems did you choose to test your method? How are Rubik’s cube and Sokoban similar? **

TO: In our study, we focused on three problems: Sokoban, Rubik’s cube, and proving mathematical inequalities. All of them have proper characteristics: relatively simple rules, short representation of a current state, easy solution verification, and most importantly, very complex logical foundations behind. For example, Sokoban has many potential irreversible states (states with no comeback option) that must be avoided while playing. Rubik’s cube has a huge space of possible states: 4.3 x 10^19 of different cubes. Similarly, there is unlimited space of possible mathematical inequalities, so there’s a lot for the algorithm to compute and learn from.

**And did the results surprise you? **

TO: Yes, they did! The most surprising result of our work was the fact that a neural network can be trained to generate a diverse set of subgoals. Even more startlingly, training such a network is quite easy and does not require well-structured information. The data we used to train our network was intentionally of low quality. By choosing imperfect data we wanted to check whether this method could be used for solving a complex problem in realistic scenarios in which high quality data isn’t typically available. As it turned out, it wasn’t a problem for our algorithm. Another surprising observation was that it is sufficient if only half of the subgoals are good (in terms of efficiency) to bring huge improvements as compared to traditional planning algorithms. We can be wrong half of the time and still have 7 times faster performance with a much higher success rate than classical planners. In the example of Rubik’s cube, we obtain a 100% success rate, and in the other two cases, we achieved only slightly lower, 91-99 % success rates. You can also watch a visualization of our research in a video based upon it.

**What possible practical applications could the new approach to planning algorithms offer? **

TO: Our research was experimental and it’s hard to tell at this point how our method could be used for solving real-life problems. What I can do however, is to share my thoughts on hypothetical areas of future use. At the current stage of technology, there are plenty of examples of pressing problems that are too hard to be solved exactly and quickly. Even superfast computers struggle with, for example, supply or delivery chain optimization. The same holds true for optimal design problems (e.g., constructing optimal electrical circuits, optimal allocation of jobs to people looking for them, etc.) or automatic code-generation. For me, as a scientist, it’s clear that in each of these domains it may be useful to introduce subgoals and observe how algorithm uses them. There is a chance that subgoal search, due to its high efficiency, may be able to find better solutions.

TO: I am strongly convinced that the subgoal search method is innovative, efficient, and very natural in the sense that it mimics the way we as humans normally think. What would I really like to do is to examine this general method on a bigger set of problems. Particularly, on the ones with some randomness or hidden information within them. This also has some potential real-life applications. Many real-life problems, seem to have inaccuracies in the observation and they also possess some stochasticity. I have three major ideas:

Firstly, apply subgoal search to two-players games, like chess. Here the main thing is that the opponent may potentially disallow to reach the subgoal (i.e. my plan being completely different from than opponent’s plan), thus the algorithm must be able to account for that.

Secondly, use the subgoal search for robotics, for example, drones. I would like to construct an algorithm that when steering a drone first designs a plan with high-level subgoals and then translates it to a basic control signal.

Finally, I’d like to see an expert iteration for subgoal searches. In our work we observed that an algorithm that learned from poor quality data performs excellent, eventually producing high-quality data. The idea is to use this high-quality data to train the whole algorithm again and repeat the process and see the outcomes.

Further readings:

- “Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search”, 2022.
- “Subgoal Search For Complex Reasoning Tasks“, 2022.

**Bio**: An AI postdoc researcher at IDEAS-NCBR. He holds Ph.D. in mathematics (Polish Academy of Sciences, 2019) and MA in the Inter-faculty Individual Studies in Mathematics and Natural Sciences (University of Warsaw, 2013).

His main research interests focus on planning algorithms, solving complex logic problems with AI, reinforcement learning, and natural language processing. Previously, he worked in in the area of geometric group theory (random groups theory) and was interested in topology and differential geometry as well as relativity theory.