When solving hard problems, people often try to decompose them into smaller parts that are typically easier to complete. Humans do particularly well in solving complex reasoning tasks when we apply a mental process based on moving from one idea to a related one. The progress is made through milestones, not through long leaps or, on the contrary, random attempts to test all possible intellectual options. As humans can switch from one to the next mode of decision-making almost instantly, these complicated neurological processes are felt as spontaneous.
Can this be replicated and taught to the AI? Many complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. In the paper „Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search” authors present a new algorithm that significantly surpasses traditional hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik’s Cube, and inequality proving benchmark INT, setting new state-of-the-art. It achieves that by applying the mechanism of adaptive selection of the subgoal distance, thanks to which it can tackle complex reasoning tasks and outperforms existing comparable methods.
We set down with Dr. Tomasz Odrzygóźdź, a post-doc researcher at IDEAS NCBR who has recently co-authored his second paper on the subject, and discussed the new method as well as its potential future applications.
Let’s start with the basics. What is planning – and planning horizon- in the mathematical sense? Is it in any way comparable to the human methods of planning our actions?
TO: In the formal sense, a planner is every algorithm that, before taking an action, analyses its consequences, both in the near and far future. However, not every algorithm works this way. For example, there are neural networks trained to mimic human behavior (e.g. in car racing) without predicting the consequences. For instance, such a network guesses what would a human do and do not asses his or hers next actions. Planning on the other hand, is focused on searching through the vast space of possible future states and looking for the best one to achieve.
Planning algorithms are inspired by a human’s approach to complex problems. We do a lot of planning every day, so we want the computer to be also good at it. When we face a question like „what is the best chess move in a given position?” it is natural to choose several candidate moves, analyse the corresponding outcomes then maybe speculate on the opponent’s next move, etc. In real life, we are using similar mechanisms all the time, usually quite effortlessly. Humans have a natural tendency to adjust the planning horizon, meaning we can quickly decide how far into the future we must plan. Imagine driving a car. When traversing a crowded, narrow, winding street, it is crucial to focus on the closest events: the next turn to make, the next car to keep distance from or the next pedestrian to avoid. Alternatively, after entering a straight, empty street, it is enough to think about reaching its far end. As humans can switch from one (long-term) to the next (short-term) type of decision-making almost instantly, thus for most of us driving a car or a bike is intuitive and does not require constant hard mental labor. These complicated neurological processes are felt as spontaneous.
This is planning, which we use in real life when, for instance looking for the most optimal path through a city that includes visits to several places. We act similarly when deciding on the best code structure, and solving complex mathematical problems.
How do subgoal search models differ from the more traditional, older generation algorithms? Why could it be useful?
TO: More traditional planning algorithms build their plans using single elementary actions. This is the case in single moves in chess or in single control signal in robot steering.. For example, a traditional planning algorithm would construct a plan for a robot in terms of consequent movements of its parts: the first move left feet by 3 cm, then move the right arm by 1 cm, etc. We, as humans rarely think in such a way. Instead, at first we typically mentally construct a more distant goal which we subsequently try to achieve step-by-step, with simpler actions. If the problem is hard, we also create subgoals: we find some intermediary states (or stages) called subgoals that are on the way to the ultimate solution that is easier to achieve from the current situation. When the problem is even harder, it may be necessary to consider several subgoals at each step we take.
Our subgoal search method is based on assumptions about human way of thinking. The main idea behind our algorithm is to allow it to construct plans in terms of subgoals, not elementary actions. This approach requires us to have an efficient mechanism for proposing subgoals. Luckily, it turned out that neural networks are very good in this task which quickly surpassed our initial expectations. A properly trained network can predict a variety of subgoals. It turned out that it is not even necessary for all of them to be correct: we only need some of them to be useful and the algorithm manages to sort out the best subgoals. Again, it actually works in a quite similar manner for us, humans. When solving a problem, we don’t need every single idea that comes to our mind to be true. Frequently, we need just one really good concept at each step we take to drive us forward.