Long-horizon planning with Hiverge's platform

How the Hive cracked Airbus' planning challenge

Author: Hiverge

25 November 2025

Airbus Beluga aircraft

Imagine you are planning a road trip: you have to choose the best route to pick up all of your friends, reach every destination you want to visit, decide where to stop for fuel, and adapt if there's traffic or bad weather. Now imagine doing that not just for one car, but for thousands of delivery trucks moving across a continent. That's where automated planning comes in: it lets machines figure out what actions to take to reach a goal efficiently, even when the possibilities are far too complex for humans to work out by hand. It's about giving computers the ability to reason about actions and goals, to understand what must happen first, what can go wrong, and what steps will lead from the present to a desired outcome.

Automated planning is key in many application domains, including robotics, logistics and supply-chain optimization, vehicle routing, smart grid management, and energy distribution optimization. Even scientific research problems in areas such as quantum computing and organic chemistry can be framed and solved as planning problems.

In academia, researchers are often interested in developing generalized planning algorithms, i.e., planners which can efficiently solve any well-structured planning problem, regardless of the domain. Unfortunately, their generalized nature prevents these algorithms from exploiting the peculiarities of the problem. In contrast, tailored planning algorithms can often find solutions orders of magnitude faster, and solve problems which are orders of magnitude larger. The problem is that tailoring planning algorithms to a specific problem domain can be extremely time-consuming and requires deep domain expertise!

This makes writing tailored planning algorithms an excellent testing ground for our system, the Hive. Using the Hive, we can automatically discover high-performing planning algorithms specifically tailored towards any given planning domain, significantly reducing development time and effort.

The Airbus Beluga™ AI Challenge

At the beginning of the year, we decided to put this to the test by participating in the Airbus Beluga™ AI Challenge held by TUPLES. This competition required participants to find solutions to a logistics planning problem proposed by Airbus, which involved the transportation, storage, and management of aircraft parts to successfully complete a preset production schedule for constructing new aircraft.

There are two key difficulties to this problem domain:

  1. Limited number of resources: There are only a limited number of racks and trailers available to transport and store aircraft parts. Poor management of these resources can lead to situations where progress is blocked unless previous actions are reversed.
  2. Problem scale: This competition explores the scale of real-world logistics planning problems, and the difficulties arising from the combinatorial explosion of possible actions one can take. Some of the problems explored in this competition involve over 800 aircraft parts!
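
To make the first difficulty concrete, here is a minimal sketch of how limited rack space can block progress. It is an illustration only, not the competition's API: the `unblock` function, the rack names, and the model of a rack as a list whose last element is the only accessible jig are all simplifying assumptions.

```python
from typing import Optional

# Hypothetical model: each rack is a list whose LAST element is the only
# jig that can be picked up. This is a simplification for illustration.
Racks = dict[str, list[str]]
Move = tuple[str, str, str]  # (jig, from_rack, to_rack)


def unblock(racks: Racks, capacity: dict[str, int],
            source: str, target: str) -> Optional[list[Move]]:
    """Plan the moves that expose `target` on `source`, or None if stuck.

    The input racks are not modified; the plan is computed on a copy.
    """
    if target not in racks[source]:
        raise ValueError(f"{target!r} is not stored on rack {source!r}")
    work = {name: list(jigs) for name, jigs in racks.items()}
    moves: list[Move] = []
    while work[source][-1] != target:
        blocker = work[source][-1]
        # Pick any other rack that still has a free slot.
        spare = next((name for name, jigs in work.items()
                      if name != source and len(jigs) < capacity[name]), None)
        if spare is None:
            return None  # no free slot anywhere: progress is blocked
        work[source].pop()
        work[spare].append(blocker)
        moves.append((blocker, source, spare))
    return moves
```

With racks `{"A": ["j1", "j2", "j3"], "B": ["j4"]}`, reaching `j1` needs two free slots elsewhere. If rack `B` can hold only two jigs, the function returns `None`, which is the kind of dead end that forces earlier actions to be reversed.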

This is a particularly challenging planning domain: even state-of-the-art planners can only solve the smallest and simplest problem instances. What makes it especially interesting as a testing ground is that it's completely new. At the time of our experiments, LLMs had never been trained on this type of problem. That means any solution the Hive produced isn't drawn from memorized data, but from genuine extrapolative reasoning through a problem domain the underlying models had never seen before.

Despite these challenges, the Hive was able to discover a planning algorithm which solves all of the example problems supplied by the competition. This demonstrates orders-of-magnitude improvements over current state-of-the-art methods, ultimately winning us first place in both the Scalability Deterministic and the Scalability Probabilistic challenges.

Methodology

Using the Hive, we evolve a planning strategy for the Beluga planning domain, i.e., a function which encodes a decision tree for what action to take next given the current state.

Notably, we initialized the Hive from scratch with nothing more than a function header, and asked it to find a planning algorithm which solves as many problems as possible (out of a training set of 270 problems of varying difficulties), and in as few actions as possible.

policy_planner_initial.py
import problem_logic as pl

def policy(problem: pl.Instance) -> pl.State:
    return problem.get_initial_state()

Apart from the Python API encoding the problem logic and a short description of the problem, we provide no other guidance or hints to the Hive. After training, we evaluate the generated planning algorithm against a different set of 269 problems.
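
The objective stated above (solve as many problems as possible, in as few actions as possible) can be sketched as a lexicographic score. This is a minimal sketch under our own assumptions: the `Outcome` record and `score` function are hypothetical names, and the text does not specify how the two criteria are actually combined inside the Hive.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Outcome:
    """Result of running one candidate policy on one problem (hypothetical)."""
    solved: bool
    num_actions: int


def score(outcomes: Sequence[Outcome]) -> tuple[int, int]:
    """Higher is better: more problems solved first, then fewer total actions.

    Assumption: only solved problems contribute to the action count.
    """
    solved = [o for o in outcomes if o.solved]
    return (len(solved), -sum(o.num_actions for o in solved))


# Tuples compare lexicographically, so the best candidate is simply
# max(candidates, key=score).
weak = [Outcome(True, 10), Outcome(False, 0)]
long_plans = [Outcome(True, 12), Outcome(True, 30)]
short_plans = [Outcome(True, 8), Outcome(True, 30)]
best = max([weak, long_plans, short_plans], key=score)
```

Putting the solved count first means a candidate is never rewarded for producing short plans on a few easy problems while failing the rest.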

Results

Remarkably, starting from the naïve template function shown above, which solves no problems, the Hive was able to find, completely independently, a planning algorithm which solves all 269 evaluation problems.

Figure 1: Comparison of planners on Airbus Beluga planning domain

We compare the Hive’s solution against two baseline LLM-based methods, both relying on Gemini 2.5 Pro:

  1. Direct plan: For each problem instance, we ask Gemini 2.5 Pro to directly produce a plan to solve the problem using a few-shot prompting technique.
  2. Planning algorithm: Similar to the Hive, we ask Gemini 2.5 Pro to produce a planning strategy to solve problems in the Beluga planning domain.

In Figure 1, we see that both of these LLM-baseline methods perform poorly, which aligns with the evidence in academic literature that even the most powerful modern LLMs are unable to perform complex, multi-stage planning alone [1, 2, 3, 4]. Therefore, the Hive, which builds on top of LLMs, is key to obtaining novel, robust, and breakthrough algorithmic discoveries.

We also compare against a state-of-the-art generalized planning algorithm, best-first width search (BFWS). Again, we see that the planning algorithm discovered by the Hive solves significantly more problems, and problems of much higher complexity. Additionally, the Hive's planning algorithm finds solutions much faster than BFWS: for problems which both planners were able to solve, the Hive found solutions over 1000 times faster on average!

The Hive's Planning Algorithm

The planning algorithm discovered by the Hive spanned almost 500 lines of code, which we make available in this repository. In Figure 2, we show the trajectory of solutions found by the Hive, and highlight some of its notable algorithmic discoveries.

Figure 2: Evolution trajectory of the Hive's final planning algorithm

  1. Implement iterative, priority-driven Beluga solver policy: The initial implementation of the planning policy uses an iterative, greedy strategy to solve the Beluga problem by applying a sequence of prioritized actions. These actions are grouped into high-priority tasks like production line fulfillment and incoming jig management, and medium-priority tasks for empty jig handling and outgoing flights.
policy_planner_1.patch
  2. Add rack unblocking logic for Beluga and production: The logic for picking up empty jigs needed for outgoing flights was refined by systematically unblocking them from racks. Additionally, new strategies were implemented to unblock required jigs for production lines, including moving blocking jigs to other racks or freeing up trailer space.
policy_planner_6.patch
  3. Implement systematic unblocking for production jigs: The logic for picking up required jigs for production lines from racks was rewritten. It now identifies all blocked required jigs, prioritizes them, and iteratively moves blocking jigs until a target jig can be directly picked up.
policy_planner_7.patch
  4. Add cycle detection and backtracking to policy: This change introduces state-tracking to the policy planner workflow, enabling explicit cycle detection and emergency backtracking. The planner now tracks visited states and reverts to a previous state if a cycle is detected or if a predefined number of steps is taken without making progress.
policy_planner_11.patch
  5. Prioritize production lines by overall schedule urgency: The production line prioritization heuristic was updated by introducing a new metric that estimates the total processing steps before a jig is required. This metric is now the primary factor for sorting production line priorities in ascending order, effectively changing the preference from higher-value components to lower-value components.
policy_planner_24.patch
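
The combination of prioritized rules with cycle detection and backtracking can be sketched on a toy domain. This is not the Hive's code: `run_policy`, the three rules, and the integer states are all hypothetical, and the sketch covers only the "revert when a cycle is detected or no rule applies" part, not the no-progress step counter.

```python
from typing import Callable, Hashable, Optional, Sequence

State = Hashable
Rule = Callable[[State], Optional[State]]  # successor, or None if not applicable


def run_policy(start: State, rules: Sequence[Rule],
               is_goal: Callable[[State], bool],
               max_steps: int = 10_000) -> Optional[list[State]]:
    """Apply rules in priority order; skip revisits and backtrack at dead ends."""
    visited = {start}
    stack = [(start, 0)]  # (state, index of the next rule to try there)
    steps = 0
    while stack and steps < max_steps:
        state, next_rule = stack[-1]
        if is_goal(state):
            return [s for s, _ in stack]
        if next_rule == len(rules):
            stack.pop()  # every rule tried here: revert to the previous state
            continue
        stack[-1] = (state, next_rule + 1)
        successor = rules[next_rule](state)
        steps += 1
        if successor is None or successor in visited:
            continue  # rule not applicable, or it would close a cycle
        visited.add(successor)
        stack.append((successor, 0))
    return None


# Toy rules over integer states, listed from highest to lowest priority.
def shortcut(s: int) -> Optional[int]:
    return 100 if s == 2 else None          # tempting, but 100 is a dead end

def undo(s: int) -> Optional[int]:
    return s - 1 if 0 < s < 100 else None   # reverses progress (cycle risk)

def advance(s: int) -> Optional[int]:
    return s + 1 if s < 5 else None


plan = run_policy(0, [shortcut, undo, advance], is_goal=lambda s: s == 5)
```

Without the visited set, the higher-priority `undo` rule would make a purely greedy policy oscillate between states 0 and 1 forever. Without backtracking, it would get stuck after taking `shortcut` at state 2.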

Conclusion

LLMs alone lack the systematic reasoning and long-horizon planning capabilities needed for complex problems. However, when embedded within an evolutionary framework like the Hive, where generated algorithms are continuously evaluated, refined, and improved based on concrete performance metrics, they can discover solutions that outperform general-purpose classical planners and rival expert-designed, domain-specific algorithms. Given nothing more than the problem domain and evaluation metrics, the Hive discovered sophisticated techniques that might otherwise have required months of expert development.

The Hive is particularly powerful for novel domains where classical planners struggle due to combinatorial explosion, and where domain expertise for manual algorithm design is limited or expensive. As planning problems continue to grow in scale and complexity, whether it’s a robot assembling a satellite in orbit, or an AI system designing a new drug, tools like the Hive agent which can automatically discover tailored, high-performing solutions will become increasingly valuable.
