Hierarchical Reasoning Model
Wang et al., 2025 — A recurrent architecture with only 27 million parameters that solves complex planning and reasoning tasks without Chain-of-Thought.
The Hierarchical Reasoning Model (HRM) demonstrates that strong reasoning does not require enormous models. With only 27 million parameters and just 1000 training examples, it masters Sudoku, maze navigation, and ARC puzzles — without the usual Chain-of-Thought approach. Instead, it uses two recurrent modules: one for abstract planning and one for concrete computation.
The Problem: Reasoning Is Expensive
Modern large language models (LLMs) typically solve complex reasoning tasks through Chain-of-Thought (CoT): the model writes out its thinking process explicitly as text, step by step. This works — but it is expensive. More tokens, more compute, more cost.
On top of that, more complex tasks have typically required more parameters: GPT-4, for example, is estimated at roughly 1.8 trillion. For many businesses that is not a realistic option, whether because of cost or because data privacy requires that data never leave the premises.
The HRM paper asks a fundamentally different question: does reasoning really need to be that large and that explicit?
The Approach: Hierarchical Recurrent Processing
The core idea of HRM is simple but powerful: reasoning happens on two levels simultaneously — similar to how humans plan strategically and execute concrete actions without narrating every thought out loud.
The model is recurrent: it processes the same input multiple times across several "thinking rounds" (recurrent steps) before producing an answer. This internal processing space replaces the explicit reasoning text used in CoT.
- No Chain-of-Thought required: Thinking happens internally in the network's activations, not as visible text.
- Only 1000 training examples: Instead of millions of samples, a tiny dataset suffices — a remarkable result.
- 27 million parameters: Small enough to run locally on consumer hardware.
The Architecture: Two Modules, Two Levels
HRM consists of two recurrent modules that work together hierarchically:
- High-Level Module: Responsible for abstract, strategic planning. It processes the task at a high level and provides direction and strategy. This module runs slowly — few iterations per task.
- Low-Level Module: Handles detailed, concrete computation. It carries out the strategy set by the high-level module in small steps. This module runs faster and iterates more frequently.
Both modules exchange state information — the high-level module can adjust its state when the low-level module encounters obstacles. This creates a dynamic loop between planning and execution.
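The interplay described above can be sketched as a toy nested loop. This is a minimal illustration, not the paper's implementation: the hidden width, the plain-numpy `tanh` update, and the cycle/step counts are all illustrative assumptions. What it does show is the key structural idea: the low-level module iterates many fast steps under a fixed plan, and the high-level module updates its state only once per cycle from the low-level result.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(state, inputs, W_state, W_in):
    # One generic recurrent update: new state from old state plus inputs.
    return np.tanh(state @ W_state + inputs @ W_in)

d = 16                       # toy hidden width (illustrative, not from the paper)
x = rng.normal(size=d)       # encoded puzzle input
W = {k: rng.normal(scale=0.3, size=(d, d))
     for k in ("hs", "hin", "ls", "lin")}   # random weights, untrained

z_H = np.zeros(d)            # high-level (planner) state: updated slowly
z_L = np.zeros(d)            # low-level (worker) state: updated quickly

N_CYCLES, T_STEPS = 4, 8     # few slow planning cycles, many fast work steps
for _ in range(N_CYCLES):
    for _ in range(T_STEPS):
        # Low-level module iterates rapidly, conditioned on the current
        # plan (z_H) and the task input (x).
        z_L = rnn_step(z_L, z_H + x, W["ls"], W["lin"])
    # High-level module updates once per cycle from the worker's result,
    # letting it revise the strategy when the low-level gets stuck.
    z_H = rnn_step(z_H, z_L, W["hs"], W["hin"])

answer = z_H  # in the real model, decoded into the final output
```

Note how all "thinking" happens in the activations `z_H` and `z_L` across iterations; no intermediate text is ever produced, which is exactly what replaces Chain-of-Thought here.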
Results: Small Beats Large
The authors test HRM on three different task types that require different kinds of reasoning:
- Sudoku: The model reliably solves even extremely difficult Sudoku puzzles. Sudoku requires systematic elimination and backtracking — classic strengths of hierarchical planning.
- Maze Navigation: HRM finds paths through complex mazes. The high-level module plans the overall route, the low-level module navigates the individual steps.
- ARC (Abstraction and Reasoning Corpus): ARC is considered a particularly hard benchmark for abstract pattern recognition. Here HRM outperforms considerably larger models — without any Chain-of-Thought.
The remarkable part: these results are achieved with only 27 million parameters and 1000 training examples — a fraction of what comparable approaches require.
What Does This Mean for My Business?
The HRM paper is especially relevant for SMEs that want or need to run AI locally — for example due to data privacy requirements (GDPR) or limited budgets. Three concrete implications:
- Local deployment becomes more realistic: A 27M parameter model runs on any modern laptop or small server. No cloud dependency, no data sharing with external providers.
- Specialized small models: Research shows that for clearly defined tasks (planning, routing, puzzles, structured decisions), small specialized models can perform surprisingly well — at a fraction of the cost.
- Fine-tuning with few data points: 1000 training examples is realistic for many businesses. This means your own data could be enough to train a specialized model without needing to collect massive datasets.
Important caveat: HRM is not a universal model. It is optimized for structured reasoning tasks, not free text generation or conversation. The use case makes sense where clear rules and structured problems dominate — for example in planning systems, process optimization, or decision support.
Context: Why Does This Paper Matter?
The HRM paper appeared in June 2025 and sits within a growing research direction questioning whether large models are really necessary for everything. While industry pushes toward ever-larger models, this research shows that architecture can replace size.
The idea of splitting reasoning into two levels — strategic and tactical — is not new. It appears in classical AI planning (STRIPS, HTN), cognitive science (System 1 and System 2 per Kahneman), and robotics. HRM successfully transfers this principle into a neural network.
For practitioners, this means: the search for the right model should not start only with size and benchmark scores, but with architecture. A well-designed small model can outperform a generalist large model on specific tasks — while being faster and cheaper.
Sources
- Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., Abbasi-Yadkori, Y. (2025). "Hierarchical Reasoning Model." arXiv:2506.21734 (submitted 2025-06-26, revised 2025-08-04)
Next step: move from knowledge to implementation
If you want more than theory: we provide setups, workflows, and templates drawn from real operations, for teams that want local, well-documented AI systems.
- Local and self-hosted by default
- Documented and auditable
- Built on our own runtime
- Made in Austria