HRM explained

The Hierarchical Reasoning Model (HRM) described in the paper does not explicitly state that it avoids tokens entirely; rather, it operates without relying on Chain-of-Thought (CoT) prompting or token-level language externalization for reasoning, which is common in traditional large language models (LLMs). Instead, HRM employs a brain-inspired, hierarchical, and multi-timescale recurrent architecture to perform complex reasoning tasks in a single forward pass. Below, I’ll explain how HRM works, focusing on its architecture and processing mechanism, as described in the document.

Key Features of HRM’s Approach

HRM is designed to overcome the limitations of standard Transformer-based LLMs, which are computationally shallow and rely on CoT for reasoning. It draws inspiration from the human brain’s hierarchical processing and temporal separation, using two interdependent recurrent modules operating at different timescales:

  1. High-Level Module (H-module):
  • Operates on a slower timescale.
  • Responsible for abstract, high-level planning and maintaining a broader context.
  • Updates its state after the low-level module completes a cycle of computations, providing stable guidance for the overall reasoning process.
  2. Low-Level Module (L-module):
  • Operates on a faster timescale.
  • Handles rapid, detailed computations, such as processing specific elements of a task (e.g., individual cells in a Sudoku puzzle or segments of a maze).
  • Converges to a local equilibrium within each cycle before being reset by the H-module for the next cycle.
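
The interplay between the two modules can be sketched in a few lines of PyTorch. This is an illustrative toy, not the paper's architecture: the actual H- and L-modules are Transformer blocks, and the class and method names here are hypothetical. GRU cells are used only to show the two-timescale update pattern, in which the L-module iterates several fast steps under a fixed H state before the H-module updates once.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of HRM's two-timescale structure. The paper's modules
# are Transformer blocks; plain GRU cells are used here purely to illustrate
# the fast/slow update pattern.
class TwoTimescaleCore(nn.Module):
    def __init__(self, dim: int, T: int):
        super().__init__()
        self.T = T                            # fast (L-module) steps per slow cycle
        self.l_cell = nn.GRUCell(dim, dim)    # fast, detailed computation
        self.h_cell = nn.GRUCell(dim, dim)    # slow, abstract planning

    def cycle(self, x, z_L, z_H):
        # L-module iterates T times, conditioned on the current H state.
        for _ in range(self.T):
            z_L = self.l_cell(x + z_H, z_L)
        # H-module updates once, from the L-module's settled state,
        # providing fresh context for the next cycle.
        z_H = self.h_cell(z_L, z_H)
        return z_L, z_H
```

Running `cycle` repeatedly reproduces the nested loop described above: many fast L-updates per slow H-update.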

How HRM Works Without CoT or Token-Level Reasoning

Unlike LLMs that use CoT, which break down reasoning into explicit, token-based steps (e.g., generating intermediate thoughts in natural language), HRM processes reasoning tasks end-to-end in a single forward pass without explicit supervision of intermediate steps. Here’s how it achieves this:

  1. Input Processing:
  • The model takes input data, such as a flattened 2D grid for tasks like Sudoku, mazes, or the Abstraction and Reasoning Corpus (ARC). For example, a 9×9 Sudoku grid or a 30×30 maze is represented as a sequence of values.
  • An input network f_i processes the raw input into an initial representation, which is fed into the recurrent modules.
  2. Hierarchical Convergence:
  • The L-module performs rapid computations over T timesteps within a cycle, converging to a local equilibrium based on the current high-level state provided by the H-module.
  • After T steps, the H-module incorporates the L-module’s final state and updates its own state, providing a new context for the L-module to begin a new cycle.
  • This process repeats for N high-level cycles, resulting in an effective computational depth of N × T steps, allowing the model to handle complex tasks requiring extensive search or backtracking (e.g., Sudoku or maze navigation).
  3. Output Generation:
  • An output network f_o maps the final states of the H-module and L-module to the task’s solution (e.g., a completed Sudoku grid or an optimal maze path).
  • The model produces solutions directly from the input without generating intermediate token-based reasoning steps, unlike CoT methods.
  4. One-Step Gradient Approximation:
  • HRM uses a novel training approach based on Deep Equilibrium Models (DEQ), employing a one-step gradient approximation to avoid backpropagation through time (BPTT). This method computes gradients using the final states of the H-module and L-module, requiring only O(1) memory compared to BPTT’s O(T).
  • This makes training efficient and scalable, aligning with biologically plausible learning mechanisms.
  5. Adaptive Computation Time (ACT):
  • HRM incorporates an ACT mechanism, allowing it to dynamically adjust the number of computational steps based on task complexity. A Q-head predicts Q-values from the H-module’s final state to determine when to stop computing, optimizing efficiency while maintaining high accuracy (see Figure 5 in the document).
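
The hierarchical-convergence loop and the one-step gradient approximation can be sketched together. This is a hedged illustration, not the paper's implementation: GRU cells and linear layers stand in for the Transformer-based modules, and the function name is invented. The point it demonstrates is that all but the final L-step and H-update run under `torch.no_grad()`, so no autograd graph is kept for the N × T − 1 earlier steps and training memory stays O(1) rather than O(T) as in BPTT.

```python
import torch
import torch.nn as nn

# Illustrative forward pass with the one-step gradient approximation.
# All module choices (GRU cells, linear f_i / f_o) are stand-ins for the
# paper's Transformer blocks; the gradient-truncation pattern is the point.
def hrm_forward(f_i, l_cell, h_cell, f_o, inputs, z_L, z_H, N=4, T=4):
    x = f_i(inputs)
    with torch.no_grad():                     # N*T - 1 steps, no graph kept
        for n in range(N):
            for t in range(T):
                if n == N - 1 and t == T - 1:
                    break                     # leave the last L-step to autograd
                z_L = l_cell(x + z_H, z_L)
            if n < N - 1:
                z_H = h_cell(z_L, z_H)        # H updates once per cycle
    # Only the final L-step and H-update carry gradients (one-step approx.).
    z_L = l_cell(x + z_H, z_L)
    z_H = h_cell(z_L, z_H)
    return f_o(z_H)
```

Because only the last two updates are on the autograd tape, backpropagation cost is independent of how many cycles were run.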

Representation and Processing

While the document mentions “tokens” in the context of input grid sizes (e.g., a 30×30 maze grid as 900 tokens), this refers to the flattened input representation rather than tokenized language as in LLMs. HRM’s inputs are numerical or symbolic representations of tasks (e.g., grid cells for Sudoku or maze coordinates), not natural language tokens. The model processes these inputs through its recurrent modules, which maintain hidden states (z_L for the L-module and z_H for the H-module) that evolve over time. These hidden states capture the computational progress and context, enabling the model to reason hierarchically without explicit token-based reasoning steps.
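
As a concrete illustration of this kind of input (the exact encoding here is an assumption, not taken from the paper), a 9×9 Sudoku grid can be flattened into a sequence of 81 integer symbols, where each "token" is a task symbol rather than a piece of natural language:

```python
# Illustrative only: flatten a 9x9 grid (0 = empty cell) row-by-row into a
# sequence of 81 integers -- task symbols, not natural-language tokens.
def flatten_grid(grid):
    return [cell for row in grid for cell in row]

puzzle = [[0] * 9 for _ in range(9)]
puzzle[0][0] = 5                 # one given digit in the top-left cell
seq = flatten_grid(puzzle)       # length-81 sequence fed to the input network
```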

Comparison to Token-Based LLMs

  • LLMs with CoT: Traditional LLMs generate reasoning steps as sequences of tokens, requiring large context windows and extensive training data to learn reasoning patterns. This approach is computationally expensive and brittle for tasks requiring deep reasoning or backtracking.
  • HRM: By using recurrent modules with hierarchical convergence, HRM performs reasoning implicitly within its architecture. It avoids tokenizing intermediate reasoning steps, instead relying on the interplay of high-level planning and low-level computation to solve tasks directly. This allows HRM to achieve near-perfect performance on complex tasks like Sudoku-Extreme and Maze-Hard with only 27 million parameters and 1000 training examples, compared to larger LLMs with longer context windows.

Example Workflow for a Task

For a Sudoku-Extreme puzzle:

  1. The 9×9 grid is flattened into a sequence and fed into the input network f_i.
  2. The L-module processes individual cells or constraints rapidly, converging to a local equilibrium (e.g., candidate values for a cell).
  3. The H-module updates its state based on the L-module’s output, guiding the next cycle toward a globally consistent solution.
  4. After multiple cycles, the output network f_o produces the completed grid.
  5. The ACT mechanism determines when the solution is sufficiently accurate, minimizing unnecessary computations.
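
Step 5's halting decision can be sketched as follows. The Q-head shape and the halting rule (stop when the "halt" Q-value beats "continue", averaged over the batch) are assumptions for illustration, not the paper's exact ACT formulation, and `step_fn` stands in for one full reasoning segment.

```python
import torch
import torch.nn as nn

# Hedged sketch of ACT-style halting: a Q-head reads the H-module's final
# state and emits Q-values for "halt" vs "continue". Shapes and the halting
# rule are illustrative assumptions.
q_head = nn.Linear(16, 2)            # -> [Q_halt, Q_continue]

def run_with_act(step_fn, z_H, max_segments=8):
    for seg in range(1, max_segments + 1):
        z_H = step_fn(z_H)           # one full reasoning segment (assumed)
        q = q_head(z_H)
        if q[..., 0].mean() > q[..., 1].mean():   # halt beats continue
            break
    return z_H, seg
```

Easy inputs can therefore halt after one segment, while harder ones spend more compute, up to the budget.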

For ARC-AGI, HRM incrementally adjusts the output grid, resembling hill-climbing optimization, rather than backtracking as in Sudoku (see Figure 7).

Why This Matters

HRM’s architecture enables it to handle tasks that require polynomial-time computation, which standard Transformers (limited to the AC^0 or TC^0 complexity classes) cannot solve end-to-end. By mimicking the brain’s hierarchical and multi-timescale processing, HRM achieves Turing-complete computation, making it a step toward universal reasoning systems. Its efficiency (low parameter count, minimal training data, and O(1) memory for training) and performance (e.g., 40.3% on ARC-AGI vs. 34.5% for o3-mini-high and 21.2% for Claude 3.7) highlight its potential as a transformative approach.

Conclusion

HRM does not eliminate tokens in the sense of input representation but avoids the token-based, language-like reasoning steps used in CoT. It processes inputs through a hierarchical recurrent architecture, using high-level and low-level modules to perform abstract planning and detailed computations, respectively. This allows HRM to solve complex reasoning tasks efficiently in a single forward pass, offering a compelling alternative to traditional LLMs. For further details, the code is available at github.com/sapientinc/HRM.

