WALL-OSS
WALL-OSS is an open-source foundation model for embodied intelligence, proposed by the X Square Robot team in 2025. The LeRobot implementation is adapted from their open-source WallX repository.
X Square Robot's WALL-OSS is now integrated into Hugging Face's LeRobot ecosystem. This is an exciting collaboration between the LeRobot and X Square Robot teams. You can now post-train, evaluate, and deploy WALL-OSS directly through LeRobot, making it easier for the open-source robotics community to customize and deploy WALL-OSS foundation models. Read the WALL-OSS paper and explore the code.
Model Overview
The WALL-OSS team is building an embodied foundation model to capture and compress the world's most valuable data: the continuous, high-fidelity stream of physical interaction. By creating a direct feedback loop between the model's decisions and the body's lived experience, WALL-OSS aims to enable the emergence of a truly generalizable intelligence, one that understands not just how the world works, but how to act effectively within it.
Technically, WALL-OSS introduces a tightly coupled multimodal architecture built around a Mixture-of-Experts (MoE) structure that integrates both discrete and continuous action modeling strategies. Through a two-stage training pipeline (Inspiration → Integration), the model gradually unifies semantic reasoning and high-frequency action generation. Its core innovations include:
- Embodied perception–enhanced multimodal pretraining: Large-scale training on unified vision–language–action data to strengthen spatial, causal, and manipulation understanding.
- Unified Cross-Level Chain-of-Thought (Uni-CoT): A single differentiable framework that unifies high-level instruction reasoning, sub-task decomposition, and fine-grained action synthesis, forming a continuous chain from “understanding” to “execution.”
- Mixture-of-Experts (MoE) action heads: Dynamically activating experts depending on the task phase and modeling actions in discrete or continuous space to maintain stable VLM priors.
- Two-stage training paradigm:
  - Inspiration stage: Injecting discrete action priors to strengthen spatial understanding and semantic-action alignment.
  - Integration stage: Using flow matching to achieve high-frequency continuous control.
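To make the Integration stage concrete, here is a minimal, self-contained sketch of conditional flow matching for action chunks. This is a conceptual illustration, not the actual WallX implementation: the `model` interface, the action horizon, and the conditioning vector are all placeholder assumptions.

```python
import torch

def flow_matching_loss(model, actions, cond):
    """Flow-matching loss on a ground-truth action chunk.

    actions: (B, T, D) action chunk; cond: (B, C) conditioning features
    (e.g. a VLM embedding). `model(x_t, t, cond)` predicts a velocity field.
    """
    noise = torch.randn_like(actions)           # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1, 1)      # random time in [0, 1]
    x_t = (1 - t) * noise + t * actions         # linear interpolation path
    target_velocity = actions - noise           # constant velocity of that path
    pred_velocity = model(x_t, t, cond)
    return torch.mean((pred_velocity - target_velocity) ** 2)

@torch.no_grad()
def sample_actions(model, cond, steps=10, horizon=16, dim=7):
    """Integrate the learned velocity field from noise to an action chunk."""
    x = torch.randn(cond.shape[0], horizon, dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((cond.shape[0], 1, 1), i * dt)
        x = x + dt * model(x, t, cond)          # Euler step along the flow
    return x
```

Because the velocity target is just `actions - noise`, a handful of Euler steps at inference time is usually enough for smooth, high-frequency control, which is what makes flow matching attractive for the Integration stage.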
Installation Requirements
Install LeRobot by following our Installation Guide.
Install WallX dependencies by running:

```bash
pip install -e ".[wallx]"
```
Usage
To use WallX in LeRobot, specify the policy type as `policy.type=wall_x`.

Training
For training WallX, you can use the standard LeRobot training script with the appropriate configuration:
```bash
python src/lerobot/scripts/lerobot_train.py \
  --dataset.repo_id=your_dataset \
  --policy.type=wall_x \
  --output_dir=./outputs/wallx_training \
  --job_name=wallx_training \
  --policy.repo_id=your_repo_id \
  --policy.pretrained_name_or_path=x-square-robot/wall-oss-flow \
  --policy.prediction_mode=diffusion \
  --policy.attn_implementation=eager \
  --steps=3000 \
  --policy.device=cuda \
  --batch_size=32
```

Training Arguments
| Argument | Description |
|---|---|
| `--dataset.repo_id` | Hugging Face Hub repository ID of your training dataset (e.g., `lerobot/aloha_sim_insertion_human`) |
| `--policy.type` | Specifies the WallX policy architecture (`wall_x`) |
| `--output_dir` | Local directory where training checkpoints and logs are saved |
| `--job_name` | Name identifier for this training run (used in logging/tracking) |
| `--policy.repo_id` | Hugging Face Hub repo ID where the trained model will be pushed |
| `--policy.pretrained_name_or_path` | Path to pretrained WallX weights to initialize from (the official WALL-OSS checkpoint) |
| `--policy.prediction_mode` | Action prediction strategy: `diffusion` uses iterative denoising for action generation, `fast` uses next-token prediction instead |
| `--policy.attn_implementation` | Attention implementation backend: `eager` uses standard PyTorch attention; alternatives include `flash_attention_2` and `sdpa` |
| `--steps` | Total number of training steps to run |
| `--policy.device` | Device to train on (`cuda` for GPU, `cpu` for CPU) |
| `--batch_size` | Number of samples per training batch |
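The two `--policy.prediction_mode` options correspond to very different inference loops. The sketch below illustrates the distinction in generic PyTorch; it is not the WallX code, and `denoiser` and `token_model` are hypothetical placeholders for the actual policy heads.

```python
import torch

def predict_diffusion(denoiser, obs, steps=10, horizon=16, dim=7):
    """`diffusion` mode: start from noise and iteratively refine
    the whole continuous action chunk."""
    x = torch.randn(1, horizon, dim)
    for step in reversed(range(steps)):
        t = torch.full((1,), step / steps)
        x = denoiser(x, t, obs)  # each call returns a less noisy chunk
    return x

def predict_fast(token_model, obs, num_tokens=16):
    """`fast` mode: decode discretized action tokens one at a time,
    like next-token prediction in a language model."""
    tokens = []
    for _ in range(num_tokens):
        logits = token_model(obs, tokens)     # logits over the action vocabulary
        tokens.append(int(torch.argmax(logits)))
    return tokens  # detokenized into continuous actions downstream
```

In practice, diffusion-style decoding trades extra forward passes for smooth continuous actions, while fast decoding produces a discrete token sequence in a single autoregressive sweep.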
License
This model is released under the Apache 2.0 license, consistent with the original WallX repository.