141 changes: 141 additions & 0 deletions docs/source/experimental-features/bleeding-edge.rst
Expand Up @@ -9,3 +9,144 @@ Directly integrating such features before they are complete and without feedback

To address this, some major features will be released as Experimental Feature Branches.
This way, the community can experiment with and contribute to the feature before it's fully integrated, reducing the likelihood of being derailed by unexpected and new errors.

RL Post-Training for VLA Models
---------------------------------

`RLinf <https://github.com/RLinf/RLinf.git>`_ is a flexible and scalable open-source RL infrastructure designed for
Embodied and Agentic AI. This integration enables **reinforcement learning fine-tuning of Vision-Language-Action
(VLA) models** (e.g., GR00T, OpenVLA) on Isaac Lab simulation tasks.

The typical workflow follows three stages:

1. **Data collection** — Collect demonstration data from the Isaac Lab environment (e.g., via teleoperation or scripted policy).
2. **Base model training** — Train a VLA base model (e.g., GR00T) on the collected demonstrations using supervised learning.
3. **RL fine-tuning** — Fine-tune the pretrained VLA model on the Isaac Lab task using RLinf with PPO / Actor-Critic / SAC.

Overview
~~~~~~~~

The RLinf integration allows Isaac Lab users to:

- Fine-tune pretrained VLA models on Isaac Lab tasks using PPO / Actor-Critic / SAC
- Leverage RLinf's FSDP-based distributed training across multiple GPUs/nodes
- Define observation/action mappings from Isaac Lab to GR00T format via a single YAML config
- Register Isaac Lab tasks into RLinf without modifying RLinf source code

Architecture
~~~~~~~~~~~~

.. code-block:: text

   ┌────────────────────────────────────────────────────────────────┐
   │                          RLinf Runner                          │
   │                 (EmbodiedRunner / EvalRunner)                  │
   ├────────────────┬──────────────────────┬────────────────────────┤
   │  Actor Worker  │    Rollout Worker    │       Env Worker       │
   │     (FSDP)     │    (HF Inference)    │     (IsaacLab Sim)     │
   │                │                      │                        │
   │  Policy        │  Multi-step rollout  │  IsaacLabGenericEnv    │
   │  Update        │  with VLA model      │  ├─ _make_env_function │
   │                │                      │  ├─ _wrap_obs          │
   │                │                      │  └─ _wrap_action       │
   └────────────────┴──────────────────────┴────────────────────────┘

**Data flow:**

1. ``EnvWorker`` runs Isaac Lab simulation and converts observations to RLinf format
2. ``RolloutWorker`` runs VLA model inference (e.g., GR00T) to produce actions
3. Actions are converted back to Isaac Lab format and stepped in the environment
4. ``ActorWorker`` updates the VLA model with PPO/actor-critic loss via FSDP
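
The four steps above can be sketched as a toy loop. This is only an illustration under stated assumptions: the ``Toy*`` classes, their methods, and the scalar "simulation" are hypothetical stand-ins and are not RLinf or Isaac Lab APIs.

```python
# Toy stand-ins for the four data-flow steps; none of these classes are RLinf or Isaac Lab APIs.
from dataclasses import dataclass, field


@dataclass
class ToyEnvWorker:
    """Pretends to run the simulation: holds a scalar state and steps it."""
    state: float = 0.0

    def observe(self) -> dict:
        # Step 1: convert the raw simulator state into the policy's observation format.
        return {"states": [self.state]}

    def step(self, action: list) -> float:
        # Step 3: apply the (already converted) action and return a reward.
        self.state += action[0]
        return -abs(1.0 - self.state)  # reward peaks when the state reaches 1.0


@dataclass
class ToyRolloutWorker:
    """Pretends to run VLA inference: a fixed gain applied to the state error."""
    gain: float = 0.5

    def act(self, obs: dict) -> list:
        # Step 2: model inference produces an action from the observation.
        return [self.gain * (1.0 - obs["states"][0])]


@dataclass
class ToyActorWorker:
    """Pretends to update the policy; a real actor would compute a PPO loss."""
    rewards: list = field(default_factory=list)

    def update(self, trajectory: list) -> float:
        # Step 4: consume the rollout; here we only record the mean reward.
        mean_reward = sum(r for _, _, r in trajectory) / len(trajectory)
        self.rewards.append(mean_reward)
        return mean_reward


def run_epoch(env, rollout, actor, steps=4):
    """Collect one short rollout, then hand it to the actor."""
    trajectory = []
    for _ in range(steps):
        obs = env.observe()
        action = rollout.act(obs)
        reward = env.step(action)
        trajectory.append((obs, action, reward))
    return actor.update(trajectory)


env = ToyEnvWorker()
mean_reward = run_epoch(env, ToyRolloutWorker(), ToyActorWorker(), steps=4)
```

In the real integration the three workers run as separate distributed components; the sketch only shows the order in which observations, actions, and rollouts are passed along.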

Prerequisites
~~~~~~~~~~~~~

- **Isaac Lab** installed and configured
- **Isaac-GR00T** repo (for VLA inference and data transforms)
- A **pretrained VLA checkpoint** in HuggingFace format
- A multi-GPU setup is recommended (FSDP requires at least 1 GPU)

Installation
~~~~~~~~~~~~

From the Isaac Lab root directory:

.. code-block:: bash

   # Install isaaclab_contrib with the RLinf extra
   pip install -e "source/isaaclab_contrib[rlinf]" --ignore-requires-python

   # Install Isaac-GR00T (pinned version)
   git clone https://github.com/NVIDIA/Isaac-GR00T.git
   cd Isaac-GR00T
   git checkout 4af2b622892f7dcb5aae5a3fb70bcb02dc217b96
   pip install -e ".[base]" --no-deps
   cd ../

Quick Start
~~~~~~~~~~~

**Training** — RL fine-tuning of a pretrained VLA model:

.. code-block:: bash

   python scripts/reinforcement_learning/rlinf/train.py \
       --task Isaac-Assemble-Trocar-G129-Dex3-v0 \
       --config_path source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config \
       --config_name isaaclab_ppo_gr00t_assemble_trocar

**Evaluation** — Evaluate a trained checkpoint with video recording:

.. code-block:: bash

   python scripts/reinforcement_learning/rlinf/play.py \
       --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 \
       --model_path /path/to/checkpoint \
       --config_path source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config \
       --config_name isaaclab_ppo_gr00t_assemble_trocar \
       --video

Configuration
~~~~~~~~~~~~~

All configuration lives in a **single YAML file** loaded by `Hydra <https://hydra.cc/>`_.
The key configuration block is the ``env.train.isaaclab`` section, which defines how Isaac Lab observations
are converted to GR00T format:

.. code-block:: yaml

   isaaclab: &isaaclab_config
     task_description: "assemble trocar from tray"

     # IsaacLab → RLinf observation mapping
     main_images: "front_camera"
     extra_view_images:
       - "left_wrist_camera"
       - "right_wrist_camera"
     states:
       - key: "robot_joint_state"
         slice: [15, 29]
       - key: "robot_dex3_joint_state"

     # GR00T → IsaacLab action conversion
     action_mapping:
       prefix_pad: 15
       suffix_pad: 0
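
One plausible reading of the ``states`` and ``action_mapping`` entries is sketched below. The helper names ``build_state`` and ``pad_action`` are hypothetical, and the semantics are assumptions rather than the behavior of ``extension.py``: ``slice`` is treated as a half-open ``[start, stop)`` range, and ``prefix_pad`` / ``suffix_pad`` are treated as zero-padding around the model's action. The vector sizes are illustrative only.

```python
def build_state(obs: dict, states_cfg: list) -> list:
    """Concatenate the configured observation entries, applying an optional slice."""
    state = []
    for entry in states_cfg:
        values = obs[entry["key"]]
        if "slice" in entry:
            start, stop = entry["slice"]
            values = values[start:stop]  # assumption: half-open [start, stop)
        state.extend(values)
    return state


def pad_action(action: list, prefix_pad: int, suffix_pad: int) -> list:
    """Zero-pad a model action so it matches a larger environment action vector."""
    # Assumption: padded entries are zeros; the real conversion may differ.
    return [0.0] * prefix_pad + list(action) + [0.0] * suffix_pad


# Illustrative observation: a 29-entry joint vector and a 2-entry hand vector.
obs = {
    "robot_joint_state": list(range(29)),
    "robot_dex3_joint_state": [100, 101],
}
states_cfg = [
    {"key": "robot_joint_state", "slice": [15, 29]},
    {"key": "robot_dex3_joint_state"},
]

state = build_state(obs, states_cfg)        # joints 15..28 followed by the hand entries
env_action = pad_action([1.0, 2.0], 15, 0)  # 15 leading zeros, then the model action
```

Under these assumptions, a ``slice`` of ``[15, 29]`` keeps 14 joint values, and a ``prefix_pad`` of 15 fills the leading joints that the slice drops.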

Key Files
~~~~~~~~~

.. code-block:: text

   scripts/reinforcement_learning/rlinf/
   ├── README.md       # Detailed documentation
   ├── train.py        # Training entry point
   ├── play.py         # Evaluation entry point
   └── cli_args.py     # Shared CLI argument definitions

   source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/
   ├── __init__.py
   └── extension.py    # Task registration, obs/action conversion

For detailed configuration options, CLI arguments, and how to add new tasks,
see ``scripts/reinforcement_learning/rlinf/README.md``.
9 changes: 9 additions & 0 deletions docs/source/overview/environments.rst
Expand Up @@ -204,6 +204,8 @@ for the lift-cube environment:
+-------------------------+------------------------------+-----------------------------------------------------------------------------+------------------------------+
| |cabi_openarm_uni| | |cabi_openarm_uni-link| | Grasp the handle of a cabinet's drawer and open it with the OpenArm robot | |
+-------------------------+------------------------------+-----------------------------------------------------------------------------+------------------------------+
| |g1_assemble_trocar| | |g1_assemble_trocar-link| | Assemble trocar with a Unitree G1 humanoid robot with Dex3 hands | |
+-------------------------+------------------------------+-----------------------------------------------------------------------------+------------------------------+

.. |reach-franka| image:: ../_static/tasks/manipulation/franka_reach.jpg
.. |reach-ur10| image:: ../_static/tasks/manipulation/ur10_reach.jpg
Expand All @@ -228,6 +230,7 @@ for the lift-cube environment:
.. |reach_openarm_uni| image:: ../_static/tasks/manipulation/openarm_uni_reach.jpg
.. |lift_openarm_uni| image:: ../_static/tasks/manipulation/openarm_uni_lift.jpg
.. |cabi_openarm_uni| image:: ../_static/tasks/manipulation/openarm_uni_open_drawer.jpg
.. |g1_assemble_trocar| image:: ../_static/tasks/manipulation/g1_assemble_trocar.jpg

.. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/reach/config/franka/joint_pos_env_cfg.py>`__
.. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/reach/config/ur_10/joint_pos_env_cfg.py>`__
Expand Down Expand Up @@ -261,6 +264,7 @@ for the lift-cube environment:
.. |reach_openarm_uni-link| replace:: `Isaac-Reach-OpenArm-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/reach/config/openarm/unimanual/joint_pos_env_cfg.py>`__
.. |lift_openarm_uni-link| replace:: `Isaac-Lift-Cube-OpenArm-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/openarm/joint_pos_env_cfg.py>`__
.. |cabi_openarm_uni-link| replace:: `Isaac-Open-Drawer-OpenArm-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/cabinet/config/openarm/joint_pos_env_cfg.py>`__
.. |g1_assemble_trocar-link| replace:: `Isaac-Assemble-Trocar-G129-Dex3-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/g129_dex3_env_cfg.py>`__


Contact-rich Manipulation
Expand Down Expand Up @@ -769,6 +773,11 @@ inferencing, including reading from an already trained checkpoint and disabling
- Manager Based
- **rsl_rl** (PPO), **rl_games** (PPO), **skrl** (PPO), **sb3** (PPO)
- ``newton_mjwarp``, ``physx``
* - Isaac-Assemble-Trocar-G129-Dex3-v0
- Isaac-Assemble-Trocar-G129-Dex3-Eval-v0
- Manager Based
- **rlinf** (PPO)
-
* - Isaac-Cart-Double-Pendulum-Direct-v0
-
- Direct
10 changes: 5 additions & 5 deletions scripts/reinforcement_learning/rlinf/README.md
Expand Up @@ -81,7 +81,7 @@ python train.py
python train.py --config_name isaaclab_ppo_gr00t_assemble_trocar

# Training with task override
python train.py --task Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0
python train.py --task Isaac-Assemble-Trocar-G129-Dex3-v0

# Training with custom settings
python train.py --num_envs 64 --max_epochs 1000
Expand All @@ -94,13 +94,13 @@ python train.py --list_tasks

```bash
# Evaluate a trained checkpoint
python play.py --model_path /path/to/checkpoint
python play.py --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 --model_path /path/to/checkpoint

# Evaluate with video recording
python play.py --model_path /path/to/checkpoint --video
python play.py --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 --model_path /path/to/checkpoint --video

# Evaluate with specific number of environments
python play.py --model_path /path/to/checkpoint --num_envs 8
python play.py --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 --model_path /path/to/checkpoint --num_envs 8
```
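
The evaluation flags used above could be declared roughly as follows. This is a hypothetical sketch, not the contents of `cli_args.py`: the function name `build_parser`, the defaults, and the help strings are assumptions; only the flag names come from the commands in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the evaluation flags seen in the commands above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical RLinf evaluation arguments")
    parser.add_argument("--task", type=str, default=None, help="Registered task ID")
    parser.add_argument("--model_path", type=str, default=None, help="Checkpoint to evaluate")
    parser.add_argument("--num_envs", type=int, default=None, help="Number of parallel environments")
    parser.add_argument("--video", action="store_true", help="Record evaluation videos")
    return parser


# Parse the same arguments as the "Evaluate with video recording" command.
args = build_parser().parse_args(
    [
        "--task", "Isaac-Assemble-Trocar-G129-Dex3-Eval-v0",
        "--model_path", "/path/to/checkpoint",
        "--video",
    ]
)
```

Leaving `--num_envs` at `None` in this sketch lets a YAML config supply the value when the flag is omitted; whether the real scripts behave that way is not confirmed here.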

## Configuration
Expand Down Expand Up @@ -132,7 +132,7 @@ env:
total_num_envs: 4
max_episode_steps: 256
init_params:
id: "Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0"
id: "Isaac-Assemble-Trocar-G129-Dex3-v0"
isaaclab: &isaaclab_config # IsaacLab ↔ RLinf mapping (see below)
...
eval:
@@ -0,0 +1,5 @@
Added
^^^^^

* Added :class:`~isaaclab_assets.robots.unitree.G129_CFG_WITH_DEX3_BASE_FIX` robot configuration
for the Unitree G1 29-DOF with Dex3 hands.