isaac-sim · kellyguo11 · May 12, 2026 · Mar 23, 2026 · Mar 24, 2026 · Mar 30, 2026
@@ -9,3 +9,144 @@ Directly integrating such features before they are complete and without feedback
 
 To address this, some major features will be released as Experimental Feature Branches.
 This way, the community can experiment with and contribute to the feature before it is fully integrated, reducing the risk of disruption from new and unexpected errors.
+
+RL Post-Training for VLA Models
+---------------------------------
+
+`RLinf <https://github.com/RLinf/RLinf.git>`_ is a flexible, scalable, open-source RL infrastructure for embodied and
+agentic AI. This integration enables **reinforcement learning fine-tuning of Vision-Language-Action (VLA) models**
+(e.g., GR00T, OpenVLA) on Isaac Lab simulation tasks.
+
+The typical workflow has three stages:
+
+1. **Data collection:** Collect demonstration data from the Isaac Lab environment (e.g., via teleoperation or a scripted policy).
+2. **Base model training:** Train a VLA base model (e.g., GR00T) on the collected demonstrations with supervised learning.
+3. **RL fine-tuning:** Fine-tune the pretrained VLA model on the Isaac Lab task with RLinf, using PPO, actor-critic, or SAC.
+
+Overview
+~~~~~~~~
+
+The RLinf integration allows Isaac Lab users to:
+
+- Fine-tune pretrained VLA models on Isaac Lab tasks using PPO / Actor-Critic / SAC
+- Leverage RLinf's FSDP-based distributed training across multiple GPUs/nodes
+- Define observation/action mappings from Isaac Lab to GR00T format via a single YAML config
+- Register Isaac Lab tasks with RLinf without modifying the RLinf source code
+
+Architecture
+~~~~~~~~~~~~
+
+.. code-block:: text
+
+    ┌────────────────────────────────────────────────────────────────┐
+    │                         RLinf Runner                           │
+    │                 (EmbodiedRunner / EvalRunner)                  │
+    ├────────────────┬──────────────────────┬────────────────────────┤
+    │  Actor Worker  │   Rollout Worker     │      Env Worker        │
+    │  (FSDP)        │  (HF Inference)      │  (IsaacLab Sim)        │
+    │                │                      │                        │
+    │ Policy         │  Multi-step rollout  │ IsaacLabGenericEnv     │
+    │ Update         │  with VLA model      │  ├─ _make_env_function │
+    │                │                      │  ├─ _wrap_obs          │
+    │                │                      │  └─ _wrap_action       │
+    └────────────────┴──────────────────────┴────────────────────────┘
+
+**Data flow:**
+
+1. ``EnvWorker`` steps the Isaac Lab simulation and converts its observations to the RLinf format.
+2. ``RolloutWorker`` runs VLA model inference (e.g., GR00T) on these observations to produce actions.
+3. The actions are converted back to the Isaac Lab format and applied in the next environment step.
+4. ``ActorWorker`` updates the VLA model with a PPO/actor-critic loss, using FSDP for distributed training.
+
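+The four data-flow steps can be illustrated with a minimal, single-process sketch. This is a simplification under stated
+assumptions, not the RLinf implementation: in RLinf these steps run in separate workers, and every name below
+(``ToyEnv``, ``wrap_obs``, ``toy_policy``, ``wrap_action``, ``collect_rollout``) is hypothetical. The zero-padding in
+``wrap_action`` only mimics the ``prefix_pad`` option shown in the Configuration section.
+
+.. code-block:: python
+
+   import random
+   from typing import Dict, List, Tuple
+
+
+   class ToyEnv:
+       """Hypothetical stand-in for an Isaac Lab environment (not the Isaac Lab API)."""
+
+       def __init__(self, obs_dim: int, action_dim: int, seed: int = 0) -> None:
+           self.obs_dim = obs_dim
+           self.action_dim = action_dim
+           self._rng = random.Random(seed)
+
+       def _observe(self) -> Dict[str, List[float]]:
+           return {"policy": [self._rng.random() for _ in range(self.obs_dim)]}
+
+       def reset(self) -> Dict[str, List[float]]:
+           return self._observe()
+
+       def step(self, action: List[float]) -> Tuple[Dict[str, List[float]], float, bool]:
+           if len(action) != self.action_dim:
+               raise ValueError(f"expected {self.action_dim} action entries, got {len(action)}")
+           reward = -sum(a * a for a in action)  # toy reward: penalize large actions
+           return self._observe(), reward, False
+
+
+   def wrap_obs(env_obs: Dict[str, List[float]]) -> Dict[str, List[float]]:
+       """Step 1: convert the environment observation to a (hypothetical) model format."""
+       return {"states": list(env_obs["policy"])}
+
+
+   def toy_policy(model_obs: Dict[str, List[float]], action_dim: int) -> List[float]:
+       """Step 2: placeholder for VLA inference; always returns a constant action."""
+       return [0.1] * action_dim
+
+
+   def wrap_action(model_action: List[float], prefix_pad: int) -> List[float]:
+       """Step 3: convert the model action back to the environment's action layout."""
+       return [0.0] * prefix_pad + list(model_action)
+
+
+   def collect_rollout(env: ToyEnv, num_steps: int, model_action_dim: int, prefix_pad: int):
+       """Collect (observation, action, reward) tuples that an actor update (step 4) would consume."""
+       trajectory = []
+       env_obs = env.reset()
+       for _ in range(num_steps):
+           model_obs = wrap_obs(env_obs)
+           model_action = toy_policy(model_obs, model_action_dim)
+           env_obs, reward, done = env.step(wrap_action(model_action, prefix_pad))
+           trajectory.append((model_obs, model_action, reward))
+           if done:
+               env_obs = env.reset()
+       return trajectory
+
+
+   rollout = collect_rollout(ToyEnv(obs_dim=4, action_dim=6), num_steps=5, model_action_dim=4, prefix_pad=2)
+   print(len(rollout))  # 5
+
+Step 4 is omitted: in the actual integration, ``ActorWorker`` would compute the PPO/actor-critic update from batches of
+such trajectories.
+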
+Prerequisites
+~~~~~~~~~~~~~
+
+- **Isaac Lab**, installed and configured
+- The **Isaac-GR00T** repository (for VLA inference and data transforms)
+- A **pretrained VLA checkpoint** in Hugging Face format
+- At least one GPU for FSDP training; a multi-GPU setup is recommended
+
+Installation
+~~~~~~~~~~~~
+
+From the Isaac Lab root directory:
+
+.. code-block:: bash
+
+   # Install isaaclab_contrib with the RLinf extra
+   pip install -e "source/isaaclab_contrib[rlinf]" --ignore-requires-python
+
+   # Install Isaac-GR00T at the pinned commit
+   git clone https://github.com/NVIDIA/Isaac-GR00T.git
+   cd Isaac-GR00T
+   git checkout 4af2b622892f7dcb5aae5a3fb70bcb02dc217b96
+   # Quote the extra so shells such as zsh do not expand the brackets as a glob
+   pip install -e ".[base]" --no-deps
+   cd ..
+
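+After installation, a quick import check can confirm that the packages are visible to the Python interpreter used by
+Isaac Lab. This is a hedged sketch: the top-level module names ``isaaclab_contrib``, ``rlinf``, and ``gr00t`` are
+assumptions about how each package is laid out, and ``check_packages`` is a hypothetical helper.
+
+.. code-block:: python
+
+   import importlib.util
+   from typing import Dict, Iterable
+
+
+   def check_packages(names: Iterable[str]) -> Dict[str, bool]:
+       """Report whether each top-level module can be located, without importing it."""
+       return {name: importlib.util.find_spec(name) is not None for name in names}
+
+
+   # Assumed module names for the packages installed above.
+   status = check_packages(["isaaclab_contrib", "rlinf", "gr00t"])
+   for name, found in status.items():
+       print(f"{name}: {'found' if found else 'MISSING'}")
+
+Using ``find_spec`` avoids importing the packages, so the check stays fast and does not start any simulation code.
+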
+Quick Start
+~~~~~~~~~~~
+
+**Training.** Fine-tune a pretrained VLA model with RL:
+
+.. code-block:: bash
+
+   python scripts/reinforcement_learning/rlinf/train.py \
+       --task Isaac-Assemble-Trocar-G129-Dex3-v0 \
+       --config_path source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config \
+       --config_name isaaclab_ppo_gr00t_assemble_trocar
+
+**Evaluation.** Evaluate a trained checkpoint and record videos:
+
+.. code-block:: bash
+
+   python scripts/reinforcement_learning/rlinf/play.py \
+       --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 \
+       --model_path /path/to/checkpoint \
+       --config_path source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config \
+       --config_name isaaclab_ppo_gr00t_assemble_trocar \
+       --video
+
+Configuration
+~~~~~~~~~~~~~
+
+All configuration lives in a **single YAML file** loaded by `Hydra <https://hydra.cc/>`_.
+The key block is the ``env.train.isaaclab`` section, which defines how Isaac Lab observations are converted to the
+GR00T input format and how GR00T actions are mapped back to Isaac Lab actions:
+
+.. code-block:: yaml
+
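+   isaaclab: &isaaclab_config
+     task_description: "assemble trocar from tray"
+
+     # IsaacLab → RLinf observation mapping
+     main_images: "front_camera"          # primary camera view
+     extra_view_images:                   # additional camera views
+       - "left_wrist_camera"
+       - "right_wrist_camera"
+     states:                              # proprioceptive observation terms
+       - key: "robot_joint_state"
+         slice: [15, 29]                  # use only a sub-range of this term
+       - key: "robot_dex3_joint_state"    # no slice: the whole term is used
+
+     # GR00T → IsaacLab action conversion
+     action_mapping:
+       prefix_pad: 15                     # entries padded before the GR00T action
+       suffix_pad: 0                      # entries padded after the GR00T action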
+
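+The mapping can be read as two small transformations. The sketch below is a hedged illustration, not the code in
+``extension.py``: it assumes that ``slice: [start, end]`` is a half-open Python-style range, that the configured state
+terms are concatenated in the listed order, and that padded action entries are filled with zeros. The helper names
+(``build_state_vector``, ``pad_action``), the observation sizes, and the model action size are hypothetical.
+
+.. code-block:: python
+
+   from typing import Dict, List, Optional, Sequence
+
+   # Mirrors the ``states`` and ``action_mapping`` entries of the YAML block above.
+   STATE_SPECS: List[Dict] = [
+       {"key": "robot_joint_state", "slice": [15, 29]},
+       {"key": "robot_dex3_joint_state"},  # no slice: the whole term is used
+   ]
+   PREFIX_PAD, SUFFIX_PAD = 15, 0
+
+
+   def build_state_vector(obs: Dict[str, Sequence[float]], specs: List[Dict]) -> List[float]:
+       """Concatenate the configured observation terms, slicing each one if requested."""
+       state: List[float] = []
+       for spec in specs:
+           values = list(obs[spec["key"]])
+           bounds: Optional[Sequence[int]] = spec.get("slice")
+           if bounds is not None:
+               start, end = bounds
+               values = values[start:end]  # assumed half-open range [start, end)
+           state.extend(values)
+       return state
+
+
+   def pad_action(action: Sequence[float], prefix_pad: int, suffix_pad: int) -> List[float]:
+       """Zero-pad a model action so its length matches the environment's action space."""
+       return [0.0] * prefix_pad + list(action) + [0.0] * suffix_pad
+
+
+   # Toy observation with illustrative sizes (29 body joints, 14 hand joints).
+   obs = {
+       "robot_joint_state": [float(i) for i in range(29)],
+       "robot_dex3_joint_state": [100.0 + i for i in range(14)],
+   }
+   state = build_state_vector(obs, STATE_SPECS)
+   print(len(state))  # 28: elements 15..28 of the joint state plus all 14 hand values
+
+   model_action = [1.0] * 28  # hypothetical model output size
+   env_action = pad_action(model_action, PREFIX_PAD, SUFFIX_PAD)
+   print(len(env_action))  # 43
+
+Under these assumptions, ``prefix_pad`` plus the model action size plus ``suffix_pad`` must equal the Isaac Lab action
+dimension; ``extension.py`` remains the authoritative reference for the real conversion.
+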
+Key Files
+~~~~~~~~~
+
+.. code-block:: text
+
+   scripts/reinforcement_learning/rlinf/
+   ├── README.md          # Detailed documentation
+   ├── train.py           # Training entry point
+   ├── play.py            # Evaluation entry point
+   └── cli_args.py        # Shared CLI argument definitions
+
+   source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/
+   ├── __init__.py
+   └── extension.py       # Task registration, obs/action conversion
+
+For detailed configuration options, CLI arguments, and how to add new tasks,
+see ``scripts/reinforcement_learning/rlinf/README.md``.
@@ -204,6 +204,8 @@ for the lift-cube environment:
     +-------------------------+------------------------------+-----------------------------------------------------------------------------+------------------------------+
     | |cabi_openarm_uni|      | |cabi_openarm_uni-link|      | Grasp the handle of a cabinet's drawer and open it with the OpenArm robot   |                              |
     +-------------------------+------------------------------+-----------------------------------------------------------------------------+------------------------------+
+    | |g1_assemble_trocar|    | |g1_assemble_trocar-link|    | Assemble a trocar using a Unitree G1 humanoid robot with Dex3 hands         |                              |
+    +-------------------------+------------------------------+-----------------------------------------------------------------------------+------------------------------+
 
 .. |reach-franka| image:: ../_static/tasks/manipulation/franka_reach.jpg
 .. |reach-ur10| image:: ../_static/tasks/manipulation/ur10_reach.jpg
@@ -228,6 +230,7 @@ for the lift-cube environment:
 .. |reach_openarm_uni| image:: ../_static/tasks/manipulation/openarm_uni_reach.jpg
 .. |lift_openarm_uni| image:: ../_static/tasks/manipulation/openarm_uni_lift.jpg
 .. |cabi_openarm_uni| image:: ../_static/tasks/manipulation/openarm_uni_open_drawer.jpg
+.. |g1_assemble_trocar| image:: ../_static/tasks/manipulation/g1_assemble_trocar.jpg
 
 .. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/reach/config/franka/joint_pos_env_cfg.py>`__
 .. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/reach/config/ur_10/joint_pos_env_cfg.py>`__
@@ -261,6 +264,7 @@ for the lift-cube environment:
 .. |reach_openarm_uni-link| replace:: `Isaac-Reach-OpenArm-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/reach/config/openarm/unimanual/joint_pos_env_cfg.py>`__
 .. |lift_openarm_uni-link| replace:: `Isaac-Lift-Cube-OpenArm-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/openarm/joint_pos_env_cfg.py>`__
 .. |cabi_openarm_uni-link| replace:: `Isaac-Open-Drawer-OpenArm-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/cabinet/config/openarm/joint_pos_env_cfg.py>`__
+.. |g1_assemble_trocar-link| replace:: `Isaac-Assemble-Trocar-G129-Dex3-v0 <../../../source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/g129_dex3_env_cfg.py>`__
 
 
 Contact-rich Manipulation
@@ -769,6 +773,11 @@ inferencing, including reading from an already trained checkpoint and disabling
       - Manager Based
       - **rsl_rl** (PPO), **rl_games** (PPO), **skrl** (PPO), **sb3** (PPO)
       - ``newton_mjwarp``, ``physx``
+    * - Isaac-Assemble-Trocar-G129-Dex3-v0
+      - Isaac-Assemble-Trocar-G129-Dex3-Eval-v0
+      - Manager Based
+      - **rlinf** (PPO)
+      -
     * - Isaac-Cart-Double-Pendulum-Direct-v0
       -
       - Direct

@@ -81,7 +81,7 @@ python train.py
 python train.py --config_name isaaclab_ppo_gr00t_assemble_trocar
 
 # Training with task override
-python train.py --task Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0
+python train.py --task Isaac-Assemble-Trocar-G129-Dex3-v0
 
 # Training with custom settings
 python train.py --num_envs 64 --max_epochs 1000
@@ -94,13 +94,13 @@ python train.py --list_tasks
 
 ```bash
 # Evaluate a trained checkpoint
-python play.py --model_path /path/to/checkpoint
+python play.py --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 --model_path /path/to/checkpoint
 
 # Evaluate with video recording
-python play.py --model_path /path/to/checkpoint --video
+python play.py --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 --model_path /path/to/checkpoint --video
 
 # Evaluate with specific number of environments
-python play.py --model_path /path/to/checkpoint --num_envs 8
+python play.py --task Isaac-Assemble-Trocar-G129-Dex3-Eval-v0 --model_path /path/to/checkpoint --num_envs 8
 ```
 
 ## Configuration
@@ -132,7 +132,7 @@ env:
     total_num_envs: 4
     max_episode_steps: 256
     init_params:
-      id: "Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0"
+      id: "Isaac-Assemble-Trocar-G129-Dex3-v0"
     isaaclab: &isaaclab_config                # IsaacLab ↔ RLinf mapping (see below)
       ...
   eval:

diff --git a/source/isaaclab_assets/changelog.d/Adds-Assemble-Trocar-task-Based-RLinf.rst b/source/isaaclab_assets/changelog.d/Adds-Assemble-Trocar-task-Based-RLinf.rst
@@ -0,0 +1,5 @@
+Added
+^^^^^
+
+* Added :class:`~isaaclab_assets.robots.unitree.G129_CFG_WITH_DEX3_BASE_FIX` robot configuration
+  for the Unitree G1 29-DOF with Dex3 hands.