
Add assemble_trocar task for G129-Dex3 #5082

Open
mingxueg-nv wants to merge 10 commits into isaac-sim:develop from mingxueg-nv:Adds_Assemble_Trocar_task_Based_RLinf

Conversation

@mingxueg-nv (Contributor)

Description

Adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3), with RLinf support.

Key additions:

  • Task MDP: observations (body + Dex3 joint states), reward functions (4-stage sparse), termination conditions (timeout, success, object drop), and reset events (scene reset, task stage reset, random tray rotation).
  • Camera presets: front camera and left/right wrist cameras (TiledCamera, 224×224) configured for GR00T visual input.
  • Robot presets: G1 29-DoF + Dex3 articulation configuration.
  • GR00T data config: IsaacLabDataConfig defining video/state/action modality keys, transforms (SinCos state encoding, min-max action normalization, color jitter), and model-specific settings.
  • RLinf extension update: minor update to isaaclab_contrib/rl/rlinf/extension.py to support the new task registration.

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@mingxueg-nv mingxueg-nv requested a review from Mayankm96 as a code owner March 23, 2026 04:12
@github-actions github-actions Bot added the isaac-lab label (Related to Isaac Lab team) Mar 23, 2026

greptile-apps Bot commented Mar 23, 2026

Greptile Summary

This PR adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3) robot, including a 4-stage sparse MDP (lift → tip-align → insert → place), three TiledCamera presets for GR00T visual input, robot articulation config, a GR00T data config, and a minor extension patch that adds a post-reset render-warmup to fix stale camera frames under RTX rendering.

Key issues found:

  • Hardcoded local robot USD path (/mnt/Data/lw_v3/… in robot_config.py) — every other asset uses a URL-based USD_ROOT; this will fail on any machine other than the author's workstation.
  • Sparse reward is 50× the documented value — rewards are emitted as 1 / env.step_dt (= 50.0) on stage transitions, but comments and RewardsCfg describe them as "1.0 per stage".
  • Stage advancement depends on implicit reward evaluation order — update_task_stage is only called from lift_trocars_reward; the other three reward functions silently assume it runs first each step.
  • Stale extras after render warmup — _patched_reset re-computes obs after the warmup but returns the original extras, which may contain pre-warmup camera/observation data.
  • Module-level observation caches (_body_obs_cache, _dex3_obs_cache) keyed only on device+batch-size will silently corrupt data if two env instances coexist in the same process.
  • trocar_2 is missing collision_props that trocar_1 sets explicitly.

Confidence Score: 2/5

  • Not ready to merge: the hardcoded local robot USD path will break the task for all users outside the author's machine, and the sparse reward scaling discrepancy needs to be resolved before training results can be reasoned about.
  • Two P1 issues block merging: (1) the robot USD is at a local absolute path that no other contributor can resolve, making the task unusable out-of-the-box; (2) the sparse reward emits 50× the documented value per stage transition, which could mask training bugs and is inconsistent with the reward-weight comments. Additionally the implicit reward-ordering dependency and stale-extras bug in the extension patch need addressing before the feature is production-reliable.
  • config/robot_config.py (hardcoded path), mdp/rewards.py (reward scaling and ordering), rl/rlinf/extension.py (stale extras), mdp/observations.py (global cache)

Important Files Changed

Filename Overview
source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py Adds render warmup monkey-patch to env.reset. Stale extras returned after re-render is a correctness concern when extras contains observation data.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config/robot_config.py Robot USD path is hardcoded to /mnt/Data/lw_v3/…, breaking portability for any non-author machine. All other assets use a URL-based path.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/g129_dex3_env_cfg.py Main environment config assembling scene, observations, rewards, terminations, and events. trocar_2 lacks collision_props; USD_ROOT has an outstanding FIXME; otherwise structurally sound.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/observations.py Body and Dex3 joint state observations with preallocated tensor caches. Module-level global caches will interfere if multiple env instances coexist in the same process.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/rewards.py 4-stage sparse reward functions. Sparse reward value is 1/step_dt (=50) not 1.0 as documented; stage advancement relies on implicit ordering of reward term evaluation.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/terminations.py Drop and task-success termination conditions. Logic is clear; imports get_task_stage from rewards (coupling that could be tidied but is not a bug).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Episode Start - Stage 0"] -->|"Both trocars lifted above table"| B["Stage 1: Lifted\nlift_trocars_reward fires"]
    B -->|"Tip distance < threshold"| C["Stage 2: Tips Aligned\ntrocar_tip_alignment_reward fires"]
    C -->|"Parallel AND centers close"| D["Stage 3: Inserted\ntrocar_insertion_reward fires"]
    D -->|"Both in target zone AND below z_min"| E["Stage 4: Placed\ntrocar_placement_reward fires"]
    E --> F["task_success_termination - Episode Done"]

    A -->|"Timeout 20s"| G["time_out termination"]
    A -->|"Object z below 0.5m"| H["object_drop_termination"]
    B --> H
    C --> H
    D --> H

    F --> I["Reset Events"]
    G --> I
    H --> I

    I --> R1["reset_scene_to_default"]
    I --> R2["reset_task_stage - clears caches"]
    I --> R3["reset_tray_with_random_rotation"]
    R1 --> A
    R2 --> A
    R3 --> A

Comments Outside Diff (1)

  1. source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py, lines 13-31

    P2 Stale extras returned after render warmup

    _patched_reset re-computes obs from env.observation_manager after the warmup renders, but the extras dict that is returned still comes from _original_reset (i.e. before any warmup frames). If extras contains camera or privileged-observation data (as is common in Isaac Lab resets via "observations" or "log" keys), those values will reflect the pre-warmup render rather than the freshly-rendered state that obs reflects. This is an inconsistency between obs and extras that could silently corrupt the first observation seen by the policy after a reset.

    Consider also re-computing any camera-dependent fields of extras, or at least documenting that extras intentionally reflects the pre-warmup state.
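One way to address this is sketched below: recompute any observation-bearing fields of extras from the same post-warmup source used for obs, so the two stay consistent. The env methods and extras layout here are illustrative stand-ins, not the PR's actual code.

```python
# Illustrative sketch (not the PR's actual implementation) of keeping obs and
# extras consistent after a render warmup. StubEnv mimics the relevant
# behavior: the original reset returns pre-warmup data, and observations
# recomputed after the warmup reflect the fresh render.

class StubEnv:
    def __init__(self):
        self.rendered = False

    def original_reset(self):
        # Pre-warmup: both obs and extras reflect the stale render.
        return {"camera": "stale"}, {"observations": {"camera": "stale"}}

    def render_warmup(self):
        self.rendered = True

    def compute_observations(self):
        return {"camera": "fresh"} if self.rendered else {"camera": "stale"}


def patched_reset(env):
    obs, extras = env.original_reset()
    env.render_warmup()
    obs = env.compute_observations()  # post-warmup observations
    # Refresh observation data stashed in extras so it matches obs.
    if "observations" in extras:
        extras["observations"] = obs
    return obs, extras


obs, extras = patched_reset(StubEnv())
print(obs, extras)  # both now reflect the post-warmup render
```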

Reviews (1): Last reviewed commit: "Add assemble_trocar task for G129-Dex3 a..."

Comment on lines +276 to +282
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)


P1 Sparse reward is 50× larger than documented

The RewardsCfg docstring and inline comments state each stage gives a reward of 1.0 on completion. However, the actual value emitted is 1.0 / env.step_dt. With decimation=4 and sim.dt=1/200, step_dt = 0.02 s, so the actual reward per transition step is 50.0, not 1.0. The same pattern is repeated in trocar_tip_alignment_reward (line 456), trocar_insertion_reward (line 570), and trocar_placement_reward (line 696).

If the intent is to produce a reward equivalent to "50 reward-units delivered over one policy step" (which is a valid design choice for sparse rewards), the comments and RewardsCfg docstring need to be updated to reflect this. If the intent is to give exactly 1.0, the division by step_dt should be removed.
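A quick sanity check of the scaling arithmetic, assuming decimation=4 and sim.dt=1/200 as stated above:

```python
# Reproduce the reward-scaling arithmetic from the review comment.
# Assumed values: decimation=4, sim.dt=1/200 (as documented in this PR thread).
decimation = 4
sim_dt = 1.0 / 200.0

# step_dt is the policy-step duration: physics dt times decimation.
step_dt = sim_dt * decimation  # 0.02 s

# Reward as currently emitted on a stage transition: 1.0 / env.step_dt.
emitted_reward = 1.0 / step_dt

# Reward as documented in RewardsCfg: 1.0 per completed stage.
documented_reward = 1.0

print(step_dt)         # 0.02
print(emitted_reward)  # 50.0 -> 50x the documented value
```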

Suggested change
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device),
    torch.zeros(env.num_envs, device=env.device),
)

Comment on lines +101 to +206
def update_task_stage(
    env: ManagerBasedRLEnv,
    asset_cfg1: SceneEntityCfg,
    asset_cfg2: SceneEntityCfg,
    table_height: float = 0.85483,
    lift_threshold: float = 0.05,
    tip_align_threshold: float = 0.015,
    insertion_dist_threshold: float = 0.03,
    insertion_angle_threshold: float = 0.15,
    placement_x_min: float = -1.8,
    placement_x_max: float = -1.4,
    placement_y_min: float = 1.5,
    placement_y_max: float = 1.8,
    placement_z_min: float = 0.9,
    print_log: bool = False,
) -> torch.Tensor:
    """Update task stage based on current state.

    This function checks conditions and advances stages automatically.
    Once a stage is completed, it never goes back.
    """
    stage = get_task_stage(env)

    obj1: RigidObject = env.scene[asset_cfg1.name]
    obj2: RigidObject = env.scene[asset_cfg2.name]

    pos1 = wp.to_torch(obj1.data.root_pos_w)
    pos2 = wp.to_torch(obj2.data.root_pos_w)
    quat1 = wp.to_torch(obj1.data.root_quat_w)
    quat2 = wp.to_torch(obj2.data.root_quat_w)

    # Store old stage to detect changes (BEFORE any stage transitions)
    old_stage = stage.clone()

    # Stage 0 -> 1: Check if lifted
    target_z = table_height + lift_threshold
    is_lifted_1 = pos1[:, 2] > target_z
    is_lifted_2 = pos2[:, 2] > target_z
    both_lifted = is_lifted_1 & is_lifted_2
    stage = torch.where((stage == 0) & both_lifted, torch.ones_like(stage), stage)

    # Stage 1 -> 2: Check if tips are aligned (hole found)
    # Get tip positions
    tip_pos1 = get_trocar_tip_position(env, asset_cfg1)
    tip_pos2 = get_trocar_tip_position(env, asset_cfg2)
    tip_dist = torch.norm(tip_pos1 - tip_pos2, dim=-1)

    # Tip alignment success
    tip_aligned = tip_dist < tip_align_threshold
    stage = torch.where((stage == 1) & tip_aligned, torch.full_like(stage, 2), stage)

    # Stage 2 -> 3: Check if inserted (parallel + center close)
    # Get center distance
    center_dist = torch.norm(pos1 - pos2, dim=-1)

    # Check alignment
    target_axis1 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    target_axis2 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    axis1 = quat_apply(quat1, target_axis1)
    axis2 = quat_apply(quat2, target_axis2)
    dot_prod = torch.sum(axis1 * axis2, dim=-1)
    abs_dot = torch.clamp(torch.abs(dot_prod), max=1.0)
    angle = torch.acos(abs_dot)

    # Insertion success: parallel + center close
    is_parallel = angle < insertion_angle_threshold
    center_close = center_dist < insertion_dist_threshold
    is_inserted = is_parallel & center_close

    stage = torch.where((stage == 2) & is_inserted, torch.full_like(stage, 3), stage)

    # Stage 3 -> 4: Check if placed in target zone
    # Get environment origins to handle multi-env spatial offsets
    env_origins = env.scene.env_origins  # shape: (num_envs, 3)

    # Adjust target zone relative to each environment's origin
    curr_x_min = env_origins[:, 0] + min(placement_x_min, placement_x_max)  # (num_envs,)
    curr_x_max = env_origins[:, 0] + max(placement_x_min, placement_x_max)
    curr_y_min = env_origins[:, 1] + min(placement_y_min, placement_y_max)
    curr_y_max = env_origins[:, 1] + max(placement_y_min, placement_y_max)

    in_zone_1 = (
        (pos1[:, 0] >= curr_x_min)
        & (pos1[:, 0] <= curr_x_max)
        & (pos1[:, 1] >= curr_y_min)
        & (pos1[:, 1] <= curr_y_max)
        & (pos1[:, 2] < placement_z_min)
    )
    in_zone_2 = (
        (pos2[:, 0] >= curr_x_min)
        & (pos2[:, 0] <= curr_x_max)
        & (pos2[:, 1] >= curr_y_min)
        & (pos2[:, 1] <= curr_y_max)
        & (pos2[:, 2] < placement_z_min)
    )
    both_in_zone = in_zone_1 & in_zone_2
    stage = torch.where((stage == 3) & both_in_zone, torch.full_like(stage, 4), stage)

    # Print stage transitions (AFTER all stage transitions - always print when stage changes)
    if print_log and (stage != old_stage).any():
        for env_id in range(env.num_envs):
            if stage[env_id] != old_stage[env_id]:
                print(f"Env {env_id}: Stage {old_stage[env_id].item()} → {stage[env_id].item()}")

    env._task_stage = stage
    return stage

P1 Stage advancement depends on implicit reward evaluation order

update_task_stage (which advances env._task_stage) is only called from lift_trocars_reward. The three subsequent reward functions — trocar_tip_alignment_reward, trocar_insertion_reward, and trocar_placement_reward — call get_task_stage, which simply reads env._task_stage. This means the stage read by the later rewards is only up-to-date if lift_trocars_reward is evaluated first in every step.

Isaac Lab evaluates reward terms in the order they appear in RewardsCfg, which today happens to match (lift_trocars is declared first). However, this ordering is not enforced by the API, and any reordering of the dict (or framework change) will cause the later rewards to silently use a one-step-stale stage value, producing spurious or missed transition rewards.

Consider factoring update_task_stage into an independent event term (evaluated once per step before rewards) or at least adding a docstring/assertion that guards against incorrect evaluation order.
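One lightweight alternative to a dedicated event term is to memoize the stage update on the env's step counter, so whichever reward term is evaluated first triggers it and later terms in the same step reuse the result. The sketch below illustrates the pattern with a stand-in env; the names (common_step_counter, _task_stage) mirror the PR's attributes, but this is not the PR's actual code.

```python
# Minimal sketch: memoize the stage update per step so the reward terms no
# longer depend on declaration order in RewardsCfg.

class FakeEnv:
    """Stand-in for ManagerBasedRLEnv with only the fields the sketch needs."""

    def __init__(self):
        self.common_step_counter = 0
        self._task_stage = 0
        self._stage_update_step = -1  # step at which the stage was last updated


def _compute_stage(env):
    # Placeholder for the real geometric checks in update_task_stage().
    return env._task_stage + 1


def get_task_stage(env):
    # Run the update at most once per step, regardless of which reward term
    # calls this first; later callers in the same step reuse the result.
    if env._stage_update_step != env.common_step_counter:
        env._task_stage = _compute_stage(env)
        env._stage_update_step = env.common_step_counter
    return env._task_stage


env = FakeEnv()
# Two reward terms reading the stage in the same step see the same value,
# and the update itself runs only once.
a = get_task_stage(env)
b = get_task_stage(env)
print(a, b)  # 1 1
env.common_step_counter += 1
print(get_task_stage(env))  # 2
```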

Comment on lines +40 to +49
_body_obs_cache = {
    "device": None,
    "batch": None,
    "idx_t": None,
    "idx_batch": None,
    "pos_buf": None,
    "vel_buf": None,
    "torque_buf": None,
    "combined_buf": None,
}

P2 Module-level observation caches will interfere across multiple environment instances

_body_obs_cache and _dex3_obs_cache are module-level globals keyed only on (device, batch_size). If more than one ManagerBasedRLEnv instance is created in the same process (e.g. training env + eval env, or two parallel environments with different batch sizes), the second instantiation will either evict the first env's cached index tensors and buffers (causing silent wrong results) or — worse — two environments sharing the same batch size will share a single pre-allocated output buffer, causing them to overwrite each other's results.

Consider scoping these caches to the environment instance (e.g. as attributes on env) rather than as module-level globals.

Comment on lines +135 to +151
trocar_2 = RigidObjectCfg(
    prim_path="/World/envs/env_.*/trocar_2",
    spawn=UsdFileCfg(
        usd_path=(
            f"{USD_ROOT}/LightWheel/Assets/"
            "DisposableLaparoscopicPunctureDevice001/"
            "DisposableLaparoscopicPunctureDevice005-xform.usd"
        ),
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            rigid_body_enabled=True,
            disable_gravity=False,
        ),
    ),
    init_state=RigidObjectCfg.InitialStateCfg(
        rot=[-0.71475, -0.000243, 0.05853, 0.69692], pos=[-1.50635, 1.90997, 0.8631]
    ),
)

P2 trocar_2 is missing explicit collision_props

trocar_1 has collision_props explicitly set (collision_enabled=True, contact_offset=0.001, rest_offset=-0.001) but trocar_2 does not. Since both trocars need to interact physically during the insertion stage, inconsistent collision settings could produce unreliable contact detection. Consider adding equivalent collision_props to trocar_2's UsdFileCfg:

spawn=UsdFileCfg(
    usd_path=(...),
    rigid_props=sim_utils.RigidBodyPropertiesCfg(...),
    collision_props=sim_utils.CollisionPropertiesCfg(
        collision_enabled=True,
        contact_offset=0.001,
        rest_offset=-0.001,
    ),
),

@mingxueg-nv mingxueg-nv changed the title Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup… Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension Mar 23, 2026

Nic-Ma commented Mar 25, 2026

Hi @kellyguo11 , could you please help review this PR?

Thanks in advance.


🤖 Isaac Lab Review Bot

Summary: The trocar assembly task design is solid (4-stage sparse reward with clean stage progression), but there are real bugs and architectural issues that need addressing before merge. Request Changes — 10 findings, 1 runtime crash bug.

Findings (ranked by severity)

  1. 🔴 Bug (runtime crash) — rewards.py:620: is_parallel.item() will crash with RuntimeError when num_envs > 1 because it's called on a (num_envs,) tensor. Should be is_parallel[0].item().

  2. 🔴 Deprecated API — events.py:139-141,275: data.default_root_state is deprecated since IsaacLab 3.0 and will be removed. Should use default_root_pose / default_root_vel.

  3. 🟠 Fragile ordering dependency — rewards.py:239: update_task_stage() is only called from lift_trocars_reward(). The other 3 reward functions silently depend on it running first. If anyone reorders RewardsCfg fields, stage values will be stale.

  4. 🟠 Global mutable cache not multi-env safe — observations.py:43: Module-level _body_obs_cache / _dex3_obs_cache dicts will thrash if two env instances exist simultaneously (train + eval).

  5. 🟠 Monkey-patching private renderer state — extension.py:484: Directly mutates _rtx_utils._last_render_update_key private module state. Needs at minimum a comment explaining the workaround and ideally should be a supported API.

  6. 🟡 Task description mismatch — YAML:196: Eval init_params.task_description says "install trocar from box" but the reused *isaaclab_config anchor says "assemble trocar from tray".

  7. 🟡 License header inconsistency — All 10 new files have dual license headers (BSD-3-Clause + Apache-2.0), violating the project's single BSD-3-Clause standard.

  8. 🟡 Dead code — reset_robot_to_default_joint_positions is exported but never used in any event config.

  9. 🟡 Hard gr00t dependency — gr00t_config.py has top-level imports from the gr00t package; needs a guard/docs since it's in isaaclab_tasks.

  10. 🔵 Per-step tensor allocation — rewards.py:404: tip_offset_local.unsqueeze(0).repeat(num_envs, 1) allocates a new GPU tensor every step unnecessarily. Cache it or use expand().
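The view-vs-copy distinction behind finding 10 can be shown with NumPy's equivalent (used here so the snippet runs without torch); the torch counterpart would be tensor.unsqueeze(0).expand(num_envs, -1), which returns a view with no new allocation.

```python
import numpy as np

# repeat()/tile() copies the (3,) offset into a fresh (num_envs, 3) array on
# every call; broadcasting (torch expand() / numpy broadcast_to) yields a
# read-only view over the original buffer with no allocation.
num_envs = 4
tip_offset_local = np.array([0.0, 0.0, 0.12])  # illustrative offset value

copied = np.tile(tip_offset_local, (num_envs, 1))        # allocates every call
view = np.broadcast_to(tip_offset_local, (num_envs, 3))  # zero-copy view

print(copied.shape, view.shape)                    # (4, 3) (4, 3)
print(np.shares_memory(view, tip_offset_local))    # True: shares the buffer
print(np.shares_memory(copied, tip_offset_local))  # False: fresh allocation
```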

CI Status: No CI checks reported for this PR.

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    try:
        import isaaclab_physx.renderers.isaac_rtx_renderer_utils as _rtx_utils

why is this patch necessary? this is also restricting the renderer to RTX, is it not possible to use newton renderer as well?


Good question — I agree this is worth scrutinizing.

Looking at the core ManagerBasedEnv.reset() (manager_based_env.py L378-380), it already handles the rerender warmup natively:

if self.sim.has_rtx_sensors() and self.cfg.num_rerenders_on_reset > 0:
    for _ in range(self.cfg.num_rerenders_on_reset):
        self.sim.render()

So the framework already re-renders on reset when RTX sensors are present and num_rerenders_on_reset > 0 — and it does so through self.sim.render(), which is renderer-agnostic.

This monkey-patch duplicates that logic but goes directly through RTX-specific internals (_rtx_utils._last_render_update_key), which:

  1. Restricts to RTX — as you noted, Newton renderer users would get no warmup (only the except ImportError: pass path)
  2. Accesses private internals — _last_render_update_key is an implementation detail that could change
  3. Bypasses the framework's own render path — self.sim.render() already handles the render-update-key reset internally

If the built-in rerender-on-reset isn't sufficient for the RLinf use case (e.g., the env.reset() call from gym.make().unwrapped isn't triggering the warmup correctly), it would be better to investigate why and fix it at the framework level rather than patching around it here. @mingxueg-nv — is there a specific failure mode you're seeing that the existing num_rerenders_on_reset doesn't handle?


@mingxueg-nv mingxueg-nv Mar 30, 2026


Hi @kellyguo11, during training and evaluation the first frame of recorded videos shows the robot in its USD default pose instead of the environment's reset pose — the camera observation returned by env.reset() does not capture the actual scene state.
I have already tried setting num_rerenders_on_reset, but it doesn't help:

  • In IsaacLab 2.3, sim.render() called self.forward() + _app.update() to flush data and trigger the full render pipeline.

  • In IsaacLab 3.0, this was removed; sim.render() now only updates visualizers. But ManagerBasedEnv.reset() still relies on sim.render() for num_rerenders_on_reset, making it a no-op.

I have removed the RTX-only restriction from the patch, thanks.


Thanks for the detailed explanation and the code references, @mingxueg-nv — that makes sense now.

I traced through the code and can confirm the issue:

  1. ManagerBasedEnv.reset() (L378-379) calls sim.render() inside the num_rerenders_on_reset loop.
  2. In headless mode, sim.render() hits the NO_RENDERING branch (L613-619) which throttles updates and skips forward() — so the scene data never actually gets flushed to the renderer after reset.
  3. This means the first camera observation still captures the USD default pose rather than the post-reset state.

Your updated patch looks good — using set_setting("/app/player/playSimulations", False) + _app.update() is the same pattern that sim.render() uses internally for full render mode, and the explicit sensor.update(dt=0.0, force_recompute=True) loop ensures cameras pick up the new state. This is now renderer-agnostic too, which addresses my earlier concern. 👍

One minor thought: this feels like it could eventually be a fix upstream in ManagerBasedEnv.reset() itself (e.g., using forward() + _app.update() instead of bare sim.render() when render_mode="rgb_array"), so that all camera-based envs benefit. But as a workaround in the RLinf extension, this is clean and well-scoped.

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Licensed under the Apache License, Version 2.0 (the "License");

any reason why we also need the apache license?


Good catch. The rest of the Isaac Lab codebase uses only the BSD-3-Clause header (e.g., lift/config/__init__.py, stack/config/__init__.py, etc.). This file — and several other new files in this PR (rewards.py, observations.py, etc.) — have a dual header with both BSD-3-Clause and Apache-2.0, plus the full Apache license text block.

This looks like it may have been carried over from an internal NVIDIA template. Since this code is being contributed into the Isaac Lab repo, the standard BSD-3-Clause header should be sufficient. The Apache-2.0 block (lines 6-19) can likely be removed from all the new files to stay consistent with the rest of the project.


@kellyguo11, thanks for the reminder — I have removed all duplicated and Apache headers from all files.


Confirmed — checked the latest commit and the headers look clean now. __init__.py, camera_config.py, etc. all have just the standard BSD-3-Clause block. Thanks for the quick fix! ✅

#
# SPDX-License-Identifier: BSD-3-Clause

# Copyright (c) 2022-2026, The Isaac Lab Project Developers.

duplicated license headers


Have removed, thanks


Looks good, confirmed the duplicate is gone. Thanks! 👍


@isaaclab-review-bot isaaclab-review-bot Bot left a comment


🤖 Isaac Lab Review Bot — Environment Design Consistency Review

Summary

This PR adds an assemble_trocar manipulation task for the Unitree G129 + Dex3 robot along with an RLinf rendering warmup fix. The task implements a multi-stage surgical trocar assembly: lift → tip-align → insert → place.

Overall assessment: The environment is functional and well-implemented for the task logic, but has significant deviations from established Isaac Lab environment patterns that need to be addressed before merging into isaaclab_tasks. Below is a comprehensive comparison against reference environments (lift, stack, reach, pick_place).


1. Directory Structure

Reference pattern (lift/stack/reach):

task_name/
├── __init__.py           ← empty or minimal, no registrations
├── task_env_cfg.py       ← base/abstract env cfg with MISSING fields
├── mdp/
│   ├── __init__.py       ← uses lazy_export()
│   ├── observations.py
│   ├── rewards.py
│   └── terminations.py
└── config/
    ├── __init__.py       ← empty
    └── robot_name/
        ├── __init__.py   ← gym.register() calls here
        ├── agents/       ← RL agent configs (rsl_rl, skrl, sb3, etc.)
        └── joint_pos_env_cfg.py  ← concrete env cfg

New env structure:

assemble_trocar/
├── __init__.py           ← ❌ gym.register() here (should be in config/g129_dex3/)
├── g129_dex3_env_cfg.py  ← ❌ concrete cfg in task root (should be under config/g129_dex3/)
├── mdp/
│   ├── __init__.py       ← ❌ manual imports (should use lazy_export())
│   ├── events.py         ← ✅ (lift doesn't have this, but it's fine)
│   ├── observations.py   ← ✅
│   ├── rewards.py        ← ✅
│   └── terminations.py   ← ✅
└── config/
    ├── __init__.py       ← ❌ exports robot/camera presets (not registrations)
    ├── camera_config.py  ← ❌ should be in task config, not generic config
    ├── gr00t_config.py   ← ❌ GR00T-specific, has hard external dependency
    ├── isaaclab_ppo_gr00t_assemble_trocar.yaml ← ❌ RLinf training config doesn't belong here
    └── robot_config.py   ← ❌ should use isaaclab_assets or be task-level config

Key deviations:

  1. gym.register() should be in config/g129_dex3/__init__.py, not the task root __init__.py
  2. The concrete env cfg should be under config/g129_dex3/, not the task root
  3. The task root __init__.py should be empty/minimal (like lift/stack)
  4. Missing the base/abstract env cfg pattern (lift has LiftEnvCfg with MISSING fields)
  5. No agents/ directory with standard RL training configs

2. Environment Registration

Reference pattern (Isaac-Lift-Cube-Franka-v0):

# In config/franka/__init__.py
gym.register(
    id="Isaac-Lift-Cube-Franka-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:FrankaCubeLiftEnvCfg",  # STRING reference
        "rsl_rl_cfg_entry_point": ...,  # agent configs
        "skrl_cfg_entry_point": ...,
    },
    disable_env_checker=True,
)

New env:

# In __init__.py (wrong location!)
gym.register(
    id="Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0",
    kwargs={
        "env_cfg_entry_point": g129_dex3_env_cfg.G1AssembleTrocarEnvCfg,  # DIRECT CLASS reference
    },
)

Issues:

  1. Registration location: Should be in config/g129_dex3/__init__.py, not the task-level __init__.py
  2. Entry point format: Uses direct class reference instead of string f"{__name__}.module:ClassName". While this works, it's inconsistent with every other env in the repo
  3. Naming convention: Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 includes "RLinf" — task names should be framework-agnostic (e.g., Isaac-Assemble-Trocar-G129-Dex3-v0)
  4. Missing agent configs: No rsl_rl_cfg_entry_point, skrl_cfg_entry_point, etc. — even if only RLinf is used initially, the structure should support standard Isaac Lab training workflows

3. License Headers

Reference pattern (all Isaac Lab files):

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

New env files have DUAL license headers:

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)  # ← Isaac Lab header
# SPDX-License-Identifier: BSD-3-Clause

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION      # ← Apache-2.0 header  
# SPDX-License-Identifier: Apache-2.0
# Licensed under the Apache License, Version 2.0 (the "License")...

This creates a license conflict. Files in isaaclab_tasks should use only the BSD-3-Clause header. The Apache-2.0 header should be removed from all files.


4. Scene Configuration

Reference pattern (lift ObjectTableSceneCfg):

@configclass
class ObjectTableSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = MISSING          # ← abstract, set by concrete cfg
    ee_frame: FrameTransformerCfg = MISSING   # ← abstract
    object: RigidObjectCfg = MISSING          # ← abstract
    table = AssetBaseCfg(...)                 # ← concrete
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", ...)  # ← ground plane
    light = AssetBaseCfg(prim_path="/World/light", ...)

New env AssembleTrocarSceneCfg:

@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = G1RobotPresets.g1_29dof_dex3_base_fix(...)  # ← concrete, not abstract
    scene = AssetBaseCfg(...)   # ← "scene" is a very generic name
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    tray = ArticulationCfg(...)
    light = AssetBaseCfg(...)   # ← no ground plane!

Issues:

  1. No ground plane — All reference envs define a ground plane (/World/GroundPlane). The new env relies on the scene USD having one, but this should be explicit
  2. No abstract base — Robot is hardcoded instead of using MISSING pattern for robot-agnostic task definition
  3. Field naming — scene is a poor name for a scene entity (shadowing the InteractiveSceneCfg concept); should be something descriptive like operating_table or workspace
  4. Camera configs as class attributes — front_camera, left_wrist_camera, right_wrist_camera are set via classmethod calls rather than declarative config instances
  5. tray uses ArticulationCfg with empty actuators — This is a passive prop; RigidObjectCfg would be more semantically correct (or AssetBaseCfg if truly passive)
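Issues 1, 2, and 5 could be addressed together. A minimal sketch of a restructured scene config, under the assumption that the existing asset cfgs are reused (the field name operating_table is an illustrative rename, not from the PR):

```python
@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    # abstract: the concrete robot is set by a derived, robot-specific cfg
    robot: ArticulationCfg = MISSING
    # descriptive name instead of the generic "scene"
    operating_table = AssetBaseCfg(...)
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    # passive prop: RigidObjectCfg instead of ArticulationCfg with empty actuators
    tray = RigidObjectCfg(...)
    # explicit ground plane, as in the reference envs
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", spawn=sim_utils.GroundPlaneCfg())
    light = AssetBaseCfg(...)
```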

5. Environment Configuration

Reference pattern (lift LiftEnvCfg):

@configclass
class LiftEnvCfg(ManagerBasedRLEnvCfg):
    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()        # ← type-annotated
    curriculum: CurriculumCfg = CurriculumCfg()

New env:

@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=1, ...)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events = EventCfg()     # ← missing type annotation!
    commands = None          # ← None instead of omitting
    rewards: RewardsCfg = RewardsCfg()
    curriculum = None        # ← None instead of omitting

Issues:

  1. events = EventCfg() missing type annotation — Should be events: EventCfg = EventCfg()
  2. num_envs=1 — Reference envs default to 4096. Setting 1 as default seems wrong for batch training
  3. commands = None and curriculum = None — These are already None by default in ManagerBasedRLEnvCfg, so explicitly setting them is unnecessary and clutters the config
  4. episode_length_s = 20.0 — This is set in __post_init__ but MISSING in the base class, so it must be set. This is fine, but notably longer than lift (5.0s)
  5. Render settings in __post_init__self.sim.render.enable_translucency, self.sim.render.carb_settings, rendering_mode = "quality", antialiasing_mode = "DLAA" — these are unusual and task-specific. No other reference env sets these. Consider whether these belong in the env config or should be runtime overrides
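Issues 1 through 3 amount to a small diff. A sketch of the trimmed config, assuming the lift defaults (num_envs=4096, env_spacing=2.5) are reasonable here too:

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()  # type-annotated
    # commands/curriculum omitted: they already default to None in the base class
```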

6. MDP Module (mdp/)

Reference pattern (lazy_export()):

# mdp/__init__.py
from isaaclab.utils.module import lazy_export
lazy_export()

New env:

# mdp/__init__.py
from isaaclab.envs.mdp import JointPositionActionCfg, time_out
from .events import ...
from .observations import ...
from .rewards import ...
from .terminations import ...
__all__ = [...]

Issues:

  1. Should use lazy_export() — This is the standard pattern in Isaac Lab mdp modules. It handles star imports and avoids manual __all__ maintenance
  2. Re-exports from isaaclab.envs.mdpJointPositionActionCfg and time_out should be imported directly where used, not re-exported through the task's mdp package

7. Observations

Reference pattern (lift):

joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
object_position = ObsTerm(func=mdp.object_position_in_robot_root_frame)

New env:

robot_joint_state = ObsTerm(func=mdp.get_robot_body_joint_states)      # 87 dims (29*3)
robot_dex3_joint_state = ObsTerm(func=mdp.get_robot_dex3_joint_states) # 14 dims

Issues:

  1. Hardcoded joint indices — The observation functions use magic numbers for joint reordering (body_joint_indices = [0, 3, 6, 9, ...]). This is fragile and couples the observation to a specific USD joint order. Consider using joint_names_expr patterns instead
  2. Global mutable state_body_obs_cache and _dex3_obs_cache are module-level mutable dicts. This breaks if multiple envs with different configurations run in the same process. The cache should be per-environment instance
  3. No object observations — The env doesn't observe trocar positions/orientations in the policy group, only joint states. This means the policy can only use camera images for object state — valid for visuomotor policies, but unusual for the policy observation group
  4. concatenate_terms = False — This is intentional for visuomotor policies but worth documenting explicitly
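For issue 1, the magic-number index lists could be replaced by resolving indices from joint-name patterns once at startup — the same idea SceneEntityCfg's joint_names resolution uses. A framework-free sketch (the function name and the joint names are illustrative, not from the PR):

```python
import re


def resolve_joint_indices(joint_names: list[str], patterns: list[str]) -> list[int]:
    """Map name patterns to column indices, decoupling observations from USD joint order."""
    regexes = [re.compile(p) for p in patterns]
    return [i for i, name in enumerate(joint_names) if any(r.fullmatch(name) for r in regexes)]


# select body joints by name instead of hardcoding [0, 3, 6, 9, ...]
names = ["left_hip_pitch_joint", "left_hand_index_0_joint", "right_hip_pitch_joint"]
body_idx = resolve_joint_indices(names, [".*_hip_.*"])  # -> [0, 2]
```

Resolving once and caching the result on the env instance (or in the cfg) would also eliminate the module-level cache from issue 2.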

8. Reward Functions

Reference pattern (lift):

reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.04}, weight=15.0)
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)

New env (rewards.py — 736 lines!):

  • Complex multi-stage state machine (_task_stage) stored on env instance
  • Reward locking mechanism with _lift_reward_locked, _tip_reward_locked, etc.
  • Debug printing with should_print_debug()
  • update_task_stage() called inside first reward function (lift_trocars_reward)

Issues:

  1. Reward functions have side effectslift_trocars_reward() calls update_task_stage() which modifies env._task_stage. The reward manager doesn't guarantee call order, so this is fragile. Stage updates should be in events or a dedicated pre-step hook, not buried in the first reward function
  2. Storing state on env via setattrenv._task_stage, env._prev_stage_*, env._lift_reward_locked, etc. are monkey-patched onto the env instance. This is the wrong pattern — Isaac Lab provides the observation/state manager for this, or custom attributes should be in the env cfg/env class
  3. reward / env.step_dt scaling — In sparse reward mode, rewards are divided by env.step_dt. This is unusual and couples the reward magnitude to the physics timestep
  4. No regularization rewards — Reference envs include action_rate_l2 and joint_vel_l2 penalties. This env has none, which may cause jerky motions
  5. USD stage access in reward functionsget_trocar_tip_position() accesses the USD stage via pxr API to compute tip offsets. This is extremely unusual for reward functions and may have performance implications. The offset should be precomputed in scene setup or the env __init__
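For issue 4, the standard penalties used by the reference envs could be added directly — both functions exist in isaaclab.envs.mdp. A sketch with illustrative starting weights (not tuned values):

```python
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
joint_vel = RewTerm(
    func=mdp.joint_vel_l2,
    weight=-1e-4,
    params={"asset_cfg": SceneEntityCfg("robot")},
)
```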

9. External Dependencies

config/gr00t_config.py imports:

from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
...
from gr00t.experiment.data_config import DATA_CONFIG_MAP, BaseDataConfig
from gr00t.model.transforms import GR00TTransform

Issues:

  1. gr00t is NOT a dependency of isaaclab_tasks — This file will cause import errors for anyone installing Isaac Lab normally. It should either:
    • Be moved to isaaclab_contrib (where RLinf lives)
    • Be guarded with try/except ImportError
    • Not be in the task directory at all
  2. The YAML config file (isaaclab_ppo_gr00t_assemble_trocar.yaml) is RLinf-specific training configuration with hardcoded checkpoint paths (/mnt/ckpt/...). This doesn't belong in isaaclab_tasks
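If the file must stay importable without gr00t installed, a soft-dependency guard is one low-effort option (the helper name optional_import is illustrative; moving the file to isaaclab_contrib remains the cleaner fix):

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if the soft dependency is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


gr00t = optional_import("gr00t")
GROOT_AVAILABLE = gr00t is not None
# GR00T-specific data configs would only be defined/registered when GROOT_AVAILABLE is True
```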

10. RLinf Extension Changes (rendering warmup)

The monkey-patching approach:

_original_reset = env.reset
def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    # force render warmup
    for _ in range(_n_warmup):
        _rtx_utils._last_render_update_key = (0, -1)  # ← private API access
        env.sim.set_setting("/app/player/playSimulations", False)
        _app.update()
        env.sim.set_setting("/app/player/playSimulations", True)
        env.scene.update(dt=0)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras
env.reset = _patched_reset

Issues:

  1. Accessing private API_rtx_utils._last_render_update_key = (0, -1) reaches into a private attribute of isaac_rtx_renderer_utils. This will break when the renderer internals change
  2. env.obs_buf = obs override — Directly setting the observation buffer bypasses any safety checks or hooks
  3. The right approach — Isaac Lab already has num_rerenders_on_reset in ManagerBasedEnvCfg. If this doesn't work correctly for the RLinf use case, the fix should be in the core ManagerBasedEnv.reset() method, not monkey-patched from the extension. File an issue or fix it upstream
  4. Why not just set env.cfg.num_rerenders_on_reset? — The env config already supports this. If it's not working correctly, that's a bug to fix in the env, not to patch around
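Under that approach the extension needs no patch at all; the env config carries the fix. A sketch assuming two warmup re-renders suffice for the cameras (the count is an assumption, not a measured value):

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    def __post_init__(self):
        # re-render after reset so the first camera observation is valid,
        # instead of monkey-patching env.reset in the RLinf extension
        self.rerender_on_reset = True
        self.num_rerenders_on_reset = 2
```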

Specific Issues (Numbered)

  1. Dual license headers — All new files have both BSD-3-Clause and Apache-2.0 headers. Remove Apache-2.0 headers
  2. __init__.py has duplicate copyright block — Lines 1-4 and 6-8 are both copyright notices
  3. gym.register() in wrong location — Move to config/g129_dex3/__init__.py
  4. Direct class reference for env_cfg_entry_point — Use string format f"{__name__}.module:Class"
  5. "RLinf" in env name — Remove framework-specific branding from gym ID
  6. Missing agents/ directory — Add standard RL agent configs
  7. events missing type annotation — Add events: EventCfg = EventCfg()
  8. num_envs=1 default — Should be 4096 or at least a reasonable batch size
  9. scene field name in scene cfg — Rename to something descriptive
  10. No ground plane — Add explicit ground plane config
  11. mdp/__init__.py should use lazy_export() — Follow standard pattern
  12. Module-level mutable observation cache — Move to per-env instance
  13. Reward side effects (stage machine) — Move update_task_stage to event system
  14. setattr monkey-patching on env instance — Use proper state management
  15. gr00t_config.py imports unavailable package — Move to isaaclab_contrib or guard imports
  16. RLinf YAML with hardcoded paths — Remove from isaaclab_tasks
  17. USD stage access in reward function — Precompute tip offsets during init
  18. No regularization rewards — Add action_rate and joint_vel penalties
  19. Missing num_rerenders_on_reset in env cfg — Set this instead of monkey-patching in RLinf
  20. tray uses ArticulationCfg with empty actuators — Use RigidObjectCfg or AssetBaseCfg

Recommendations (Priority Order)

P0 — Must fix before merge:

  1. Remove dual license headers; use only BSD-3-Clause
  2. Move gr00t_config.py and isaaclab_ppo_gr00t_assemble_trocar.yaml out of isaaclab_tasks (to isaaclab_contrib or a separate package)
  3. Restructure directory to match standard layout: registrations in config/g129_dex3/__init__.py
  4. Fix env_cfg_entry_point to use string format
  5. Remove "RLinf" from gym env IDs
  6. Set num_rerenders_on_reset in the env cfg instead of monkey-patching reset

P1 — Should fix:

  7. Add type annotation for events
  8. Use lazy_export() in mdp/__init__.py
  9. Move stage state machine out of reward functions into events
  10. Add ground plane to scene config
  11. Fix default num_envs to a reasonable batch size
  12. Add action rate / joint velocity regularization rewards

P2 — Nice to have:

  13. Create abstract base env cfg with MISSING robot field
  14. Move observation cache from module-level to env instance
  15. Precompute trocar tip offsets during env initialization
  16. Add agents/ directory with at least one standard training config
  17. Rename scene field in AssembleTrocarSceneCfg to something descriptive

@kellyguo11 kellyguo11 moved this to In review in Isaac Lab Apr 17, 2026
Contributor

@kellyguo11 kellyguo11 left a comment

can you also add some documentation for both RLinf and the environment? the environment can be added to https://isaac-sim.github.io/IsaacLab/develop/source/overview/environments.html and maybe RLinf can be added as a new section under https://isaac-sim.github.io/IsaacLab/develop/source/experimental-features/bleeding-edge.html


import omni.kit.app

_app = omni.kit.app.get_app()
Contributor

why is omni.kit.app needed to be explicitly defined here? could we call env.render() or access other APIs from the env?

Contributor Author

Hi @kellyguo11, I have removed the _patched_reset patch and set num_rerenders_on_reset; it works well now, thanks.

from collections.abc import Sequence

import isaaclab.sim as sim_utils
from isaaclab.sensors import CameraCfg, TiledCameraCfg
Contributor

we've deprecated TiledCamera recently, both can just use Camera directly now

Contributor Author

Have updated to CameraCfg directly, 🙏

}


G129_CFG_WITH_DEX3_BASE_FIX = ArticulationCfg(
Contributor

I think this could probably be moved to under isaaclab_assets so that other tasks could also use this robot if they'd like?

Contributor Author

I have moved it to isaaclab_assets, thanks.


# Get default states (initial positions from config)
# note: default_root_state is the local coordinate relative to the environment origin
tray_default_state = wp.to_torch(tray.data.default_root_state)[env_ids].clone()
Contributor

wp.to_torch is no longer necessary. we've added a new .torch mechanism to improve performance. the wp.to_torch wrapper will still be supported but is now deprecated

Contributor Author

Thank you for the reminder, I have changed wp.to_torch to .torch.

# Precompute and cache column indices
global _body_obs_cache
if _body_obs_cache["device"] != device or _body_obs_cache["idx_t"] is None:
body_joint_indices = [
Contributor

could this be defined together with the asset instead so that we can keep the mdp implementation clean?

Contributor Author

Have cleaned it, thank you.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR adds a 4-stage trocar assembly task with RLinf integration. The new commits add a _patched_reset function to extension.py that forces a render warmup after reset to fix camera observations. The task implementation is comprehensive but has some structural deviations from Isaac Lab patterns that should be addressed.

Previous Concerns Status

Most structural concerns from the previous review remain unaddressed:

  • ❌ Dual license headers still present in new files
  • ❌ gr00t_config.py with external gr00t dependency still in isaaclab_tasks
  • ❌ gym.register() still in task root __init__.py instead of config/g129_dex3/
  • ❌ Module-level mutable observation caches still present
  • ❌ Stage machine side effects still in reward functions
  • ✅ "RLinf" has been removed from gym env IDs (now Isaac-Assemble-Trocar-G129-Dex3-v0)

New Findings

🔴 Critical: extension.py:475-488 — Render warmup patch is fragile and has a subtle bug
The _patched_reset implementation has been simplified but still has issues:

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    env.sim.set_setting("/app/player/playSimulations", False)
    _app.update()
    env.sim.set_setting("/app/player/playSimulations", True)
    for sensor in env.scene.sensors.values():
        sensor.update(dt=0.0, force_recompute=True)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras

The env.obs_buf = obs line directly mutates internal state. This should use the env's proper API. Also, ManagerBasedRLEnvCfg has rerender_on_reset: bool and num_rerenders_on_reset: int — consider using those instead of monkey-patching.

🟡 Warning: gr00t_config.py:17-23 — Import will fail for standard Isaac Lab installations

from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
# ...

This file is in isaaclab_tasks but imports gr00t which is not a dependency. Anyone importing the task package will get ModuleNotFoundError. Either:

  1. Move to isaaclab_contrib/rl/rlinf/
  2. Guard with try/except ImportError
  3. Make it a lazy import

🔵 Improvement: observations.py:28-37 — Module-level cache causes cross-environment contamination

_body_obs_cache = {
    "device": None,
    "batch": None,
    # ...
}

If two envs with different batch sizes run in the same process, the cache will be corrupted. Store the cache on the env instance instead.

Implementation Verdict

Significant concerns — The task logic is solid, but the structural issues (gr00t dependency in isaaclab_tasks, monkey-patching instead of using env config, module-level caches) need to be resolved before merging.

@kellyguo11 kellyguo11 moved this from In review to In progress in Isaac Lab May 1, 2026
@mingxueg-nv mingxueg-nv force-pushed the Adds_Assemble_Trocar_task_Based_RLinf branch from 3117b97 to b03d761 Compare May 5, 2026 07:12
@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits (b03d761) appear to be identical to the previously reviewed state — no new changes were made to address the previous concerns.

Previous Concerns Status

All previous findings remain unaddressed:

  • ❌ gr00t_config.py with external gr00t dependency still in isaaclab_tasks (will cause ModuleNotFoundError)
  • ❌ Module-level mutable observation caches in observations.py (cross-env contamination risk)
  • ❌ _patched_reset monkey-patch in extension.py instead of using rerender_on_reset config
  • ❌ env.obs_buf = obs direct state mutation

Implementation Verdict

Significant concerns remain — No changes since last review. The critical gr00t import issue in isaaclab_tasks will break standard installations that don't have the gr00t package.

@mingxueg-nv mingxueg-nv requested a review from pascal-roth as a code owner May 6, 2026 10:41
@github-actions github-actions Bot added the asset New asset feature or request label May 6, 2026
@mingxueg-nv mingxueg-nv changed the title Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension Add assemble_trocar task for G129-Dex3 May 6, 2026
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 7, 2026
@mingxueg-nv
Contributor Author

Hi @kellyguo11, I have updated this PR and resolved the comments. Also, we added environments.rst and bleeding-edge.rst.

Could you please help to review again? Thank you!

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Review Summary

This PR adds a well-structured multi-stage surgical manipulation task. The 4-stage state machine (lift → align → insert → place) is a sensible decomposition for this assembly problem. However, there are several architectural concerns that should be addressed before merge.

Strengths

  • Clear stage-based reward decomposition with both sparse and dense modes
  • Good use of SceneEntityCfg for parameterizing which assets to track
  • Caching of tip offsets (computed from USD once, then reused)
  • Deterministic rotation variant for reproducible evaluation
  • Camera presets well-organized for GR00T visual input

Concerns

Architecture:

  1. Extensive use of setattr(env, ...) for state — Task stage, reward locks, and debug trackers are all monkey-patched onto the env object. This is fragile, hard to discover, and breaks if two tasks share an env instance or if the env is serialized.
  2. update_task_stage called from every reward function — Each reward term calls update_task_stage with the full parameter set, but only the first call per step actually transitions stages (due to monotonic stage progression). This means the stage thresholds in lift_trocars_reward params silently control ALL stage transitions.
  3. Global mutable caches_body_obs_cache and _dex3_obs_cache are module-level mutable dicts, which will break with multiple env instances or multiprocessing.

Code quality:
4. print() statements in production code — Debug prints should use the logging module, not print().
5. Hardcoded USD paths with env_0get_trocar_tip_position hardcodes /World/envs/env_0/trocar_1/... for USD stage lookups.
6. No tests — Checklist shows tests unchecked. At minimum, the state machine transitions and reward shaping should have unit tests.

See inline comments for specifics.

torch.Tensor: Current stage for each environment (num_envs,)
"""
if not hasattr(env, "_task_stage"):
env._task_stage = torch.zeros(env.num_envs, dtype=torch.long, device=env.device)

🔴 State monkey-patched onto env: env._task_stage is set via setattr without any formal registration. This pattern is used throughout this file (_prev_stage_lift, _lift_reward_locked, _tip_reward_locked, etc.).

This has several problems:

  1. No type safety or discoverability — other code can't know these attributes exist
  2. If reset_task_stage in events.py misses one of these attrs, state leaks across episodes
  3. If multiple tasks are loaded, attribute names collide

Recommendation: Use a proper state container registered on the env:

@dataclass
class AssembleTrocarState:
    task_stage: torch.Tensor
    prev_stage_lift: torch.Tensor
    lift_reward_locked: torch.Tensor
    # ... etc

# In env setup or first call:
if not hasattr(env, 'assemble_trocar_state'):
    env.assemble_trocar_state = AssembleTrocarState(...)

Or better yet, use Isaac Lab's built-in env.extras dict for task-specific state.

pos2 = obj2.data.root_pos_w.torch
quat1 = obj1.data.root_quat_w.torch
quat2 = obj2.data.root_quat_w.torch
# Store old stage to detect changes (BEFORE any stage transitions)

🔴 update_task_stage called redundantly from every reward function: lift_trocars_reward calls update_task_stage with ALL stage transition thresholds, and it's called again (implicitly via get_task_stage) from the other reward functions. Since stages are monotonically increasing, only the first call per step does real work — but the thresholds passed to lift_trocars_reward silently control all stage transitions.

This coupling is confusing and error-prone. If someone changes insertion_dist_threshold in insert_trocars reward params, it won't actually affect the stage transition because lift_trocars_reward already transitioned the stage with its own (potentially different) threshold.

Recommendation: Call update_task_stage once per step as a separate observation/event term, not from within reward functions. The reward functions should only read get_task_stage(env) and compute their reward.

Returns:
torch.Tensor: Shape (num_envs, 3) - Position in world coordinates
"""
from pxr import Gf, Usd, UsdGeom

🟡 Hardcoded USD paths with env_0: tip_path = "/World/envs/env_0/trocar_1/Trocar002/White_pos" assumes a specific asset hierarchy. If the USD structure changes (e.g., different trocar model), this silently returns torch.zeros(3) with only a print warning.

Also, the from pxr import Gf, Usd, UsdGeom import inside the function body is unusual — it means this function only works when the full Omniverse stack is loaded. Consider:

  1. Moving the import to module level with a TYPE_CHECKING guard
  2. Making the tip prim path configurable (e.g., a parameter in SceneEntityCfg)
  3. Raising an error instead of returning zero offset on missing prim

"device": None,
"batch": None,
"idx_t": None,
"idx_batch": None,

🔴 Module-level mutable global state: _body_obs_cache and _dex3_obs_cache are module-level mutable dicts that cache device, batch size, and pre-allocated tensors. This breaks in several scenarios:

  1. Multiple env instances with different batch sizes
  2. Device changes (e.g., moving from CPU to GPU)
  3. Multiprocessing (shared state between workers)

The current code checks device and batch on each call, but if two envs with different batch sizes alternate calls, the cache thrashes.

Recommendation: Move the cache into the env instance (using a registered state object), or use functools.lru_cache keyed on (device, batch). Alternatively, for the simple gather pattern here, just use joint_pos[:, G1_29DOF_BODY_JOINT_INDICES] — PyTorch's fancy indexing is already fast and doesn't need manual caching.

pos_buf = _body_obs_cache["pos_buf"]
vel_buf = _body_obs_cache["vel_buf"]
torque_buf = _body_obs_cache["torque_buf"]
combined_buf = _body_obs_cache["combined_buf"]

💡 Over-engineered observation extraction: The torch.gather + pre-allocated buffer pattern adds complexity without clear benefit for this use case. PyTorch's native indexing is highly optimized:

def get_robot_body_joint_states(env: ManagerBasedRLEnv) -> torch.Tensor:
    robot_data = env.scene["robot"].data
    idx = G1_29DOF_BODY_JOINT_INDICES
    return torch.cat([
        robot_data.joint_pos.torch[:, idx],
        robot_data.joint_vel.torch[:, idx],
        robot_data.applied_torque.torch[:, idx],
    ], dim=-1)

This is clearer, less error-prone, and likely comparable in performance for 29 joints × 3 quantities.

env: ManagerBasedRLEnv,
env_ids: torch.Tensor,
print_log: bool = False,
) -> None:

🟡 Fragile state reset: reset_task_stage resets 8+ individual attributes by name. If a new _prev_stage_* or _*_reward_locked attribute is added in rewards.py, this function must be updated in lockstep — there's no enforcement.

This is the natural consequence of the monkey-patching pattern. With a proper state container, reset becomes:

def reset_task_stage(env, env_ids, ...):
    state = env.assemble_trocar_state
    for field in dataclasses.fields(state):
        getattr(state, field.name)[env_ids] = 0

USD_ROOT = f"{HEALTHCARE_S3}/Props/LightWheel"


@configclass

💡 S3 URL for assets: HEALTHCARE_S3 points to an S3 bucket. Is this publicly accessible? If it requires credentials or may change, consider:

  1. Documenting the access requirements
  2. Using Isaac Lab's asset management system (ISAAC_NUCLEUS_DIR or asset retrieval)
  3. Adding a comment about expected availability/versioning

CameraPresets,
G1RobotPresets,
)


🟡 Module-level joint_names and offset_dict: These 43-element lists pollute the module namespace and are only used by ActionsCfg. Consider moving them inside the class or into the robot config module where they logically belong.

@kellyguo11
Contributor

Thanks a lot for addressing the feedback! I do feel like the overall implementation still deviates a bit from how other environments are implemented in Isaac Lab. Combined with the bot review, I think perhaps there are a few places we can try to clean up a bit more:

  1. Task state monkey-patched onto the env objectenv._task_stage, env._prev_stage_lift, env._lift_reward_locked, etc. are created via setattr(env, ...) inside reward functions. The review bot's suggestion — a @dataclass AssembleTrocarState namespaced under env.assemble_trocar_state — could work.

  2. Stage machine runs inside a reward functionupdate_task_stage() is called from lift_trocars_reward() and silently controls all four stage transitions using the thresholds passed to that one reward term. The other three reward functions call get_task_stage() and depend on lift_trocars_reward having already run. Stage updates should be an event term run once per step before rewards, not side-effectful reward logic.

  3. Deprecated API usagedata.default_root_state in events.py is deprecated.

  4. Use logging instead of print().

  5. mdp/__init__.py still uses manual imports + __all__ rather than lazy_export(), and events: EventCfg = EventCfg() was missing a type annotation.

  6. get_trocar_tip_position() hardcodes /World/envs/env_0/... - does this environment work with num_envs > 1? does it always have to be hard-coded to env 0? we generally expect our environments to be parallelizable.

  7. tray is defined as an ArticulationCfg — can it actually be a RigidObject instead?

@kellyguo11
Contributor

@ooctipus what are your thoughts on this PR?


Labels

asset — New asset feature or request
documentation — Improvements or additions to documentation
isaac-lab — Related to Isaac Lab team

Projects

Status: In progress
