
Add assemble_trocar task for G129-Dex3 #5082

Open
mingxueg-nv wants to merge 10 commits into isaac-sim:develop from mingxueg-nv:Adds_Assemble_Trocar_task_Based_RLinf

Conversation

@mingxueg-nv (Contributor)

Description

Adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3), with RLinf support.

Key additions:

  • Task MDP: observations (body + Dex3 joint states), reward functions (4-stage sparse), termination conditions (timeout, success, object drop), and reset events (scene reset, task stage reset, random tray rotation).
  • Camera presets: front camera and left/right wrist cameras (TiledCamera, 224×224) configured for GR00T visual input.
  • Robot presets: G1 29-DoF + Dex3 articulation configuration.
  • GR00T data config: IsaacLabDataConfig defining video/state/action modality keys, transforms (SinCos state encoding, min-max action normalization, color jitter), and model-specific settings.
  • RLinf extension update: minor update to isaaclab_contrib/rl/rlinf/extension.py to support the new task registration.

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@mingxueg-nv mingxueg-nv requested a review from Mayankm96 as a code owner March 23, 2026 04:12
@github-actions github-actions Bot added the isaac-lab label (Related to Isaac Lab team) Mar 23, 2026

greptile-apps Bot commented Mar 23, 2026

Greptile Summary

This PR adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3) robot, including a 4-stage sparse MDP (lift → tip-align → insert → place), three TiledCamera presets for GR00T visual input, robot articulation config, a GR00T data config, and a minor extension patch that adds a post-reset render-warmup to fix stale camera frames under RTX rendering.

Key issues found:

  • Hardcoded local robot USD path (/mnt/Data/lw_v3/… in robot_config.py) — every other asset uses a URL-based USD_ROOT; this will fail on any machine other than the author's workstation.
  • Sparse reward is 50× the documented value — rewards are emitted as 1 / env.step_dt (= 50.0) on stage transitions, but comments and RewardsCfg describe them as "1.0 per stage".
  • Stage advancement depends on implicit reward evaluation order — update_task_stage is only called from lift_trocars_reward; the other three reward functions silently assume it runs first each step.
  • Stale extras after render warmup — _patched_reset re-computes obs after the warmup but returns the original extras, which may contain pre-warmup camera/observation data.
  • Module-level observation caches (_body_obs_cache, _dex3_obs_cache) keyed only on device+batch-size will silently corrupt data if two env instances coexist in the same process.
  • trocar_2 is missing collision_props that trocar_1 sets explicitly.

Confidence Score: 2/5

  • Not ready to merge: the hardcoded local robot USD path will break the task for all users outside the author's machine, and the sparse reward scaling discrepancy needs to be resolved before training results can be reasoned about.
  • Two P1 issues block merging: (1) the robot USD is at a local absolute path that no other contributor can resolve, making the task unusable out-of-the-box; (2) the sparse reward emits 50× the documented value per stage transition, which could mask training bugs and is inconsistent with the reward-weight comments. Additionally the implicit reward-ordering dependency and stale-extras bug in the extension patch need addressing before the feature is production-reliable.
  • config/robot_config.py (hardcoded path), mdp/rewards.py (reward scaling and ordering), rl/rlinf/extension.py (stale extras), mdp/observations.py (global cache)

Important Files Changed

Filename Overview
source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py Adds render warmup monkey-patch to env.reset. Stale extras returned after re-render is a correctness concern when extras contains observation data.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config/robot_config.py Robot USD path is hardcoded to /mnt/Data/lw_v3/…, breaking portability for any non-author machine. All other assets use a URL-based path.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/g129_dex3_env_cfg.py Main environment config assembling scene, observations, rewards, terminations, and events. trocar_2 lacks collision_props; USD_ROOT has an outstanding FIXME; otherwise structurally sound.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/observations.py Body and Dex3 joint state observations with preallocated tensor caches. Module-level global caches will interfere if multiple env instances coexist in the same process.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/rewards.py 4-stage sparse reward functions. Sparse reward value is 1/step_dt (=50) not 1.0 as documented; stage advancement relies on implicit ordering of reward term evaluation.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/terminations.py Drop and task-success termination conditions. Logic is clear; imports get_task_stage from rewards (coupling that could be tidied but is not a bug).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Episode Start - Stage 0"] -->|"Both trocars lifted above table"| B["Stage 1: Lifted\nlift_trocars_reward fires"]
    B -->|"Tip distance < threshold"| C["Stage 2: Tips Aligned\ntrocar_tip_alignment_reward fires"]
    C -->|"Parallel AND centers close"| D["Stage 3: Inserted\ntrocar_insertion_reward fires"]
    D -->|"Both in target zone AND below z_min"| E["Stage 4: Placed\ntrocar_placement_reward fires"]
    E --> F["task_success_termination - Episode Done"]

    A -->|"Timeout 20s"| G["time_out termination"]
    A -->|"Object z below 0.5m"| H["object_drop_termination"]
    B --> H
    C --> H
    D --> H

    F --> I["Reset Events"]
    G --> I
    H --> I

    I --> R1["reset_scene_to_default"]
    I --> R2["reset_task_stage - clears caches"]
    I --> R3["reset_tray_with_random_rotation"]
    R1 --> A
    R2 --> A
    R3 --> A

Comments Outside Diff (1)

  1. source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py, lines 13-31

    P2 Stale extras returned after render warmup

    _patched_reset re-computes obs from env.observation_manager after the warmup renders, but the extras dict that is returned still comes from _original_reset (i.e. before any warmup frames). If extras contains camera or privileged-observation data (as is common in Isaac Lab resets via "observations" or "log" keys), those values will reflect the pre-warmup render rather than the freshly-rendered state that obs reflects. This is an inconsistency between obs and extras that could silently corrupt the first observation seen by the policy after a reset.

    Consider also re-computing any camera-dependent fields of extras, or at least documenting that extras intentionally reflects the pre-warmup state.
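One way to address this is sketched below: recompute any observation-bearing fields of extras from the same post-warmup source used for obs, so the two stay consistent. The env methods and extras layout here are illustrative stand-ins, not the PR's actual code.

```python
# Illustrative sketch (not the PR's actual implementation) of keeping obs and
# extras consistent after a render warmup. StubEnv mimics the relevant
# behavior: the original reset returns pre-warmup data, and observations
# recomputed after the warmup reflect the fresh render.

class StubEnv:
    def __init__(self):
        self.rendered = False

    def original_reset(self):
        # Pre-warmup: both obs and extras reflect the stale render.
        return {"camera": "stale"}, {"observations": {"camera": "stale"}}

    def render_warmup(self):
        self.rendered = True

    def compute_observations(self):
        return {"camera": "fresh"} if self.rendered else {"camera": "stale"}


def patched_reset(env):
    obs, extras = env.original_reset()
    env.render_warmup()
    obs = env.compute_observations()  # post-warmup observations
    # Refresh observation data stashed in extras so it matches obs.
    if "observations" in extras:
        extras["observations"] = obs
    return obs, extras


obs, extras = patched_reset(StubEnv())
print(obs, extras)  # both now reflect the post-warmup render
```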

Reviews (1): Last reviewed commit: "Add assemble_trocar task for G129-Dex3 a..."

Comment on lines +276 to +282
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)


P1 Sparse reward is 50× larger than documented

The RewardsCfg docstring and inline comments state each stage gives a reward of 1.0 on completion. However, the actual value emitted is 1.0 / env.step_dt. With decimation=4 and sim.dt=1/200, step_dt = 0.02 s, so the actual reward per transition step is 50.0, not 1.0. The same pattern is repeated in trocar_tip_alignment_reward (line 456), trocar_insertion_reward (line 570), and trocar_placement_reward (line 696).

If the intent is to produce a reward equivalent to "50 reward-units delivered over one policy step" (which is a valid design choice for sparse rewards), the comments and RewardsCfg docstring need to be updated to reflect this. If the intent is to give exactly 1.0, the division by step_dt should be removed.
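A quick sanity check of the scaling arithmetic, assuming decimation=4 and sim.dt=1/200 as stated above:

```python
# Reproduce the reward-scaling arithmetic from the review comment.
# Assumed values: decimation=4, sim.dt=1/200 (as documented in this PR thread).
decimation = 4
sim_dt = 1.0 / 200.0

# step_dt is the policy-step duration: physics dt times decimation.
step_dt = sim_dt * decimation  # 0.02 s

# Reward as currently emitted on a stage transition: 1.0 / env.step_dt.
emitted_reward = 1.0 / step_dt

# Reward as documented in RewardsCfg: 1.0 per completed stage.
documented_reward = 1.0

print(step_dt)         # 0.02
print(emitted_reward)  # 50.0 -> 50x the documented value
```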

Suggested change
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device),
    torch.zeros(env.num_envs, device=env.device),
)

Comment on lines +101 to +206
def update_task_stage(
    env: ManagerBasedRLEnv,
    asset_cfg1: SceneEntityCfg,
    asset_cfg2: SceneEntityCfg,
    table_height: float = 0.85483,
    lift_threshold: float = 0.05,
    tip_align_threshold: float = 0.015,
    insertion_dist_threshold: float = 0.03,
    insertion_angle_threshold: float = 0.15,
    placement_x_min: float = -1.8,
    placement_x_max: float = -1.4,
    placement_y_min: float = 1.5,
    placement_y_max: float = 1.8,
    placement_z_min: float = 0.9,
    print_log: bool = False,
) -> torch.Tensor:
    """Update task stage based on current state.

    This function checks conditions and advances stages automatically.
    Once a stage is completed, it never goes back.
    """
    stage = get_task_stage(env)

    obj1: RigidObject = env.scene[asset_cfg1.name]
    obj2: RigidObject = env.scene[asset_cfg2.name]

    pos1 = wp.to_torch(obj1.data.root_pos_w)
    pos2 = wp.to_torch(obj2.data.root_pos_w)
    quat1 = wp.to_torch(obj1.data.root_quat_w)
    quat2 = wp.to_torch(obj2.data.root_quat_w)

    # Store old stage to detect changes (BEFORE any stage transitions)
    old_stage = stage.clone()

    # Stage 0 -> 1: Check if lifted
    target_z = table_height + lift_threshold
    is_lifted_1 = pos1[:, 2] > target_z
    is_lifted_2 = pos2[:, 2] > target_z
    both_lifted = is_lifted_1 & is_lifted_2
    stage = torch.where((stage == 0) & both_lifted, torch.ones_like(stage), stage)

    # Stage 1 -> 2: Check if tips are aligned (hole found)
    # Get tip positions
    tip_pos1 = get_trocar_tip_position(env, asset_cfg1)
    tip_pos2 = get_trocar_tip_position(env, asset_cfg2)
    tip_dist = torch.norm(tip_pos1 - tip_pos2, dim=-1)

    # Tip alignment success
    tip_aligned = tip_dist < tip_align_threshold
    stage = torch.where((stage == 1) & tip_aligned, torch.full_like(stage, 2), stage)

    # Stage 2 -> 3: Check if inserted (parallel + center close)
    # Get center distance
    center_dist = torch.norm(pos1 - pos2, dim=-1)

    # Check alignment
    target_axis1 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    target_axis2 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    axis1 = quat_apply(quat1, target_axis1)
    axis2 = quat_apply(quat2, target_axis2)
    dot_prod = torch.sum(axis1 * axis2, dim=-1)
    abs_dot = torch.clamp(torch.abs(dot_prod), max=1.0)
    angle = torch.acos(abs_dot)

    # Insertion success: parallel + center close
    is_parallel = angle < insertion_angle_threshold
    center_close = center_dist < insertion_dist_threshold
    is_inserted = is_parallel & center_close

    stage = torch.where((stage == 2) & is_inserted, torch.full_like(stage, 3), stage)

    # Stage 3 -> 4: Check if placed in target zone
    # Get environment origins to handle multi-env spatial offsets
    env_origins = env.scene.env_origins  # shape: (num_envs, 3)

    # Adjust target zone relative to each environment's origin
    curr_x_min = env_origins[:, 0] + min(placement_x_min, placement_x_max)  # (num_envs,)
    curr_x_max = env_origins[:, 0] + max(placement_x_min, placement_x_max)
    curr_y_min = env_origins[:, 1] + min(placement_y_min, placement_y_max)
    curr_y_max = env_origins[:, 1] + max(placement_y_min, placement_y_max)

    in_zone_1 = (
        (pos1[:, 0] >= curr_x_min)
        & (pos1[:, 0] <= curr_x_max)
        & (pos1[:, 1] >= curr_y_min)
        & (pos1[:, 1] <= curr_y_max)
        & (pos1[:, 2] < placement_z_min)
    )
    in_zone_2 = (
        (pos2[:, 0] >= curr_x_min)
        & (pos2[:, 0] <= curr_x_max)
        & (pos2[:, 1] >= curr_y_min)
        & (pos2[:, 1] <= curr_y_max)
        & (pos2[:, 2] < placement_z_min)
    )
    both_in_zone = in_zone_1 & in_zone_2
    stage = torch.where((stage == 3) & both_in_zone, torch.full_like(stage, 4), stage)

    # Print stage transitions (AFTER all stage transitions - always print when stage changes)
    if print_log and (stage != old_stage).any():
        for env_id in range(env.num_envs):
            if stage[env_id] != old_stage[env_id]:
                print(f"Env {env_id}: Stage {old_stage[env_id].item()} → {stage[env_id].item()}")

    env._task_stage = stage
    return stage

P1 Stage advancement depends on implicit reward evaluation order

update_task_stage (which advances env._task_stage) is only called from lift_trocars_reward. The three subsequent reward functions — trocar_tip_alignment_reward, trocar_insertion_reward, and trocar_placement_reward — call get_task_stage, which simply reads env._task_stage. This means the stage read by the later rewards is only up-to-date if lift_trocars_reward is evaluated first in every step.

Isaac Lab evaluates reward terms in the order they appear in RewardsCfg, which today happens to match (lift_trocars is declared first). However, this ordering is not enforced by the API, and any reordering of the dict (or framework change) will cause the later rewards to silently use a one-step-stale stage value, producing spurious or missed transition rewards.

Consider factoring update_task_stage into an independent event term (evaluated once per step before rewards) or at least adding a docstring/assertion that guards against incorrect evaluation order.
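One lightweight alternative to a dedicated event term is to memoize the stage update on the env's step counter, so whichever reward term is evaluated first triggers it and later terms in the same step reuse the result. The sketch below illustrates the pattern with a stand-in env; the names (common_step_counter, _task_stage) mirror the PR's attributes, but this is not the PR's actual code.

```python
# Minimal sketch: memoize the stage update per step so the reward terms no
# longer depend on declaration order in RewardsCfg.

class FakeEnv:
    """Stand-in for ManagerBasedRLEnv with only the fields the sketch needs."""

    def __init__(self):
        self.common_step_counter = 0
        self._task_stage = 0
        self._stage_update_step = -1  # step at which the stage was last updated


def _compute_stage(env):
    # Placeholder for the real geometric checks in update_task_stage().
    return env._task_stage + 1


def get_task_stage(env):
    # Run the update at most once per step, regardless of which reward term
    # calls this first; later callers in the same step reuse the result.
    if env._stage_update_step != env.common_step_counter:
        env._task_stage = _compute_stage(env)
        env._stage_update_step = env.common_step_counter
    return env._task_stage


env = FakeEnv()
# Two reward terms reading the stage in the same step see the same value,
# and the update itself runs only once.
a = get_task_stage(env)
b = get_task_stage(env)
print(a, b)  # 1 1
env.common_step_counter += 1
print(get_task_stage(env))  # 2
```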

Comment on lines +40 to +49
_body_obs_cache = {
    "device": None,
    "batch": None,
    "idx_t": None,
    "idx_batch": None,
    "pos_buf": None,
    "vel_buf": None,
    "torque_buf": None,
    "combined_buf": None,
}

P2 Module-level observation caches will interfere across multiple environment instances

_body_obs_cache and _dex3_obs_cache are module-level globals keyed only on (device, batch_size). If more than one ManagerBasedRLEnv instance is created in the same process (e.g. training env + eval env, or two parallel environments with different batch sizes), the second instantiation will either evict the first env's cached index tensors and buffers (causing silent wrong results) or — worse — two environments sharing the same batch size will share a single pre-allocated output buffer, causing them to overwrite each other's results.

Consider scoping these caches to the environment instance (e.g. as attributes on env) rather than as module-level globals.

Comment on lines +135 to +151
trocar_2 = RigidObjectCfg(
    prim_path="/World/envs/env_.*/trocar_2",
    spawn=UsdFileCfg(
        usd_path=(
            f"{USD_ROOT}/LightWheel/Assets/"
            "DisposableLaparoscopicPunctureDevice001/"
            "DisposableLaparoscopicPunctureDevice005-xform.usd"
        ),
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            rigid_body_enabled=True,
            disable_gravity=False,
        ),
    ),
    init_state=RigidObjectCfg.InitialStateCfg(
        rot=[-0.71475, -0.000243, 0.05853, 0.69692], pos=[-1.50635, 1.90997, 0.8631]
    ),
)

P2 trocar_2 is missing explicit collision_props

trocar_1 has collision_props explicitly set (collision_enabled=True, contact_offset=0.001, rest_offset=-0.001) but trocar_2 does not. Since both trocars need to interact physically during the insertion stage, inconsistent collision settings could produce unreliable contact detection. Consider adding equivalent collision_props to trocar_2's UsdFileCfg:

spawn=UsdFileCfg(
    usd_path=(...),
    rigid_props=sim_utils.RigidBodyPropertiesCfg(...),
    collision_props=sim_utils.CollisionPropertiesCfg(
        collision_enabled=True,
        contact_offset=0.001,
        rest_offset=-0.001,
    ),
),

@mingxueg-nv mingxueg-nv changed the title Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup… Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension Mar 23, 2026

Nic-Ma commented Mar 25, 2026

Hi @kellyguo11 , could you please help review this PR?

Thanks in advance.


🤖 Isaac Lab Review Bot

Summary: The trocar assembly task design is solid (4-stage sparse reward with clean stage progression), but there are real bugs and architectural issues that need addressing before merge. Request Changes — 10 findings, 1 runtime crash bug.

Findings (ranked by severity)

  1. 🔴 Bug (runtime crash) — rewards.py:620: is_parallel.item() will crash with RuntimeError when num_envs > 1 because it's called on a (num_envs,) tensor. Should be is_parallel[0].item().

  2. 🔴 Deprecated API — events.py:139-141,275: data.default_root_state is deprecated since IsaacLab 3.0 and will be removed. Should use default_root_pose / default_root_vel.

  3. 🟠 Fragile ordering dependency — rewards.py:239: update_task_stage() is only called from lift_trocars_reward(). The other 3 reward functions silently depend on it running first. If anyone reorders RewardsCfg fields, stage values will be stale.

  4. 🟠 Global mutable cache not multi-env safe — observations.py:43: Module-level _body_obs_cache / _dex3_obs_cache dicts will thrash if two env instances exist simultaneously (train + eval).

  5. 🟠 Monkey-patching private renderer state — extension.py:484: Directly mutates _rtx_utils._last_render_update_key private module state. Needs at minimum a comment explaining the workaround and ideally should be a supported API.

  6. 🟡 Task description mismatch — YAML:196: Eval init_params.task_description says "install trocar from box" but the reused *isaaclab_config anchor says "assemble trocar from tray".

  7. 🟡 License header inconsistency — All 10 new files have dual license headers (BSD-3-Clause + Apache-2.0), violating the project's single BSD-3-Clause standard.

  8. 🟡 Dead code — reset_robot_to_default_joint_positions is exported but never used in any event config.

  9. 🟡 Hard gr00t dependency — gr00t_config.py has top-level imports from the gr00t package; needs a guard/docs since it's in isaaclab_tasks.

  10. 🔵 Per-step tensor allocation — rewards.py:404: tip_offset_local.unsqueeze(0).repeat(num_envs, 1) allocates a new GPU tensor every step unnecessarily. Cache it or use expand().
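The view-vs-copy distinction behind finding 10 can be shown with NumPy's equivalent (used here so the snippet runs without torch); the torch counterpart would be tensor.unsqueeze(0).expand(num_envs, -1), which returns a view with no new allocation.

```python
import numpy as np

# repeat()/tile() copies the (3,) offset into a fresh (num_envs, 3) array on
# every call; broadcasting (torch expand() / numpy broadcast_to) yields a
# read-only view over the original buffer with no allocation.
num_envs = 4
tip_offset_local = np.array([0.0, 0.0, 0.12])  # illustrative offset value

copied = np.tile(tip_offset_local, (num_envs, 1))        # allocates every call
view = np.broadcast_to(tip_offset_local, (num_envs, 3))  # zero-copy view

print(copied.shape, view.shape)                    # (4, 3) (4, 3)
print(np.shares_memory(view, tip_offset_local))    # True: shares the buffer
print(np.shares_memory(copied, tip_offset_local))  # False: fresh allocation
```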

CI Status: No CI checks reported for this PR.

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    try:
        import isaaclab_physx.renderers.isaac_rtx_renderer_utils as _rtx_utils

why is this patch necessary? this is also restricting the renderer to RTX, is it not possible to use newton renderer as well?


Good question — I agree this is worth scrutinizing.

Looking at the core ManagerBasedEnv.reset() (manager_based_env.py L378-380), it already handles the rerender warmup natively:

if self.sim.has_rtx_sensors() and self.cfg.num_rerenders_on_reset > 0:
    for _ in range(self.cfg.num_rerenders_on_reset):
        self.sim.render()

So the framework already re-renders on reset when RTX sensors are present and num_rerenders_on_reset > 0 — and it does so through self.sim.render(), which is renderer-agnostic.

This monkey-patch duplicates that logic but goes directly through RTX-specific internals (_rtx_utils._last_render_update_key), which:

  1. Restricts to RTX — as you noted, Newton renderer users would get no warmup (only the except ImportError: pass path)
  2. Accesses private internals — _last_render_update_key is an implementation detail that could change
  3. Bypasses the framework's own render path — self.sim.render() already handles the render-update-key reset internally

If the built-in rerender-on-reset isn't sufficient for the RLinf use case (e.g., the env.reset() call from gym.make().unwrapped isn't triggering the warmup correctly), it would be better to investigate why and fix it at the framework level rather than patching around it here. @mingxueg-nv — is there a specific failure mode you're seeing that the existing num_rerenders_on_reset doesn't handle?


@mingxueg-nv mingxueg-nv Mar 30, 2026


Hi @kellyguo11, during training and evaluation the first frame of recorded videos shows the robot in its USD default pose instead of the environment's reset pose — the camera observation returned by env.reset() does not capture the actual scene state.
I have already tried setting num_rerenders_on_reset, but it doesn't help:

  • In IsaacLab 2.3, sim.render() called self.forward() + _app.update() to flush data and trigger the full render pipeline.

  • In IsaacLab 3.0, this was removed; sim.render() now only updates visualizers. But ManagerBasedEnv.reset() still relies on sim.render() for num_rerenders_on_reset, making it a no-op.

I have removed the RTX-only restriction from the patch, thanks.


Thanks for the detailed explanation and the code references, @mingxueg-nv — that makes sense now.

I traced through the code and can confirm the issue:

  1. ManagerBasedEnv.reset() (L378-379) calls sim.render() inside the num_rerenders_on_reset loop.
  2. In headless mode, sim.render() hits the NO_RENDERING branch (L613-619) which throttles updates and skips forward() — so the scene data never actually gets flushed to the renderer after reset.
  3. This means the first camera observation still captures the USD default pose rather than the post-reset state.

Your updated patch looks good — using set_setting("/app/player/playSimulations", False) + _app.update() is the same pattern that sim.render() uses internally for full render mode, and the explicit sensor.update(dt=0.0, force_recompute=True) loop ensures cameras pick up the new state. This is now renderer-agnostic too, which addresses my earlier concern. 👍

One minor thought: this feels like it could eventually be a fix upstream in ManagerBasedEnv.reset() itself (e.g., using forward() + _app.update() instead of bare sim.render() when render_mode="rgb_array"), so that all camera-based envs benefit. But as a workaround in the RLinf extension, this is clean and well-scoped.

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Licensed under the Apache License, Version 2.0 (the "License");

any reason why we also need the apache license?


Good catch. The rest of the Isaac Lab codebase uses only the BSD-3-Clause header (e.g., lift/config/__init__.py, stack/config/__init__.py, etc.). This file — and several other new files in this PR (rewards.py, observations.py, etc.) — have a dual header with both BSD-3-Clause and Apache-2.0, plus the full Apache license text block.

This looks like it may have been carried over from an internal NVIDIA template. Since this code is being contributed into the Isaac Lab repo, the standard BSD-3-Clause header should be sufficient. The Apache-2.0 block (lines 6-19) can likely be removed from all the new files to stay consistent with the rest of the project.


@kellyguo11, thanks for the reminder — I have removed all duplicated and Apache headers from all files.


Confirmed — checked the latest commit and the headers look clean now. __init__.py, camera_config.py, etc. all have just the standard BSD-3-Clause block. Thanks for the quick fix! ✅

#
# SPDX-License-Identifier: BSD-3-Clause

# Copyright (c) 2022-2026, The Isaac Lab Project Developers.

duplicated license headers


Have removed, thanks


Looks good, confirmed the duplicate is gone. Thanks! 👍


@isaaclab-review-bot isaaclab-review-bot Bot left a comment


🤖 Isaac Lab Review Bot — Environment Design Consistency Review

Summary

This PR adds an assemble_trocar manipulation task for the Unitree G129 + Dex3 robot along with an RLinf rendering warmup fix. The task implements a multi-stage surgical trocar assembly: lift → tip-align → insert → place.

Overall assessment: The environment is functional and well-implemented for the task logic, but has significant deviations from established Isaac Lab environment patterns that need to be addressed before merging into isaaclab_tasks. Below is a comprehensive comparison against reference environments (lift, stack, reach, pick_place).


1. Directory Structure

Reference pattern (lift/stack/reach):

task_name/
├── __init__.py           ← empty or minimal, no registrations
├── task_env_cfg.py       ← base/abstract env cfg with MISSING fields
├── mdp/
│   ├── __init__.py       ← uses lazy_export()
│   ├── observations.py
│   ├── rewards.py
│   └── terminations.py
└── config/
    ├── __init__.py       ← empty
    └── robot_name/
        ├── __init__.py   ← gym.register() calls here
        ├── agents/       ← RL agent configs (rsl_rl, skrl, sb3, etc.)
        └── joint_pos_env_cfg.py  ← concrete env cfg

New env structure:

assemble_trocar/
├── __init__.py           ← ❌ gym.register() here (should be in config/g129_dex3/)
├── g129_dex3_env_cfg.py  ← ❌ concrete cfg in task root (should be under config/g129_dex3/)
├── mdp/
│   ├── __init__.py       ← ❌ manual imports (should use lazy_export())
│   ├── events.py         ← ✅ (lift doesn't have this, but it's fine)
│   ├── observations.py   ← ✅
│   ├── rewards.py        ← ✅
│   └── terminations.py   ← ✅
└── config/
    ├── __init__.py       ← ❌ exports robot/camera presets (not registrations)
    ├── camera_config.py  ← ❌ should be in task config, not generic config
    ├── gr00t_config.py   ← ❌ GR00T-specific, has hard external dependency
    ├── isaaclab_ppo_gr00t_assemble_trocar.yaml ← ❌ RLinf training config doesn't belong here
    └── robot_config.py   ← ❌ should use isaaclab_assets or be task-level config

Key deviations:

  1. gym.register() should be in config/g129_dex3/__init__.py, not the task root __init__.py
  2. The concrete env cfg should be under config/g129_dex3/, not the task root
  3. The task root __init__.py should be empty/minimal (like lift/stack)
  4. Missing the base/abstract env cfg pattern (lift has LiftEnvCfg with MISSING fields)
  5. No agents/ directory with standard RL training configs

2. Environment Registration

Reference pattern (Isaac-Lift-Cube-Franka-v0):

# In config/franka/__init__.py
gym.register(
    id="Isaac-Lift-Cube-Franka-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:FrankaCubeLiftEnvCfg",  # STRING reference
        "rsl_rl_cfg_entry_point": ...,  # agent configs
        "skrl_cfg_entry_point": ...,
    },
    disable_env_checker=True,
)

New env:

# In __init__.py (wrong location!)
gym.register(
    id="Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0",
    kwargs={
        "env_cfg_entry_point": g129_dex3_env_cfg.G1AssembleTrocarEnvCfg,  # DIRECT CLASS reference
    },
)

Issues:

  1. Registration location: Should be in config/g129_dex3/__init__.py, not the task-level __init__.py
  2. Entry point format: Uses direct class reference instead of string f"{__name__}.module:ClassName". While this works, it's inconsistent with every other env in the repo
  3. Naming convention: Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 includes "RLinf" — task names should be framework-agnostic (e.g., Isaac-Assemble-Trocar-G129-Dex3-v0)
  4. Missing agent configs: No rsl_rl_cfg_entry_point, skrl_cfg_entry_point, etc. — even if only RLinf is used initially, the structure should support standard Isaac Lab training workflows

3. License Headers

Reference pattern (all Isaac Lab files):

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

New env files have DUAL license headers:

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)  # ← Isaac Lab header
# SPDX-License-Identifier: BSD-3-Clause

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION      # ← Apache-2.0 header  
# SPDX-License-Identifier: Apache-2.0
# Licensed under the Apache License, Version 2.0 (the "License")...

This creates a license conflict. Files in isaaclab_tasks should use only the BSD-3-Clause header. The Apache-2.0 header should be removed from all files.


4. Scene Configuration

Reference pattern (lift ObjectTableSceneCfg):

@configclass
class ObjectTableSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = MISSING          # ← abstract, set by concrete cfg
    ee_frame: FrameTransformerCfg = MISSING   # ← abstract
    object: RigidObjectCfg = MISSING          # ← abstract
    table = AssetBaseCfg(...)                 # ← concrete
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", ...)  # ← ground plane
    light = AssetBaseCfg(prim_path="/World/light", ...)

New env AssembleTrocarSceneCfg:

@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = G1RobotPresets.g1_29dof_dex3_base_fix(...)  # ← concrete, not abstract
    scene = AssetBaseCfg(...)   # ← "scene" is a very generic name
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    tray = ArticulationCfg(...)
    light = AssetBaseCfg(...)   # ← no ground plane!

Issues:

  1. No ground plane — All reference envs define a ground plane (/World/GroundPlane). The new env relies on the scene USD having one, but this should be explicit
  2. No abstract base — Robot is hardcoded instead of using MISSING pattern for robot-agnostic task definition
  3. Field naming — scene is a poor name for a scene entity (shadowing the InteractiveSceneCfg concept); should be something descriptive like operating_table or workspace
  4. Camera configs as class attributes — front_camera, left_wrist_camera, right_wrist_camera are set via classmethod calls rather than declarative config instances
  5. tray uses ArticulationCfg with empty actuators — This is a passive prop; RigidObjectCfg would be more semantically correct (or AssetBaseCfg if truly passive)
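Issues 1, 2, and 5 could be addressed together. A minimal sketch of a restructured scene config, under the assumption that the existing asset cfgs are reused (the field name operating_table is an illustrative rename, not from the PR):

```python
@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    # abstract: the concrete robot is set by a derived, robot-specific cfg
    robot: ArticulationCfg = MISSING
    # descriptive name instead of the generic "scene"
    operating_table = AssetBaseCfg(...)
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    # passive prop: RigidObjectCfg instead of ArticulationCfg with empty actuators
    tray = RigidObjectCfg(...)
    # explicit ground plane, as in the reference envs
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", spawn=sim_utils.GroundPlaneCfg())
    light = AssetBaseCfg(...)
```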

5. Environment Configuration

Reference pattern (lift LiftEnvCfg):

@configclass
class LiftEnvCfg(ManagerBasedRLEnvCfg):
    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()        # ← type-annotated
    curriculum: CurriculumCfg = CurriculumCfg()

New env:

@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=1, ...)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events = EventCfg()     # ← missing type annotation!
    commands = None          # ← None instead of omitting
    rewards: RewardsCfg = RewardsCfg()
    curriculum = None        # ← None instead of omitting

Issues:

  1. events = EventCfg() missing type annotation — Should be events: EventCfg = EventCfg()
  2. num_envs=1 — Reference envs default to 4096. Setting 1 as default seems wrong for batch training
  3. commands = None and curriculum = None — These are already None by default in ManagerBasedRLEnvCfg, so explicitly setting them is unnecessary and clutters the config
  4. episode_length_s = 20.0 — This is set in __post_init__ but MISSING in the base class, so it must be set. This is fine, but notably longer than lift (5.0s)
  5. Render settings in __post_init__self.sim.render.enable_translucency, self.sim.render.carb_settings, rendering_mode = "quality", antialiasing_mode = "DLAA" — these are unusual and task-specific. No other reference env sets these. Consider whether these belong in the env config or should be runtime overrides
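Issues 1 through 3 amount to a small diff. A sketch of the trimmed config, assuming the lift defaults (num_envs=4096, env_spacing=2.5) are reasonable here too:

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()  # type-annotated
    # commands/curriculum omitted: they already default to None in the base class
```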

6. MDP Module (mdp/)

Reference pattern (lazy_export()):

# mdp/__init__.py
from isaaclab.utils.module import lazy_export
lazy_export()

New env:

# mdp/__init__.py
from isaaclab.envs.mdp import JointPositionActionCfg, time_out
from .events import ...
from .observations import ...
from .rewards import ...
from .terminations import ...
__all__ = [...]

Issues:

  1. Should use lazy_export() — This is the standard pattern in Isaac Lab mdp modules. It handles star imports and avoids manual __all__ maintenance
  2. Re-exports from isaaclab.envs.mdpJointPositionActionCfg and time_out should be imported directly where used, not re-exported through the task's mdp package

7. Observations

Reference pattern (lift):

joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
object_position = ObsTerm(func=mdp.object_position_in_robot_root_frame)

New env:

robot_joint_state = ObsTerm(func=mdp.get_robot_body_joint_states)      # 87 dims (29*3)
robot_dex3_joint_state = ObsTerm(func=mdp.get_robot_dex3_joint_states) # 14 dims

Issues:

  1. Hardcoded joint indices — The observation functions use magic numbers for joint reordering (body_joint_indices = [0, 3, 6, 9, ...]). This is fragile and couples the observation to a specific USD joint order. Consider using joint_names_expr patterns instead
  2. Global mutable state_body_obs_cache and _dex3_obs_cache are module-level mutable dicts. This breaks if multiple envs with different configurations run in the same process. The cache should be per-environment instance
  3. No object observations — The env doesn't observe trocar positions/orientations in the policy group, only joint states. This means the policy can only use camera images for object state — valid for visuomotor policies, but unusual for the policy observation group
  4. concatenate_terms = False — This is intentional for visuomotor policies but worth documenting explicitly
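For issue 1, the magic-number index lists could be replaced by resolving indices from joint-name patterns once at startup — the same idea SceneEntityCfg's joint_names resolution uses. A framework-free sketch (the function name and the joint names are illustrative, not from the PR):

```python
import re


def resolve_joint_indices(joint_names: list[str], patterns: list[str]) -> list[int]:
    """Map name patterns to column indices, decoupling observations from USD joint order."""
    regexes = [re.compile(p) for p in patterns]
    return [i for i, name in enumerate(joint_names) if any(r.fullmatch(name) for r in regexes)]


# select body joints by name instead of hardcoding [0, 3, 6, 9, ...]
names = ["left_hip_pitch_joint", "left_hand_index_0_joint", "right_hip_pitch_joint"]
body_idx = resolve_joint_indices(names, [".*_hip_.*"])  # -> [0, 2]
```

Resolving once and caching the result on the env instance (or in the cfg) would also eliminate the module-level cache from issue 2.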

8. Reward Functions

Reference pattern (lift):

reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.04}, weight=15.0)
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)

New env (rewards.py — 736 lines!):

  • Complex multi-stage state machine (_task_stage) stored on env instance
  • Reward locking mechanism with _lift_reward_locked, _tip_reward_locked, etc.
  • Debug printing with should_print_debug()
  • update_task_stage() called inside first reward function (lift_trocars_reward)

Issues:

  1. Reward functions have side effectslift_trocars_reward() calls update_task_stage() which modifies env._task_stage. The reward manager doesn't guarantee call order, so this is fragile. Stage updates should be in events or a dedicated pre-step hook, not buried in the first reward function
  2. Storing state on env via setattrenv._task_stage, env._prev_stage_*, env._lift_reward_locked, etc. are monkey-patched onto the env instance. This is the wrong pattern — Isaac Lab provides the observation/state manager for this, or custom attributes should be in the env cfg/env class
  3. reward / env.step_dt scaling — In sparse reward mode, rewards are divided by env.step_dt. This is unusual and couples the reward magnitude to the physics timestep
  4. No regularization rewards — Reference envs include action_rate_l2 and joint_vel_l2 penalties. This env has none, which may cause jerky motions
  5. USD stage access in reward functionsget_trocar_tip_position() accesses the USD stage via pxr API to compute tip offsets. This is extremely unusual for reward functions and may have performance implications. The offset should be precomputed in scene setup or the env __init__
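For issue 4, the standard penalties used by the reference envs could be added directly — both functions exist in isaaclab.envs.mdp. A sketch with illustrative starting weights (not tuned values):

```python
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
joint_vel = RewTerm(
    func=mdp.joint_vel_l2,
    weight=-1e-4,
    params={"asset_cfg": SceneEntityCfg("robot")},
)
```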

9. External Dependencies

config/gr00t_config.py imports:

from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
...
from gr00t.experiment.data_config import DATA_CONFIG_MAP, BaseDataConfig
from gr00t.model.transforms import GR00TTransform

Issues:

  1. gr00t is NOT a dependency of isaaclab_tasks — This file will cause import errors for anyone installing Isaac Lab normally. It should either:
    • Be moved to isaaclab_contrib (where RLinf lives)
    • Be guarded with try/except ImportError
    • Not be in the task directory at all
  2. The YAML config file (isaaclab_ppo_gr00t_assemble_trocar.yaml) is RLinf-specific training configuration with hardcoded checkpoint paths (/mnt/ckpt/...). This doesn't belong in isaaclab_tasks
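If the file must stay importable without gr00t installed, a soft-dependency guard is one low-effort option (the helper name optional_import is illustrative; moving the file to isaaclab_contrib remains the cleaner fix):

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if the soft dependency is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


gr00t = optional_import("gr00t")
GROOT_AVAILABLE = gr00t is not None
# GR00T-specific data configs would only be defined/registered when GROOT_AVAILABLE is True
```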

10. RLinf Extension Changes (rendering warmup)

The monkey-patching approach:

_original_reset = env.reset
def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    # force render warmup
    for _ in range(_n_warmup):
        _rtx_utils._last_render_update_key = (0, -1)  # ← private API access
        env.sim.set_setting("/app/player/playSimulations", False)
        _app.update()
        env.sim.set_setting("/app/player/playSimulations", True)
        env.scene.update(dt=0)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras
env.reset = _patched_reset

Issues:

  1. Accessing private API_rtx_utils._last_render_update_key = (0, -1) reaches into a private attribute of isaac_rtx_renderer_utils. This will break when the renderer internals change
  2. env.obs_buf = obs override — Directly setting the observation buffer bypasses any safety checks or hooks
  3. The right approach — Isaac Lab already has num_rerenders_on_reset in ManagerBasedEnvCfg. If this doesn't work correctly for the RLinf use case, the fix should be in the core ManagerBasedEnv.reset() method, not monkey-patched from the extension. File an issue or fix it upstream
  4. Why not just set env.cfg.num_rerenders_on_reset? — The env config already supports this. If it's not working correctly, that's a bug to fix in the env, not to patch around
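Under that approach the extension needs no patch at all; the env config carries the fix. A sketch assuming two warmup re-renders suffice for the cameras (the count is an assumption, not a measured value):

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    def __post_init__(self):
        # re-render after reset so the first camera observation is valid,
        # instead of monkey-patching env.reset in the RLinf extension
        self.rerender_on_reset = True
        self.num_rerenders_on_reset = 2
```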

Specific Issues (Numbered)

  1. Dual license headers — All new files have both BSD-3-Clause and Apache-2.0 headers. Remove Apache-2.0 headers
  2. __init__.py has duplicate copyright block — Lines 1-4 and 6-8 are both copyright notices
  3. gym.register() in wrong location — Move to config/g129_dex3/__init__.py
  4. Direct class reference for env_cfg_entry_point — Use string format f"{__name__}.module:Class"
  5. "RLinf" in env name — Remove framework-specific branding from gym ID
  6. Missing agents/ directory — Add standard RL agent configs
  7. events missing type annotation — Add events: EventCfg = EventCfg()
  8. num_envs=1 default — Should be 4096 or at least a reasonable batch size
  9. scene field name in scene cfg — Rename to something descriptive
  10. No ground plane — Add explicit ground plane config
  11. mdp/__init__.py should use lazy_export() — Follow standard pattern
  12. Module-level mutable observation cache — Move to per-env instance
  13. Reward side effects (stage machine) — Move update_task_stage to event system
  14. setattr monkey-patching on env instance — Use proper state management
  15. gr00t_config.py imports unavailable package — Move to isaaclab_contrib or guard imports
  16. RLinf YAML with hardcoded paths — Remove from isaaclab_tasks
  17. USD stage access in reward function — Precompute tip offsets during init
  18. No regularization rewards — Add action_rate and joint_vel penalties
  19. Missing num_rerenders_on_reset in env cfg — Set this instead of monkey-patching in RLinf
  20. tray uses ArticulationCfg with empty actuators — Use RigidObjectCfg or AssetBaseCfg

Recommendations (Priority Order)

P0 — Must fix before merge:

  1. Remove dual license headers; use only BSD-3-Clause
  2. Move gr00t_config.py and isaaclab_ppo_gr00t_assemble_trocar.yaml out of isaaclab_tasks (to isaaclab_contrib or a separate package)
  3. Restructure directory to match standard layout: registrations in config/g129_dex3/__init__.py
  4. Fix env_cfg_entry_point to use string format
  5. Remove "RLinf" from gym env IDs
  6. Set num_rerenders_on_reset in the env cfg instead of monkey-patching reset

P1 — Should fix:

  7. Add type annotation for events
  8. Use lazy_export() in mdp/__init__.py
  9. Move stage state machine out of reward functions into events
  10. Add ground plane to scene config
  11. Fix default num_envs to a reasonable batch size
  12. Add action rate / joint velocity regularization rewards

P2 — Nice to have:

  13. Create abstract base env cfg with MISSING robot field
  14. Move observation cache from module-level to env instance
  15. Precompute trocar tip offsets during env initialization
  16. Add agents/ directory with at least one standard training config
  17. Rename scene field in AssembleTrocarSceneCfg to something descriptive

@kellyguo11 kellyguo11 moved this to In review in Isaac Lab Apr 17, 2026
Contributor

@kellyguo11 kellyguo11 left a comment

can you also add some documentation for both RLinf and the environment? the environment can be added to https://isaac-sim.github.io/IsaacLab/develop/source/overview/environments.html and maybe RLinf can be added as a new section under https://isaac-sim.github.io/IsaacLab/develop/source/experimental-features/bleeding-edge.html


import omni.kit.app

_app = omni.kit.app.get_app()
Contributor

why is omni.kit.app needed to be explicitly defined here? could we call env.render() or access other APIs from the env?

Contributor Author

Hi @kellyguo11, I have removed the _patched_reset patch and set num_rerenders_on_reset; it works well now, thanks.

from collections.abc import Sequence

import isaaclab.sim as sim_utils
from isaaclab.sensors import CameraCfg, TiledCameraCfg
Contributor

we've deprecated TiledCamera recently, both can just use Camera directly now

Contributor Author

Have updated to CameraCfg directly, 🙏

}


G129_CFG_WITH_DEX3_BASE_FIX = ArticulationCfg(
Contributor

I think this could probably be moved to under isaaclab_assets so that other tasks could also use this robot if they'd like?

Contributor Author

I have moved it to isaaclab_assets, thanks.


# Get default states (initial positions from config)
# note: default_root_state is the local coordinate relative to the environment origin
tray_default_state = wp.to_torch(tray.data.default_root_state)[env_ids].clone()
Contributor

wp.to_torch is no longer necessary. we've added a new .torch mechanism to improve performance. the wp.to_torch wrapper will still be supported but is now deprecated

Contributor Author

Thank you for the reminder, I have changed wp.to_torch to .torch.

# Precompute and cache column indices
global _body_obs_cache
if _body_obs_cache["device"] != device or _body_obs_cache["idx_t"] is None:
body_joint_indices = [
Contributor

could this be defined together with the asset instead so that we can keep the mdp implementation clean?

Contributor Author

Have cleaned it, thank you.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR adds a 4-stage trocar assembly task with RLinf integration. The new commits add a _patched_reset function to extension.py that forces a render warmup after reset to fix camera observations. The task implementation is comprehensive but has some structural deviations from Isaac Lab patterns that should be addressed.

Previous Concerns Status

Most structural concerns from the previous review remain unaddressed:

  • ❌ Dual license headers still present in new files
  • ❌ gr00t_config.py with external gr00t dependency still in isaaclab_tasks
  • ❌ gym.register() still in task root __init__.py instead of config/g129_dex3/
  • ❌ Module-level mutable observation caches still present
  • ❌ Stage machine side effects still in reward functions
  • ✅ "RLinf" has been removed from gym env IDs (now Isaac-Assemble-Trocar-G129-Dex3-v0)

New Findings

🔴 Critical: extension.py:475-488 — Render warmup patch is fragile and has a subtle bug
The _patched_reset implementation has been simplified but still has issues:

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    env.sim.set_setting("/app/player/playSimulations", False)
    _app.update()
    env.sim.set_setting("/app/player/playSimulations", True)
    for sensor in env.scene.sensors.values():
        sensor.update(dt=0.0, force_recompute=True)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras

The env.obs_buf = obs line directly mutates internal state. This should use the env's proper API. Also, ManagerBasedRLEnvCfg has rerender_on_reset: bool and num_rerenders_on_reset: int — consider using those instead of monkey-patching.

🟡 Warning: gr00t_config.py:17-23 — Import will fail for standard Isaac Lab installations

from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
# ...

This file is in isaaclab_tasks but imports gr00t which is not a dependency. Anyone importing the task package will get ModuleNotFoundError. Either:

  1. Move to isaaclab_contrib/rl/rlinf/
  2. Guard with try/except ImportError
  3. Make it a lazy import

🔵 Improvement: observations.py:28-37 — Module-level cache causes cross-environment contamination

_body_obs_cache = {
    "device": None,
    "batch": None,
    # ...
}

If two envs with different batch sizes run in the same process, the cache will be corrupted. Store the cache on the env instance instead.

Implementation Verdict

Significant concerns — The task logic is solid, but the structural issues (gr00t dependency in isaaclab_tasks, monkey-patching instead of using env config, module-level caches) need to be resolved before merging.

@kellyguo11 kellyguo11 moved this from In review to In progress in Isaac Lab May 1, 2026
@mingxueg-nv mingxueg-nv force-pushed the Adds_Assemble_Trocar_task_Based_RLinf branch from 3117b97 to b03d761 Compare May 5, 2026 07:12
@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits (b03d761) appear to be identical to the previously reviewed state — no new changes were made to address the previous concerns.

Previous Concerns Status

All previous findings remain unaddressed:

  • ❌ gr00t_config.py with external gr00t dependency still in isaaclab_tasks (will cause ModuleNotFoundError)
  • ❌ Module-level mutable observation caches in observations.py (cross-env contamination risk)
  • ❌ _patched_reset monkey-patch in extension.py instead of using rerender_on_reset config
  • ❌ env.obs_buf = obs direct state mutation

Implementation Verdict

Significant concerns remain — No changes since last review. The critical gr00t import issue in isaaclab_tasks will break standard installations that don't have the gr00t package.

@mingxueg-nv mingxueg-nv requested a review from pascal-roth as a code owner May 6, 2026 10:41
@github-actions github-actions Bot added the asset New asset feature or request label May 6, 2026
@mingxueg-nv mingxueg-nv changed the title Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension Add assemble_trocar task for G129-Dex3 May 6, 2026
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 7, 2026
@mingxueg-nv
Contributor Author

Hi @kellyguo11, I have updated this PR and resolved the comments. Also, we added environments.rst and bleeding-edge.rst.

Could you please help to review again? Thank you!

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Review Summary

This PR adds a well-structured multi-stage surgical manipulation task. The 4-stage state machine (lift → align → insert → place) is a sensible decomposition for this assembly problem. However, there are several architectural concerns that should be addressed before merge.

Strengths

  • Clear stage-based reward decomposition with both sparse and dense modes
  • Good use of SceneEntityCfg for parameterizing which assets to track
  • Caching of tip offsets (computed from USD once, then reused)
  • Deterministic rotation variant for reproducible evaluation
  • Camera presets well-organized for GR00T visual input

Concerns

Architecture:

  1. Extensive use of setattr(env, ...) for state — Task stage, reward locks, and debug trackers are all monkey-patched onto the env object. This is fragile, hard to discover, and breaks if two tasks share an env instance or if the env is serialized.
  2. update_task_stage called from every reward function — Each reward term calls update_task_stage with the full parameter set, but only the first call per step actually transitions stages (due to monotonic stage progression). This means the stage thresholds in lift_trocars_reward params silently control ALL stage transitions.
  3. Global mutable caches_body_obs_cache and _dex3_obs_cache are module-level mutable dicts, which will break with multiple env instances or multiprocessing.

Code quality:
4. print() statements in production code — Debug prints should use the logging module, not print().
5. Hardcoded USD paths with env_0get_trocar_tip_position hardcodes /World/envs/env_0/trocar_1/... for USD stage lookups.
6. No tests — Checklist shows tests unchecked. At minimum, the state machine transitions and reward shaping should have unit tests.

See inline comments for specifics.

torch.Tensor: Current stage for each environment (num_envs,)
"""
if not hasattr(env, "_task_stage"):
env._task_stage = torch.zeros(env.num_envs, dtype=torch.long, device=env.device)

🔴 State monkey-patched onto env: env._task_stage is set via setattr without any formal registration. This pattern is used throughout this file (_prev_stage_lift, _lift_reward_locked, _tip_reward_locked, etc.).

This has several problems:

  1. No type safety or discoverability — other code can't know these attributes exist
  2. If reset_task_stage in events.py misses one of these attrs, state leaks across episodes
  3. If multiple tasks are loaded, attribute names collide

Recommendation: Use a proper state container registered on the env:

@dataclass
class AssembleTrocarState:
    task_stage: torch.Tensor
    prev_stage_lift: torch.Tensor
    lift_reward_locked: torch.Tensor
    # ... etc

# In env setup or first call:
if not hasattr(env, 'assemble_trocar_state'):
    env.assemble_trocar_state = AssembleTrocarState(...)

Or better yet, use Isaac Lab's built-in env.extras dict for task-specific state.

pos2 = obj2.data.root_pos_w.torch
quat1 = obj1.data.root_quat_w.torch
quat2 = obj2.data.root_quat_w.torch
# Store old stage to detect changes (BEFORE any stage transitions)

🔴 update_task_stage called redundantly from every reward function: lift_trocars_reward calls update_task_stage with ALL stage transition thresholds, and it's called again (implicitly via get_task_stage) from the other reward functions. Since stages are monotonically increasing, only the first call per step does real work — but the thresholds passed to lift_trocars_reward silently control all stage transitions.

This coupling is confusing and error-prone. If someone changes insertion_dist_threshold in insert_trocars reward params, it won't actually affect the stage transition because lift_trocars_reward already transitioned the stage with its own (potentially different) threshold.

Recommendation: Call update_task_stage once per step as a separate observation/event term, not from within reward functions. The reward functions should only read get_task_stage(env) and compute their reward.

Returns:
torch.Tensor: Shape (num_envs, 3) - Position in world coordinates
"""
from pxr import Gf, Usd, UsdGeom

🟡 Hardcoded USD paths with env_0: tip_path = "/World/envs/env_0/trocar_1/Trocar002/White_pos" assumes a specific asset hierarchy. If the USD structure changes (e.g., different trocar model), this silently returns torch.zeros(3) with only a print warning.

Also, the from pxr import Gf, Usd, UsdGeom import inside the function body is unusual — it means this function only works when the full Omniverse stack is loaded. Consider:

  1. Moving the import to module level with a TYPE_CHECKING guard
  2. Making the tip prim path configurable (e.g., a parameter in SceneEntityCfg)
  3. Raising an error instead of returning zero offset on missing prim

"device": None,
"batch": None,
"idx_t": None,
"idx_batch": None,

🔴 Module-level mutable global state: _body_obs_cache and _dex3_obs_cache are module-level mutable dicts that cache device, batch size, and pre-allocated tensors. This breaks in several scenarios:

  1. Multiple env instances with different batch sizes
  2. Device changes (e.g., moving from CPU to GPU)
  3. Multiprocessing (shared state between workers)

The current code checks device and batch on each call, but if two envs with different batch sizes alternate calls, the cache thrashes.

Recommendation: Move the cache into the env instance (using a registered state object), or use functools.lru_cache keyed on (device, batch). Alternatively, for the simple gather pattern here, just use joint_pos[:, G1_29DOF_BODY_JOINT_INDICES] — PyTorch's fancy indexing is already fast and doesn't need manual caching.

pos_buf = _body_obs_cache["pos_buf"]
vel_buf = _body_obs_cache["vel_buf"]
torque_buf = _body_obs_cache["torque_buf"]
combined_buf = _body_obs_cache["combined_buf"]

💡 Over-engineered observation extraction: The torch.gather + pre-allocated buffer pattern adds complexity without clear benefit for this use case. PyTorch's native indexing is highly optimized:

def get_robot_body_joint_states(env: ManagerBasedRLEnv) -> torch.Tensor:
    robot_data = env.scene["robot"].data
    idx = G1_29DOF_BODY_JOINT_INDICES
    return torch.cat([
        robot_data.joint_pos.torch[:, idx],
        robot_data.joint_vel.torch[:, idx],
        robot_data.applied_torque.torch[:, idx],
    ], dim=-1)

This is clearer, less error-prone, and likely comparable in performance for 29 joints × 3 quantities.

env: ManagerBasedRLEnv,
env_ids: torch.Tensor,
print_log: bool = False,
) -> None:

🟡 Fragile state reset: reset_task_stage resets 8+ individual attributes by name. If a new _prev_stage_* or _*_reward_locked attribute is added in rewards.py, this function must be updated in lockstep — there's no enforcement.

This is the natural consequence of the monkey-patching pattern. With a proper state container, reset becomes:

def reset_task_stage(env, env_ids, ...):
    state = env.assemble_trocar_state
    for field in dataclasses.fields(state):
        getattr(state, field.name)[env_ids] = 0

USD_ROOT = f"{HEALTHCARE_S3}/Props/LightWheel"


@configclass

💡 S3 URL for assets: HEALTHCARE_S3 points to an S3 bucket. Is this publicly accessible? If it requires credentials or may change, consider:

  1. Documenting the access requirements
  2. Using Isaac Lab's asset management system (ISAAC_NUCLEUS_DIR or asset retrieval)
  3. Adding a comment about expected availability/versioning

CameraPresets,
G1RobotPresets,
)


🟡 Module-level joint_names and offset_dict: These 43-element lists pollute the module namespace and are only used by ActionsCfg. Consider moving them inside the class or into the robot config module where they logically belong.

@kellyguo11
Contributor

Thanks a lot for addressing the feedback! I do feel like the overall implementation still deviates a bit from how other environments are implemented in Isaac Lab. Combined with the bot review, I think perhaps there are a few places we can try to clean up a bit more:

  1. Task state monkey-patched onto the env objectenv._task_stage, env._prev_stage_lift, env._lift_reward_locked, etc. are created via setattr(env, ...) inside reward functions. The review bot's suggestion — a @dataclass AssembleTrocarState namespaced under env.assemble_trocar_state — could work.

  2. Stage machine runs inside a reward functionupdate_task_stage() is called from lift_trocars_reward() and silently controls all four stage transitions using the thresholds passed to that one reward term. The other three reward functions call get_task_stage() and depend on lift_trocars_reward having already run. Stage updates should be an event term run once per step before rewards, not side-effectful reward logic.

  3. Deprecated API usagedata.default_root_state in events.py is deprecated.

  4. Use logging instead of print().

  5. mdp/__init__.py still uses manual imports + __all__ rather than lazy_export(), and events: EventCfg = EventCfg() was missing a type annotation.

  6. get_trocar_tip_position() hardcodes /World/envs/env_0/... - does this environment work with num_envs > 1? does it always have to be hard-coded to env 0? we generally expect our environments to be parallelizable.

  7. tray is defined as an ArticulationCfg — can it actually be a RigidObject instead?

@kellyguo11
Contributor

@ooctipus what are your thoughts on this PR?


Labels

asset — New asset feature or request
documentation — Improvements or additions to documentation
isaac-lab — Related to Isaac Lab team

Projects

Status: In progress
