Add assemble_trocar task for G129-Dex3 #5082

mingxueg-nv wants to merge 10 commits into isaac-sim:develop
Conversation
Greptile Summary

This PR adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3) robot, including a four-stage sparse MDP (lift → tip-align → insert → place), three TiledCamera presets for GR00T visual input, a robot articulation config, a GR00T data config, and a minor extension patch that adds a post-reset render warmup to fix stale camera frames under RTX rendering.
Confidence Score: 2/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Episode Start - Stage 0"] -->|"Both trocars lifted above table"| B["Stage 1: Lifted\nlift_trocars_reward fires"]
    B -->|"Tip distance < threshold"| C["Stage 2: Tips Aligned\ntrocar_tip_alignment_reward fires"]
    C -->|"Parallel AND centers close"| D["Stage 3: Inserted\ntrocar_insertion_reward fires"]
    D -->|"Both in target zone AND below z_min"| E["Stage 4: Placed\ntrocar_placement_reward fires"]
    E --> F["task_success_termination - Episode Done"]
    A -->|"Timeout 20s"| G["time_out termination"]
    A -->|"Object z below 0.5m"| H["object_drop_termination"]
    B --> H
    C --> H
    D --> H
    F --> I["Reset Events"]
    G --> I
    H --> I
    I --> R1["reset_scene_to_default"]
    I --> R2["reset_task_stage - clears caches"]
    I --> R3["reset_tray_with_random_rotation"]
    R1 --> A
    R2 --> A
    R3 --> A
```
```python
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)
```
Sparse reward is 50× larger than documented
The RewardsCfg docstring and inline comments state each stage gives a reward of 1.0 on completion. However, the actual value emitted is 1.0 / env.step_dt. With decimation=4 and sim.dt=1/200, step_dt = 0.02 s, so the actual reward per transition step is 50.0, not 1.0. The same pattern is repeated in trocar_tip_alignment_reward (line 456), trocar_insertion_reward (line 570), and trocar_placement_reward (line 696).
If the intent is to produce a reward equivalent to "50 reward-units delivered over one policy step" (which is a valid design choice for sparse rewards), the comments and RewardsCfg docstring need to be updated to reflect this. If the intent is to give exactly 1.0, the division by step_dt should be removed.
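To make the magnitude concrete, here is the arithmetic under the timestep settings stated above (decimation=4, sim.dt=1/200); plain Python, no Isaac Lab needed:

```python
# Reproduce the reward magnitude from the PR's timestep settings.
sim_dt = 1 / 200                # physics step (s)
decimation = 4                  # physics steps per policy step
step_dt = sim_dt * decimation   # policy step: 0.02 s
sparse_reward = 1.0 / step_dt   # what the reward term actually emits
print(sparse_reward)  # 50.0
```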
Suggested change:

```diff
 stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
 reward = torch.where(
     stage_just_completed,
-    torch.ones(env.num_envs, device=env.device) / env.step_dt,
+    torch.ones(env.num_envs, device=env.device),
     torch.zeros(env.num_envs, device=env.device),
 )
```
```python
def update_task_stage(
    env: ManagerBasedRLEnv,
    asset_cfg1: SceneEntityCfg,
    asset_cfg2: SceneEntityCfg,
    table_height: float = 0.85483,
    lift_threshold: float = 0.05,
    tip_align_threshold: float = 0.015,
    insertion_dist_threshold: float = 0.03,
    insertion_angle_threshold: float = 0.15,
    placement_x_min: float = -1.8,
    placement_x_max: float = -1.4,
    placement_y_min: float = 1.5,
    placement_y_max: float = 1.8,
    placement_z_min: float = 0.9,
    print_log: bool = False,
) -> torch.Tensor:
    """Update task stage based on current state.

    This function checks conditions and advances stages automatically.
    Once a stage is completed, it never goes back.
    """
    stage = get_task_stage(env)

    obj1: RigidObject = env.scene[asset_cfg1.name]
    obj2: RigidObject = env.scene[asset_cfg2.name]

    pos1 = wp.to_torch(obj1.data.root_pos_w)
    pos2 = wp.to_torch(obj2.data.root_pos_w)
    quat1 = wp.to_torch(obj1.data.root_quat_w)
    quat2 = wp.to_torch(obj2.data.root_quat_w)

    # Store old stage to detect changes (BEFORE any stage transitions)
    old_stage = stage.clone()

    # Stage 0 -> 1: Check if lifted
    target_z = table_height + lift_threshold
    is_lifted_1 = pos1[:, 2] > target_z
    is_lifted_2 = pos2[:, 2] > target_z
    both_lifted = is_lifted_1 & is_lifted_2
    stage = torch.where((stage == 0) & both_lifted, torch.ones_like(stage), stage)

    # Stage 1 -> 2: Check if tips are aligned (hole found)
    # Get tip positions
    tip_pos1 = get_trocar_tip_position(env, asset_cfg1)
    tip_pos2 = get_trocar_tip_position(env, asset_cfg2)
    tip_dist = torch.norm(tip_pos1 - tip_pos2, dim=-1)

    # Tip alignment success
    tip_aligned = tip_dist < tip_align_threshold
    stage = torch.where((stage == 1) & tip_aligned, torch.full_like(stage, 2), stage)

    # Stage 2 -> 3: Check if inserted (parallel + center close)
    # Get center distance
    center_dist = torch.norm(pos1 - pos2, dim=-1)

    # Check alignment
    target_axis1 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    target_axis2 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    axis1 = quat_apply(quat1, target_axis1)
    axis2 = quat_apply(quat2, target_axis2)
    dot_prod = torch.sum(axis1 * axis2, dim=-1)
    abs_dot = torch.clamp(torch.abs(dot_prod), max=1.0)
    angle = torch.acos(abs_dot)

    # Insertion success: parallel + center close
    is_parallel = angle < insertion_angle_threshold
    center_close = center_dist < insertion_dist_threshold
    is_inserted = is_parallel & center_close

    stage = torch.where((stage == 2) & is_inserted, torch.full_like(stage, 3), stage)

    # Stage 3 -> 4: Check if placed in target zone
    # Get environment origins to handle multi-env spatial offsets
    env_origins = env.scene.env_origins  # shape: (num_envs, 3)

    # Adjust target zone relative to each environment's origin
    curr_x_min = env_origins[:, 0] + min(placement_x_min, placement_x_max)  # (num_envs,)
    curr_x_max = env_origins[:, 0] + max(placement_x_min, placement_x_max)
    curr_y_min = env_origins[:, 1] + min(placement_y_min, placement_y_max)
    curr_y_max = env_origins[:, 1] + max(placement_y_min, placement_y_max)

    in_zone_1 = (
        (pos1[:, 0] >= curr_x_min)
        & (pos1[:, 0] <= curr_x_max)
        & (pos1[:, 1] >= curr_y_min)
        & (pos1[:, 1] <= curr_y_max)
        & (pos1[:, 2] < placement_z_min)
    )
    in_zone_2 = (
        (pos2[:, 0] >= curr_x_min)
        & (pos2[:, 0] <= curr_x_max)
        & (pos2[:, 1] >= curr_y_min)
        & (pos2[:, 1] <= curr_y_max)
        & (pos2[:, 2] < placement_z_min)
    )
    both_in_zone = in_zone_1 & in_zone_2
    stage = torch.where((stage == 3) & both_in_zone, torch.full_like(stage, 4), stage)

    # Print stage transitions (AFTER all stage transitions - always print when stage changes)
    if print_log and (stage != old_stage).any():
        for env_id in range(env.num_envs):
            if stage[env_id] != old_stage[env_id]:
                print(f"Env {env_id}: Stage {old_stage[env_id].item()} → {stage[env_id].item()}")

    env._task_stage = stage
    return stage
```
Stage advancement depends on implicit reward evaluation order
update_task_stage (which advances env._task_stage) is only called from lift_trocars_reward. The three subsequent reward functions — trocar_tip_alignment_reward, trocar_insertion_reward, and trocar_placement_reward — call get_task_stage, which simply reads env._task_stage. This means the stage read by the later rewards is only up-to-date if lift_trocars_reward is evaluated first in every step.
Isaac Lab evaluates reward terms in the order they appear in RewardsCfg, which today happens to match (lift_trocars is declared first). However, this ordering is not enforced by the API, and any reordering of the dict (or framework change) will cause the later rewards to silently use a one-step-stale stage value, producing spurious or missed transition rewards.
Consider factoring update_task_stage into an independent event term (evaluated once per step before rewards) or at least adding a docstring/assertion that guards against incorrect evaluation order.
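If an assertion guard is the chosen route, a cheap version is to stamp the step at which the stage machine last ran and assert freshness in every reader. A minimal sketch with a dummy env; the `_task_stage_step` attribute and the `DummyEnv` class are hypothetical, not from the PR:

```python
class DummyEnv:
    """Stand-in for ManagerBasedRLEnv; only the step counter matters here."""
    common_step_counter = 0

def update_task_stage(env):
    env._task_stage = 1  # actual stage logic elided
    env._task_stage_step = env.common_step_counter  # stamp freshness

def get_task_stage(env):
    # Fail loudly instead of silently returning a one-step-stale stage.
    assert getattr(env, "_task_stage_step", -1) == env.common_step_counter, (
        "stale task stage: update_task_stage was not evaluated this step"
    )
    return env._task_stage

env = DummyEnv()
update_task_stage(env)
print(get_task_stage(env))  # 1
env.common_step_counter += 1  # a new step begins, stage machine not yet re-run
try:
    get_task_stage(env)
except AssertionError:
    print("guard fired")
```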
```python
_body_obs_cache = {
    "device": None,
    "batch": None,
    "idx_t": None,
    "idx_batch": None,
    "pos_buf": None,
    "vel_buf": None,
    "torque_buf": None,
    "combined_buf": None,
}
```
Module-level observation caches will interfere across multiple environment instances
_body_obs_cache and _dex3_obs_cache are module-level globals keyed only on (device, batch_size). If more than one ManagerBasedRLEnv instance is created in the same process (e.g. training env + eval env, or two parallel environments with different batch sizes), the second instantiation will either evict the first env's cached index tensors and buffers (causing silent wrong results) or — worse — two environments sharing the same batch size will share a single pre-allocated output buffer, causing them to overwrite each other's results.
Consider scoping these caches to the environment instance (e.g. as attributes on env) rather than as module-level globals.
```python
trocar_2 = RigidObjectCfg(
    prim_path="/World/envs/env_.*/trocar_2",
    spawn=UsdFileCfg(
        usd_path=(
            f"{USD_ROOT}/LightWheel/Assets/"
            "DisposableLaparoscopicPunctureDevice001/"
            "DisposableLaparoscopicPunctureDevice005-xform.usd"
        ),
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            rigid_body_enabled=True,
            disable_gravity=False,
        ),
    ),
    init_state=RigidObjectCfg.InitialStateCfg(
        rot=[-0.71475, -0.000243, 0.05853, 0.69692], pos=[-1.50635, 1.90997, 0.8631]
    ),
)
```
trocar_2 is missing explicit collision_props
trocar_1 has collision_props explicitly set (collision_enabled=True, contact_offset=0.001, rest_offset=-0.001) but trocar_2 does not. Since both trocars need to interact physically during the insertion stage, inconsistent collision settings could produce unreliable contact detection. Consider adding equivalent collision_props to trocar_2's UsdFileCfg:
```python
spawn=UsdFileCfg(
    usd_path=(...),
    rigid_props=sim_utils.RigidBodyPropertiesCfg(...),
    collision_props=sim_utils.CollisionPropertiesCfg(
        collision_enabled=True,
        contact_offset=0.001,
        rest_offset=-0.001,
    ),
),
```
Hi @kellyguo11, could you please help review this PR? Thanks in advance.
🤖 Isaac Lab Review Bot

Summary: The trocar assembly task design is solid (a 4-stage sparse reward with clean stage progression), but there are real bugs and architectural issues that need addressing before merge. Request Changes — 10 findings, 1 runtime crash bug.

Findings (ranked by severity)
CI Status: No CI checks reported for this PR.
```python
def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    try:
        import isaaclab_physx.renderers.isaac_rtx_renderer_utils as _rtx_utils
```
Why is this patch necessary? It also restricts the renderer to RTX; is it not possible to use the Newton renderer as well?
Good question — I agree this is worth scrutinizing.
Looking at the core ManagerBasedEnv.reset() (manager_based_env.py L378-380), it already handles the rerender warmup natively:
```python
if self.sim.has_rtx_sensors() and self.cfg.num_rerenders_on_reset > 0:
    for _ in range(self.cfg.num_rerenders_on_reset):
        self.sim.render()
```

So the framework already re-renders on reset when RTX sensors are present and `num_rerenders_on_reset > 0`, and it does so through `self.sim.render()`, which is renderer-agnostic.
This monkey-patch duplicates that logic but goes directly through RTX-specific internals (`_rtx_utils._last_render_update_key`), which:

- Restricts the warmup to RTX: as you noted, Newton renderer users would get no warmup (only the `except ImportError: pass` path)
- Accesses private internals: `_last_render_update_key` is an implementation detail that could change
- Bypasses the framework's own render path: `self.sim.render()` already handles the render-update-key reset internally
If the built-in rerender-on-reset isn't sufficient for the RLinf use case (e.g., the env.reset() call from gym.make().unwrapped isn't triggering the warmup correctly), it would be better to investigate why and fix it at the framework level rather than patching around it here. @mingxueg-nv — is there a specific failure mode you're seeing that the existing num_rerenders_on_reset doesn't handle?
Hi @kellyguo11, during training and evaluation the first frame of recorded videos shows the robot in its USD default pose instead of the environment's reset pose; the camera observation returned by `env.reset()` does not reflect the actual scene state.
I have already tried to set num_rerenders_on_reset, but it doesn't work:
- In Isaac Lab 2.3, `sim.render()` called `self.forward()` + `_app.update()` to flush data and trigger the full render pipeline.
- In Isaac Lab 3.0, this was removed; `sim.render()` now only updates visualizers. But `ManagerBasedEnv.reset()` still relies on `sim.render()` for `num_rerenders_on_reset`, making it a no-op.
I have removed the RTX-only restriction from the patch, thanks.
Thanks for the detailed explanation and the code references, @mingxueg-nv — that makes sense now.
I traced through the code and can confirm the issue:

- `ManagerBasedEnv.reset()` (L378-379) calls `sim.render()` inside the `num_rerenders_on_reset` loop.
- In headless mode, `sim.render()` hits the `NO_RENDERING` branch (L613-619), which throttles updates and skips `forward()`, so the scene data never actually gets flushed to the renderer after reset.
- This means the first camera observation still captures the USD default pose rather than the post-reset state.
Your updated patch looks good: using `set_setting("/app/player/playSimulations", False)` + `_app.update()` is the same pattern that `sim.render()` uses internally for full render mode, and the explicit `sensor.update(dt=0.0, force_recompute=True)` loop ensures cameras pick up the new state. This is now renderer-agnostic too, which addresses my earlier concern. 👍

One minor thought: this feels like it could eventually be fixed upstream in `ManagerBasedEnv.reset()` itself (e.g., using `forward()` + `_app.update()` instead of a bare `sim.render()` when `render_mode="rgb_array"`), so that all camera-based envs benefit. But as a workaround in the RLinf extension, this is clean and well-scoped.
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Licensed under the Apache License, Version 2.0 (the "License");
```
Any reason why we also need the Apache license?
Good catch. The rest of the Isaac Lab codebase uses only the BSD-3-Clause header (e.g., lift/config/__init__.py, stack/config/__init__.py, etc.). This file — and several other new files in this PR (rewards.py, observations.py, etc.) — have a dual header with both BSD-3-Clause and Apache-2.0, plus the full Apache license text block.
This looks like it may have been carried over from an internal NVIDIA template. Since this code is being contributed into the Isaac Lab repo, the standard BSD-3-Clause header should be sufficient. The Apache-2.0 block (lines 6-19) can likely be removed from all the new files to stay consistent with the rest of the project.
@kellyguo11, thanks for the reminder. I have removed the duplicated and Apache headers from all files.
Confirmed — checked the latest commit and the headers look clean now. __init__.py, camera_config.py, etc. all have just the standard BSD-3-Clause block. Thanks for the quick fix! ✅
```python
#
# SPDX-License-Identifier: BSD-3-Clause

# Copyright (c) 2022-2026, The Isaac Lab Project Developers.
```
Duplicated license headers
Have removed, thanks
Looks good, confirmed the duplicate is gone. Thanks! 👍
🤖 Isaac Lab Review Bot — Environment Design Consistency Review
Summary
This PR adds an assemble_trocar manipulation task for the Unitree G129 + Dex3 robot along with an RLinf rendering warmup fix. The task implements a multi-stage surgical trocar assembly: lift → tip-align → insert → place.
Overall assessment: The environment is functional and well-implemented for the task logic, but has significant deviations from established Isaac Lab environment patterns that need to be addressed before merging into isaaclab_tasks. Below is a comprehensive comparison against reference environments (lift, stack, reach, pick_place).
1. Directory Structure
Reference pattern (lift/stack/reach):
```
task_name/
├── __init__.py              ← empty or minimal, no registrations
├── task_env_cfg.py          ← base/abstract env cfg with MISSING fields
├── mdp/
│   ├── __init__.py          ← uses lazy_export()
│   ├── observations.py
│   ├── rewards.py
│   └── terminations.py
└── config/
    ├── __init__.py          ← empty
    └── robot_name/
        ├── __init__.py      ← gym.register() calls here
        ├── agents/          ← RL agent configs (rsl_rl, skrl, sb3, etc.)
        └── joint_pos_env_cfg.py  ← concrete env cfg
```
New env structure:
```
assemble_trocar/
├── __init__.py              ← ❌ gym.register() here (should be in config/g129_dex3/)
├── g129_dex3_env_cfg.py     ← ❌ concrete cfg in task root (should be under config/g129_dex3/)
├── mdp/
│   ├── __init__.py          ← ❌ manual imports (should use lazy_export())
│   ├── events.py            ← ✅ (lift doesn't have this, but it's fine)
│   ├── observations.py      ← ✅
│   ├── rewards.py           ← ✅
│   └── terminations.py      ← ✅
└── config/
    ├── __init__.py          ← ❌ exports robot/camera presets (not registrations)
    ├── camera_config.py     ← ❌ should be in task config, not generic config
    ├── gr00t_config.py      ← ❌ GR00T-specific, has hard external dependency
    ├── isaaclab_ppo_gr00t_assemble_trocar.yaml  ← ❌ RLinf training config doesn't belong here
    └── robot_config.py      ← ❌ should use isaaclab_assets or be task-level config
```
Key deviations:

- `gym.register()` should be in `config/g129_dex3/__init__.py`, not the task root `__init__.py`
- The concrete env cfg should be under `config/g129_dex3/`, not the task root
- The task root `__init__.py` should be empty/minimal (like lift/stack)
- Missing the base/abstract env cfg pattern (lift has `LiftEnvCfg` with `MISSING` fields)
- No `agents/` directory with standard RL training configs
2. Environment Registration
Reference pattern (`Isaac-Lift-Cube-Franka-v0`):

```python
# In config/franka/__init__.py
gym.register(
    id="Isaac-Lift-Cube-Franka-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:FrankaCubeLiftEnvCfg",  # STRING reference
        "rsl_rl_cfg_entry_point": ...,  # agent configs
        "skrl_cfg_entry_point": ...,
    },
    disable_env_checker=True,
)
```

New env:

```python
# In __init__.py (wrong location!)
gym.register(
    id="Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0",
    kwargs={
        "env_cfg_entry_point": g129_dex3_env_cfg.G1AssembleTrocarEnvCfg,  # DIRECT CLASS reference
    },
)
```

Issues:

- Registration location: should be in `config/g129_dex3/__init__.py`, not the task-level `__init__.py`
- Entry point format: uses a direct class reference instead of the string form `f"{__name__}.module:ClassName"`. While this works, it is inconsistent with every other env in the repo
- Naming convention: `Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0` includes "RLinf"; task names should be framework-agnostic (e.g., `Isaac-Assemble-Trocar-G129-Dex3-v0`)
- Missing agent configs: no `rsl_rl_cfg_entry_point`, `skrl_cfg_entry_point`, etc. Even if only RLinf is used initially, the structure should support standard Isaac Lab training workflows
3. License Headers
Reference pattern (all Isaac Lab files):

```python
# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
```

New env files have DUAL license headers:

```python
# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)  # ← Isaac Lab header
# SPDX-License-Identifier: BSD-3-Clause
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION    # ← Apache-2.0 header
# SPDX-License-Identifier: Apache-2.0
# Licensed under the Apache License, Version 2.0 (the "License")...
```

This creates a license conflict. Files in `isaaclab_tasks` should use only the BSD-3-Clause header. The Apache-2.0 header should be removed from all files.
4. Scene Configuration
Reference pattern (lift `ObjectTableSceneCfg`):

```python
@configclass
class ObjectTableSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = MISSING         # ← abstract, set by concrete cfg
    ee_frame: FrameTransformerCfg = MISSING  # ← abstract
    object: RigidObjectCfg = MISSING         # ← abstract
    table = AssetBaseCfg(...)                # ← concrete
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", ...)  # ← ground plane
    light = AssetBaseCfg(prim_path="/World/light", ...)
```

New env `AssembleTrocarSceneCfg`:

```python
@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = G1RobotPresets.g1_29dof_dex3_base_fix(...)  # ← concrete, not abstract
    scene = AssetBaseCfg(...)  # ← "scene" is a very generic name
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    tray = ArticulationCfg(...)
    light = AssetBaseCfg(...)  # ← no ground plane!
```

Issues:

- No ground plane: all reference envs define a ground plane (`/World/GroundPlane`). The new env relies on the scene USD having one, but this should be explicit
- No abstract base: the robot is hardcoded instead of using the `MISSING` pattern for a robot-agnostic task definition
- Field naming: `scene` is a poor name for a scene entity (shadowing the `InteractiveSceneCfg` concept); it should be something descriptive like `operating_table` or `workspace`
- Camera configs as class attributes: `front_camera`, `left_wrist_camera`, and `right_wrist_camera` are set via classmethod calls rather than declarative config instances
- `tray` uses `ArticulationCfg` with empty actuators: this is a passive prop; `RigidObjectCfg` would be more semantically correct (or `AssetBaseCfg` if truly passive)
5. Environment Configuration
Reference pattern (lift `LiftEnvCfg`):

```python
@configclass
class LiftEnvCfg(ManagerBasedRLEnvCfg):
    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()  # ← type-annotated
    curriculum: CurriculumCfg = CurriculumCfg()
```

New env:

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=1, ...)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events = EventCfg()  # ← missing type annotation!
    commands = None      # ← None instead of omitting
    rewards: RewardsCfg = RewardsCfg()
    curriculum = None    # ← None instead of omitting
```

Issues:

- `events = EventCfg()` is missing a type annotation: should be `events: EventCfg = EventCfg()`
- `num_envs=1`: reference envs default to 4096. Setting 1 as the default seems wrong for batch training
- `commands = None` and `curriculum = None`: these are already `None` by default in `ManagerBasedRLEnvCfg`, so explicitly setting them is unnecessary and clutters the config
- `episode_length_s = 20.0`: this is set in `__post_init__` but `MISSING` in the base class, so it must be set. This is fine, but notably longer than lift (5.0 s)
- Render settings in `__post_init__`: `self.sim.render.enable_translucency`, `self.sim.render.carb_settings`, `rendering_mode = "quality"`, and `antialiasing_mode = "DLAA"` are unusual and task-specific. No other reference env sets these. Consider whether they belong in the env config or should be runtime overrides
6. MDP Module (mdp/)
Reference pattern (`lazy_export()`):

```python
# mdp/__init__.py
from isaaclab.utils.module import lazy_export

lazy_export()
```

New env:

```python
# mdp/__init__.py
from isaaclab.envs.mdp import JointPositionActionCfg, time_out

from .events import ...
from .observations import ...
from .rewards import ...
from .terminations import ...

__all__ = [...]
```

Issues:

- Should use `lazy_export()`: this is the standard pattern in Isaac Lab mdp modules. It handles star imports and avoids manual `__all__` maintenance
- Re-exports from `isaaclab.envs.mdp`: `JointPositionActionCfg` and `time_out` should be imported directly where used, not re-exported through the task's mdp package
7. Observations
Reference pattern (lift):

```python
joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
object_position = ObsTerm(func=mdp.object_position_in_robot_root_frame)
```

New env:

```python
robot_joint_state = ObsTerm(func=mdp.get_robot_body_joint_states)       # 87 dims (29*3)
robot_dex3_joint_state = ObsTerm(func=mdp.get_robot_dex3_joint_states)  # 14 dims
```

Issues:

- Hardcoded joint indices: the observation functions use magic numbers for joint reordering (`body_joint_indices = [0, 3, 6, 9, ...]`). This is fragile and couples the observation to a specific USD joint order. Consider using `joint_names_expr` patterns instead
- Global mutable state: `_body_obs_cache` and `_dex3_obs_cache` are module-level mutable dicts. This breaks if multiple envs with different configurations run in the same process. The cache should be per-environment instance
- No object observations: the env doesn't observe trocar positions/orientations in the policy group, only joint states. This means the policy can only use camera images for object state; valid for visuomotor policies, but unusual for the `policy` observation group
- `concatenate_terms = False`: this is intentional for visuomotor policies but worth documenting explicitly
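The name-pattern idea can be sketched without Isaac Lab: derive the index ordering from name patterns, so a change in USD joint order cannot silently scramble the observation. The joint names and patterns below are invented stand-ins, not the G1's actual joints:

```python
import re

# Invented stand-in for the articulation's USD joint order.
usd_joint_names = ["left_hip_pitch", "waist_yaw", "left_knee", "right_hip_pitch"]

# Desired grouping expressed as patterns instead of magic index lists.
patterns = ["left_.*", "right_.*"]

ordered_indices = [
    i
    for pattern in patterns
    for i, name in enumerate(usd_joint_names)
    if re.fullmatch(pattern, name)
]
print(ordered_indices)  # [0, 2, 3]
```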
8. Reward Functions
Reference pattern (lift):

```python
reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.04}, weight=15.0)
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
```

New env (`rewards.py` — 736 lines!):

- A complex multi-stage state machine (`_task_stage`) stored on the env instance
- A reward locking mechanism with `_lift_reward_locked`, `_tip_reward_locked`, etc.
- Debug printing with `should_print_debug()`
- `update_task_stage()` called inside the first reward function (`lift_trocars_reward`)

Issues:

- Reward functions have side effects: `lift_trocars_reward()` calls `update_task_stage()`, which modifies `env._task_stage`. The reward manager doesn't guarantee call order, so this is fragile. Stage updates should be in events or a dedicated pre-step hook, not buried in the first reward function
- Storing state on the env via `setattr`: `env._task_stage`, `env._prev_stage_*`, `env._lift_reward_locked`, etc. are monkey-patched onto the env instance. This is the wrong pattern; Isaac Lab provides the observation/state manager for this, or custom attributes should live in the env cfg/env class
- `reward / env.step_dt` scaling: in sparse reward mode, rewards are divided by `env.step_dt`. This is unusual and couples the reward magnitude to the physics timestep
- No regularization rewards: reference envs include `action_rate_l2` and `joint_vel_l2` penalties. This env has none, which may cause jerky motions
- USD stage access in reward functions: `get_trocar_tip_position()` accesses the USD stage via the `pxr` API to compute tip offsets. This is extremely unusual for reward functions and may have performance implications. The offset should be precomputed in scene setup or the env `__init__`
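The precompute suggestion amounts to: read the tip offset from the USD once at init, express it in the trocar body frame, and then each step needs only one quaternion rotation. A self-contained sketch; the offset value and function names are hypothetical, and the quaternion convention assumed is (w, x, y, z):

```python
def quat_apply(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w*t + q_vec x t
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )

# Measured once from the USD at env init (value is an assumed placeholder).
TIP_OFFSET_LOCAL = (0.0, 0.0, -0.12)

def tip_world_position(root_pos, root_quat):
    """Tip position from the cached body-frame offset; no USD access per step."""
    off = quat_apply(root_quat, TIP_OFFSET_LOCAL)
    return tuple(p + o for p, o in zip(root_pos, off))

# Identity rotation: the tip sits 0.12 m below the root.
print(tip_world_position((0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.0)))  # (0.0, 0.0, 0.88)
```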
9. External Dependencies
`config/gr00t_config.py` imports:

```python
from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
...
from gr00t.experiment.data_config import DATA_CONFIG_MAP, BaseDataConfig
from gr00t.model.transforms import GR00TTransform
```

Issues:

- `gr00t` is NOT a dependency of `isaaclab_tasks`: this file will cause import errors for anyone installing Isaac Lab normally. It should either:
  - be moved to `isaaclab_contrib` (where RLinf lives)
  - be guarded with `try/except ImportError`
  - not be in the task directory at all
- The YAML config file (`isaaclab_ppo_gr00t_assemble_trocar.yaml`) is RLinf-specific training configuration with hardcoded checkpoint paths (`/mnt/ckpt/...`). This doesn't belong in `isaaclab_tasks`
10. RLinf Extension Changes (rendering warmup)
The monkey-patching approach:

```python
_original_reset = env.reset

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    # force render warmup
    for _ in range(_n_warmup):
        _rtx_utils._last_render_update_key = (0, -1)  # ← private API access
        env.sim.set_setting("/app/player/playSimulations", False)
        _app.update()
        env.sim.set_setting("/app/player/playSimulations", True)
    env.scene.update(dt=0)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras

env.reset = _patched_reset
```

Issues:

- Accessing a private API: `_rtx_utils._last_render_update_key = (0, -1)` reaches into a private attribute of `isaac_rtx_renderer_utils`. This will break when the renderer internals change
- `env.obs_buf = obs` override: directly setting the observation buffer bypasses any safety checks or hooks
- The right approach: Isaac Lab already has `num_rerenders_on_reset` in `ManagerBasedEnvCfg`. If this doesn't work correctly for the RLinf use case, the fix should be in the core `ManagerBasedEnv.reset()` method, not monkey-patched from the extension. File an issue or fix it upstream
- Why not just set `env.cfg.num_rerenders_on_reset`? The env config already supports this. If it's not working correctly, that's a bug to fix in the env, not to patch around
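For reference, the configuration route described above would look roughly like this. This is a sketch against `ManagerBasedRLEnvCfg`; the value 2 is an arbitrary example, and whether it actually suffices depends on the renderer behavior discussed elsewhere in this thread:

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    def __post_init__(self):
        super().__post_init__()
        # Ask the framework to re-render after reset so cameras capture
        # the post-reset scene state instead of stale frames.
        self.num_rerenders_on_reset = 2
```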
Specific Issues (Numbered)
1. Dual license headers: all new files have both BSD-3-Clause and Apache-2.0 headers. Remove the Apache-2.0 headers
2. `__init__.py` has a duplicate copyright block: lines 1-4 and 6-8 are both copyright notices
3. `gym.register()` in the wrong location: move to `config/g129_dex3/__init__.py`
4. Direct class reference for `env_cfg_entry_point`: use the string format `f"{__name__}.module:Class"`
5. "RLinf" in the env name: remove framework-specific branding from the gym ID
6. Missing `agents/` directory: add standard RL agent configs
7. `events` missing a type annotation: add `events: EventCfg = EventCfg()`
8. `num_envs=1` default: should be 4096 or at least a reasonable batch size
9. `scene` field name in the scene cfg: rename to something descriptive
10. No ground plane: add an explicit ground plane config
11. `mdp/__init__.py` should use `lazy_export()`: follow the standard pattern
12. Module-level mutable observation cache: move to a per-env instance
13. Reward side effects (stage machine): move `update_task_stage` to the event system
14. `setattr` monkey-patching on the env instance: use proper state management
15. `gr00t_config.py` imports an unavailable package: move to `isaaclab_contrib` or guard the imports
16. RLinf YAML with hardcoded paths: remove from `isaaclab_tasks`
17. USD stage access in a reward function: precompute tip offsets during init
18. No regularization rewards: add `action_rate` and `joint_vel` penalties
19. Missing `num_rerenders_on_reset` in the env cfg: set this instead of monkey-patching in RLinf
20. `tray` uses `ArticulationCfg` with empty actuators: use `RigidObjectCfg` or `AssetBaseCfg`
Recommendations (Priority Order)
P0 — Must fix before merge:
1. Remove dual license headers; use only BSD-3-Clause
2. Move `gr00t_config.py` and `isaaclab_ppo_gr00t_assemble_trocar.yaml` out of `isaaclab_tasks` (to `isaaclab_contrib` or a separate package)
3. Restructure the directory to match the standard layout: registrations in `config/g129_dex3/__init__.py`
4. Fix `env_cfg_entry_point` to use the string format
5. Remove "RLinf" from the gym env IDs
6. Set `num_rerenders_on_reset` in the env cfg instead of monkey-patching reset
P1 — Should fix:
7. Add type annotation for events
8. Use lazy_export() in mdp/__init__.py
9. Move stage state machine out of reward functions into events
10. Add ground plane to scene config
11. Fix default num_envs to a reasonable batch size
12. Add action rate / joint velocity regularization rewards
P2 — Nice to have:
13. Create abstract base env cfg with MISSING robot field
14. Move observation cache from module-level to env instance
15. Precompute trocar tip offsets during env initialization
16. Add agents/ directory with at least one standard training config
17. Rename scene field in AssembleTrocarSceneCfg to something descriptive
kellyguo11
left a comment
can you also add some documentation for both RLinf and the environment? the environment can be added to https://isaac-sim.github.io/IsaacLab/develop/source/overview/environments.html and maybe RLinf can be added as a new section under https://isaac-sim.github.io/IsaacLab/develop/source/experimental-features/bleeding-edge.html
```python
import omni.kit.app

_app = omni.kit.app.get_app()
```
why is omni.kit.app needed to be explicitly defined here? could we call env.render() or access other APIs from the env?
Hi, @kellyguo11 , I have removed this _patch_reset and set the num_rerenders_on_reset, it works well now, thanks.
```python
from collections.abc import Sequence

import isaaclab.sim as sim_utils
from isaaclab.sensors import CameraCfg, TiledCameraCfg
```
we've deprecated TiledCamera recently, both can just use Camera directly now
Have updated to CameraCfg directly, 🙏
```python
}


G129_CFG_WITH_DEX3_BASE_FIX = ArticulationCfg(
```
I think this could probably be moved to under isaaclab_assets so that other tasks could also use this robot if they'd like?
I have moved to isaaclab_assets , thanks.
```python
# Get default states (initial positions from config)
# note: default_root_state is the local coordinate relative to the environment origin
tray_default_state = wp.to_torch(tray.data.default_root_state)[env_ids].clone()
```
wp.to_torch is no longer necessary. we've added a new .torch mechanism to improve performance. the wp.to_torch wrapper will still be supported but is now deprecated
Thank you for reminder, I have changed wp.to_torch to .torch .
```python
# Precompute and cache column indices
global _body_obs_cache
if _body_obs_cache["device"] != device or _body_obs_cache["idx_t"] is None:
    body_joint_indices = [
```
could this be defined together with the asset instead so that we can keep the mdp implementation clean?
Have cleaned it, thank you.
🤖 Isaac Lab Review Bot
Summary
This PR adds a 4-stage trocar assembly task with RLinf integration. The new commits add a _patched_reset function to extension.py that forces a render warmup after reset to fix camera observations. The task implementation is comprehensive but has some structural deviations from Isaac Lab patterns that should be addressed.
Previous Concerns Status
Most structural concerns from the previous review remain unaddressed:
- ❌ Dual license headers still present in new files
- ❌ `gr00t_config.py` with external `gr00t` dependency still in `isaaclab_tasks`
- ❌ `gym.register()` still in task root `__init__.py` instead of `config/g129_dex3/`
- ❌ Module-level mutable observation caches still present
- ❌ Stage machine side effects still in reward functions
- ✅ "RLinf" has been removed from gym env IDs (now `Isaac-Assemble-Trocar-G129-Dex3-v0`)
New Findings
🔴 Critical: extension.py:475-488 — Render warmup patch is fragile and has a subtle bug
The _patched_reset implementation has been simplified but still has issues:
```python
def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    env.sim.set_setting("/app/player/playSimulations", False)
    _app.update()
    env.sim.set_setting("/app/player/playSimulations", True)
    for sensor in env.scene.sensors.values():
        sensor.update(dt=0.0, force_recompute=True)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras
```

The `env.obs_buf = obs` line directly mutates internal state. This should use the env's proper API. Also, `ManagerBasedRLEnvCfg` has `rerender_on_reset: bool` and `num_rerenders_on_reset: int` — consider using those instead of monkey-patching.
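Assuming those cfg fields behave as described, the warmup could move into the task's env cfg instead of a patched reset — a non-authoritative sketch (values illustrative):

```python
@configclass
class AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    def __post_init__(self):
        # ask the env to re-render after each reset so camera frames are fresh
        self.rerender_on_reset = True
        self.num_rerenders_on_reset = 2  # illustrative warmup count
```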
🟡 Warning: gr00t_config.py:17-23 — Import will fail for standard Isaac Lab installations
```python
from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
# ...
```

This file is in `isaaclab_tasks` but imports `gr00t`, which is not a dependency. Anyone importing the task package will get `ModuleNotFoundError`. Either:

- Move to `isaaclab_contrib/rl/rlinf/`
- Guard with `try/except ImportError`
- Make it a lazy import
🔵 Improvement: observations.py:28-37 — Module-level cache causes cross-environment contamination
```python
_body_obs_cache = {
    "device": None,
    "batch": None,
    # ...
}
```

If two envs with different batch sizes run in the same process, the cache will be corrupted. Store the cache on the env instance instead.
Implementation Verdict
Significant concerns — The task logic is solid, but the structural issues (gr00t dependency in isaaclab_tasks, monkey-patching instead of using env config, module-level caches) need to be resolved before merging.
… in rlinf extension Made-with: Cursor
3117b97 to b03d761
🤖 Isaac Lab Review Bot
Follow-up Review
The new commits (b03d761) appear to be identical to the previously reviewed state — no new changes were made to address the previous concerns.
Previous Concerns Status
All previous findings remain unaddressed:
- ❌ `gr00t_config.py` with external `gr00t` dependency still in `isaaclab_tasks` (will cause `ModuleNotFoundError`)
- ❌ Module-level mutable observation caches in `observations.py` (cross-env contamination risk)
- ❌ `_patched_reset` monkey-patch in `extension.py` instead of using `rerender_on_reset` config
- ❌ `env.obs_buf = obs` direct state mutation
Implementation Verdict
Significant concerns remain — No changes since last review. The critical gr00t import issue in isaaclab_tasks will break standard installations that don't have the gr00t package.
…hange to Cameracfg, and restructure task registration
Hi, @kellyguo11 , I have updated this PR and resolved comments. Also, we added the environments.rst and bleeding-edge.rst. Could you please help to review again? Thank you!
Review Summary
This PR adds a well-structured multi-stage surgical manipulation task. The 4-stage state machine (lift → align → insert → place) is a sensible decomposition for this assembly problem. However, there are several architectural concerns that should be addressed before merge.
Strengths
- Clear stage-based reward decomposition with both sparse and dense modes
- Good use of `SceneEntityCfg` for parameterizing which assets to track
- Caching of tip offsets (computed from USD once, then reused)
- Deterministic rotation variant for reproducible evaluation
- Camera presets well-organized for GR00T visual input
Concerns
Architecture:
- Extensive use of `setattr(env, ...)` for state — Task stage, reward locks, and debug trackers are all monkey-patched onto the env object. This is fragile, hard to discover, and breaks if two tasks share an env instance or if the env is serialized.
- `update_task_stage` called from every reward function — Each reward term calls `update_task_stage` with the full parameter set, but only the first call per step actually transitions stages (due to monotonic stage progression). This means the stage thresholds in `lift_trocars_reward` params silently control ALL stage transitions.
- Global mutable caches — `_body_obs_cache` and `_dex3_obs_cache` are module-level mutable dicts, which will break with multiple env instances or multiprocessing.
Code quality:
4. print() statements in production code — Debug prints should use logging module, not print().
5. Hardcoded USD paths pinned to env_0 — `get_trocar_tip_position` hardcodes `/World/envs/env_0/trocar_1/...` for USD stage lookups.
6. No tests — Checklist shows tests unchecked. At minimum, the state machine transitions and reward shaping should have unit tests.
See inline comments for specifics.
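On point 4, a minimal sketch of swapping `print()` for the `logging` module (the logger name and helper are illustrative, not the PR's code):

```python
import logging

logger = logging.getLogger("assemble_trocar")  # illustrative module logger


def report_stage_change(env_id: int, old_stage: int, new_stage: int) -> None:
    """Log a stage transition at debug level instead of printing unconditionally."""
    logger.debug("env %d: stage %d -> %d", env_id, old_stage, new_stage)
```

Debug output then stays silent in production and can be enabled per-module by setting that logger to `logging.DEBUG`.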
```python
        torch.Tensor: Current stage for each environment (num_envs,)
    """
    if not hasattr(env, "_task_stage"):
        env._task_stage = torch.zeros(env.num_envs, dtype=torch.long, device=env.device)
```
🔴 State monkey-patched onto env: env._task_stage is set via setattr without any formal registration. This pattern is used throughout this file (_prev_stage_lift, _lift_reward_locked, _tip_reward_locked, etc.).
This has several problems:
- No type safety or discoverability — other code can't know these attributes exist
- If `reset_task_stage` in events.py misses one of these attrs, state leaks across episodes
- If multiple tasks are loaded, attribute names collide
Recommendation: Use a proper state container registered on the env:
```python
@dataclass
class AssembleTrocarState:
    task_stage: torch.Tensor
    prev_stage_lift: torch.Tensor
    lift_reward_locked: torch.Tensor
    # ... etc

# In env setup or first call:
if not hasattr(env, "assemble_trocar_state"):
    env.assemble_trocar_state = AssembleTrocarState(...)
```

Or better yet, use Isaac Lab's built-in `env.extras` dict for task-specific state.
```python
pos2 = obj2.data.root_pos_w.torch
quat1 = obj1.data.root_quat_w.torch
quat2 = obj2.data.root_quat_w.torch
# Store old stage to detect changes (BEFORE any stage transitions)
```
🔴 update_task_stage called redundantly from every reward function: lift_trocars_reward calls update_task_stage with ALL stage transition thresholds, and it's called again (implicitly via get_task_stage) from the other reward functions. Since stages are monotonically increasing, only the first call per step does real work — but the thresholds passed to lift_trocars_reward silently control all stage transitions.
This coupling is confusing and error-prone. If someone changes insertion_dist_threshold in insert_trocars reward params, it won't actually affect the stage transition because lift_trocars_reward already transitioned the stage with its own (potentially different) threshold.
Recommendation: Call update_task_stage once per step as a separate observation/event term, not from within reward functions. The reward functions should only read get_task_stage(env) and compute their reward.
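A pure-Python sketch of that separation (all names illustrative, not the PR's API): the stage machine advances in exactly one place per step, and sparse reward terms only read the stage plus a one-shot lock:

```python
def update_task_stage(state, lifted, aligned, inserted, placed):
    """Advance each env's stage monotonically; called once per step (e.g. an event term)."""
    conditions = [lifted, aligned, inserted, placed]
    for i, stage in enumerate(state["stage"]):
        # a stage advances only when its own condition holds, one step at a time
        if stage < 4 and conditions[stage][i]:
            state["stage"][i] = stage + 1


def lift_reward(state):
    """Read-only sparse reward: fires once per episode when stage first reaches 1."""
    fired = [1 if s >= 1 and not locked else 0
             for s, locked in zip(state["stage"], state["lift_locked"])]
    state["lift_locked"] = [bool(l or f) for l, f in zip(state["lift_locked"], fired)]
    return fired


state = {"stage": [0, 0], "lift_locked": [False, False]}
update_task_stage(state, lifted=[True, False], aligned=[False, False],
                  inserted=[False, False], placed=[False, False])
print(state["stage"])  # → [1, 0]
```

With this layout, an insertion threshold feeds only the `inserted` condition passed to `update_task_stage`, so changing it in one place actually changes the transition.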
```python
    Returns:
        torch.Tensor: Shape (num_envs, 3) - Position in world coordinates
    """
    from pxr import Gf, Usd, UsdGeom
```
🟡 Hardcoded USD paths with env_0: tip_path = "/World/envs/env_0/trocar_1/Trocar002/White_pos" assumes a specific asset hierarchy. If the USD structure changes (e.g., different trocar model), this silently returns torch.zeros(3) with only a print warning.
Also, the from pxr import Gf, Usd, UsdGeom import inside the function body is unusual — it means this function only works when the full Omniverse stack is loaded. Consider:
- Moving the import to module level with a `TYPE_CHECKING` guard
- Making the tip prim path configurable (e.g., a parameter in `SceneEntityCfg`)
- Raising an error instead of returning a zero offset on a missing prim
```python
    "device": None,
    "batch": None,
    "idx_t": None,
    "idx_batch": None,
```
🔴 Module-level mutable global state: _body_obs_cache and _dex3_obs_cache are module-level mutable dicts that cache device, batch size, and pre-allocated tensors. This breaks in several scenarios:
- Multiple env instances with different batch sizes
- Device changes (e.g., moving from CPU to GPU)
- Multiprocessing (shared state between workers)
The current code checks device and batch on each call, but if two envs with different batch sizes alternate calls, the cache thrashes.
Recommendation: Move the cache into the env instance (using a registered state object), or use functools.lru_cache keyed on (device, batch). Alternatively, for the simple gather pattern here, just use joint_pos[:, G1_29DOF_BODY_JOINT_INDICES] — PyTorch's fancy indexing is already fast and doesn't need manual caching.
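A sketch of the per-instance variant, with a plain object standing in for the env (names are illustrative): cached data is keyed by `(device, batch)` and lives on the instance, so two envs can't thrash each other:

```python
def get_cached_indices(env, device, batch, build):
    """Return data cached on `env`, rebuilding it when (device, batch) changes."""
    key = (device, batch)
    cached = getattr(env, "_obs_cache", None)
    if cached is None or cached[0] != key:
        cached = (key, build())  # rebuild only on a key change
        env._obs_cache = cached
    return cached[1]


class DummyEnv:
    """Stand-in for the real env object."""


env_a, env_b = DummyEnv(), DummyEnv()
idx_a = get_cached_indices(env_a, "cuda:0", 4096, lambda: list(range(29)))
idx_b = get_cached_indices(env_b, "cpu", 16, lambda: list(range(3)))
# env_b's different batch size never invalidates env_a's cache
```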
```python
pos_buf = _body_obs_cache["pos_buf"]
vel_buf = _body_obs_cache["vel_buf"]
torque_buf = _body_obs_cache["torque_buf"]
combined_buf = _body_obs_cache["combined_buf"]
```
💡 Over-engineered observation extraction: The torch.gather + pre-allocated buffer pattern adds complexity without clear benefit for this use case. PyTorch's native indexing is highly optimized:
```python
def get_robot_body_joint_states(env: ManagerBasedRLEnv) -> torch.Tensor:
    robot_data = env.scene["robot"].data
    idx = G1_29DOF_BODY_JOINT_INDICES
    return torch.cat([
        robot_data.joint_pos.torch[:, idx],
        robot_data.joint_vel.torch[:, idx],
        robot_data.applied_torque.torch[:, idx],
    ], dim=-1)
```

This is clearer, less error-prone, and likely comparable in performance for 29 joints × 3 quantities.
```python
    env: ManagerBasedRLEnv,
    env_ids: torch.Tensor,
    print_log: bool = False,
) -> None:
```
🟡 Fragile state reset: reset_task_stage resets 8+ individual attributes by name. If a new _prev_stage_* or _*_reward_locked attribute is added in rewards.py, this function must be updated in lockstep — there's no enforcement.
This is the natural consequence of the monkey-patching pattern. With a proper state container, reset becomes:
```python
def reset_task_stage(env, env_ids, ...):
    state = env.assemble_trocar_state
    for field in dataclasses.fields(state):
        getattr(state, field.name)[env_ids] = 0
```

```python
USD_ROOT = f"{HEALTHCARE_S3}/Props/LightWheel"


@configclass
```
💡 S3 URL for assets: HEALTHCARE_S3 points to an S3 bucket. Is this publicly accessible? If it requires credentials or may change, consider:
- Documenting the access requirements
- Using Isaac Lab's asset management system (`ISAAC_NUCLEUS_DIR` or asset retrieval)
- Adding a comment about expected availability/versioning
```python
    CameraPresets,
    G1RobotPresets,
)
```
🟡 Module-level joint_names and offset_dict: These 43-element lists pollute the module namespace and are only used by ActionsCfg. Consider moving them inside the class or into the robot config module where they logically belong.
Thanks a lot for addressing the feedback! I do feel like the overall implementation still deviates a bit from how other environments are implemented in Isaac Lab. Combined with the bot review, I think perhaps there are a few places we can try to clean up a bit more:

@ooctipus what are your thoughts on this PR?
Description
Adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3), with RLinf support.
Key additions:

- `IsaacLabDataConfig` defining video/state/action modality keys, transforms (SinCos state encoding, min-max action normalization, color jitter), and model-specific settings.
- Updates to `isaaclab_contrib/rl/rlinf/extension.py` to support the new task registration.

Type of change

Checklist

- Ran the `pre-commit` checks with `./isaaclab.sh --format`
- Updated the `config/extension.toml` file
- Added my name to `CONTRIBUTORS.md` (or my name already exists there)