feat(env): support android_world env per-turn training #30
Merged
lwaekfjlk merged 4 commits into open-tinker:main on Feb 20, 2026
…nfigurations

Summary of changes:
- Completed android_world_multiturn.md with setup, usage, and reward structure.
- Updated android_world_param.yaml to set default env_shards to 1.
- Added ADB and port environment variables to android_world_game.py.
- Adjusted launch_scheduler.sh for local environment paths and GPU configuration.
- Implemented prompt truncation in generic_agent_loop.py to prevent tensor size mismatch (replacing submodule modification).
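The prompt truncation mentioned in the last item can be sketched as follows. This is a minimal illustration, not the code in generic_agent_loop.py: the function name `truncate_prompt`, the parameter `max_prompt_length`, and the choice to keep the most recent tokens (left truncation) are all assumptions made for the example.

```python
from typing import List


def truncate_prompt(token_ids: List[int], max_prompt_length: int) -> List[int]:
    """Hypothetical helper: cap a prompt at max_prompt_length tokens.

    Assumption: the oldest tokens are dropped and the most recent ones kept,
    so the latest observation survives. The PR does not state which side
    is truncated.
    """
    if max_prompt_length <= 0:
        raise ValueError("max_prompt_length must be positive")
    if len(token_ids) <= max_prompt_length:
        # Already within budget: return an unmodified copy.
        return list(token_ids)
    # Keep only the trailing max_prompt_length tokens.
    return list(token_ids[-max_prompt_length:])


short = truncate_prompt([1, 2], 3)
clipped = truncate_prompt([1, 2, 3, 4, 5], 3)
```

Capping every prompt at the same maximum length before padding is one way to keep a batch's prompt tensors the same size, which is the kind of mismatch the commit describes.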
…ining

Core Android World environment:
- AndroidWorldGame with multi-emulator shard support
- AndroidWorldServer with emulator-to-worker binding
- Multimodal VL prompt templates (INITIAL/ACTION split)
- gym_environment_interaction with worker-to-endpoint binding

Android agent loop:
- AndroidAgentLoop (1190 lines) with multimodal VL support
- Per-turn training mode with expansion_index
- <obs> separator based generation context optimization
- Agent registry entry in agent.yaml

Per-turn training system:
- PerTurnAgentLoopManager expanding multi-turn episodes into per-turn samples
- per_turn_agent_loop.py with expansion logic and reward gamma discounting
- Backend patches: ray_trainer.py (PerTurnAgentLoopManager import), rollout.py (per_turn_training/per_turn_reward_gamma config)
- http_training_server.py expansion of batch tensors via expansion_index

Infrastructure improvements:
- base_game.py/base_game_environment.py: agent_loop_name class attribute
- job_scheduler.py: ROLLOUT_TRACE_JOB_ID env var + KL divergence forwarding
- generic_agent_loop.py: per-job trace subdirectory isolation, JSONL fix
- actor.yaml: ppo_max_token_len_per_gpu 32768

Scripts and config:
- run_android.sh with env var configuration (no hardcoded paths)
- launch_scheduler.sh with generic trace dir
- launch_http_server.py with generic checkpoint dir
- scheduler.yaml and android_world_param.yaml cleaned
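The per-turn expansion described above can be illustrated with a small sketch. It is not the code in per_turn_agent_loop.py: the `Turn` and `PerTurnSample` types and the `expand_episodes` function are hypothetical, and the discounting rule (the episode's final reward multiplied by gamma raised to the number of turns remaining) is an assumption, since the commit message does not spell out the formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    prompt: str
    response: str


@dataclass
class PerTurnSample:
    prompt: str
    response: str
    reward: float
    expansion_index: int  # index of the source episode in the original batch


def expand_episodes(
    episodes: List[List[Turn]],
    final_rewards: List[float],
    gamma: float,
) -> List[PerTurnSample]:
    """Expand each multi-turn episode into one training sample per turn.

    Assumption: turn t of an n-turn episode receives
    final_reward * gamma ** (n - 1 - t), so the last turn gets the full reward.
    """
    if len(episodes) != len(final_rewards):
        raise ValueError("need exactly one final reward per episode")
    samples: List[PerTurnSample] = []
    for ep_idx, (turns, reward) in enumerate(zip(episodes, final_rewards)):
        n = len(turns)
        for t, turn in enumerate(turns):
            discounted = reward * gamma ** (n - 1 - t)
            samples.append(
                PerTurnSample(turn.prompt, turn.response, discounted, ep_idx)
            )
    return samples


episodes = [
    [Turn("obs0", "act0"), Turn("obs1", "act1"), Turn("obs2", "act2")],
    [Turn("obs0", "act0")],
]
samples = expand_episodes(episodes, final_rewards=[1.0, 0.5], gamma=0.5)
rewards = [s.reward for s in samples]
indices = [s.expansion_index for s in samples]
```

In this sketch, `expansion_index` records which episode each sample came from, so per-episode batch fields could be repeated to match the expanded batch by indexing with it. That is one plausible reading of "expansion of batch tensors via expansion_index".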
📑 Description
Added support for training in the android_world environment. Training is currently conducted per turn because of the context window limit.
ℹ Additional Information
The Android emulator may require hardware support or privileged commands on your machine. Make sure both are available before starting training.