STeCa

This repository contains the official code for the ACL 2025 Findings paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning".

✨ Key Advantages:

  1. 🎯 Real-time Calibration: Identifies and corrects suboptimal behaviors through step-level reward comparison during exploration (see the sketch after this list)
  2. 🔄 Automated Trajectory Enhancement: Leverages LLM-driven reflection to automatically construct calibrated trajectories without human intervention
  3. 🚀 Robust Task Completion: Combines calibrated and successful trajectories for reinforced training, significantly improving agent performance on complex long-horizon tasks (ALFWorld and VirtualHome)
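As a minimal sketch of the step-level reward comparison in advantage 1 (the function and parameter names below are illustrative assumptions, not the repository's actual API): during exploration, a step is flagged for calibration when its estimated reward falls short of the expert step's reward by more than a margin.

# Hypothetical sketch of the calibration trigger; `margin` and the
# function name are assumptions, not the repository's actual API.
def should_calibrate(explored_step_reward: float,
                     expert_step_reward: float,
                     margin: float = 0.1) -> bool:
    """Flag an explored step as suboptimal when it underperforms
    the expert step's estimated reward by more than `margin`."""
    return expert_step_reward - explored_step_reward > margin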

🎉 News

  • [2025.04.15] 🔥 Our paper is accepted to ACL 2025 Findings.
  • [2025.02.19] 🚀 STeCa Repo launched!

📝 Contents

  • ⚙️ Setup
  • ⛏️ Usage
  • 🙏 Acknowledgments
  • 📖 Citation

⚙️ Setup

# Create virtual environment for agentic evaluation
conda create -n STeCa python=3.9
conda activate STeCa
pip install -r requirements.txt

# Download data for ALFWorld environment
cd eval_agent/data/alfworld
gdown https://drive.google.com/uc?id=1y7Vqeo0_xm9d3I07vZaP6qbPFtyuJ6kI
unzip alfworld_data.zip

# Download data for VirtualHome environment
cd ../..
gdown https://drive.google.com/uc?id=1kZKWkWhtJ-DneqfS1Nb_FybR1RBxPeef
unzip virtualhome_master.zip

# Download expert trajectories for ALFWorld and VirtualHome environment
gdown https://drive.google.com/uc?id=1tqZyqzyE7NnJdUJlwwfcwnPUFMafKdfA
unzip data.zip

⛏️ Usage

🤖 SFT Training

bash sft/my_sft.sh
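This script performs supervised fine-tuning (behavior cloning) on the expert trajectories downloaded above. A schematic sketch of that objective, assuming a Hugging Face-style causal LM and tokenizer (the `prompt` and `expert_actions` fields are illustrative, not the script's actual data format):

# Schematic SFT objective, not the contents of sft/my_sft.sh.
def sft_loss(model, tokenizer, example):
    """Behavior cloning: maximize the likelihood of expert actions
    given the interaction history."""
    text = example["prompt"] + example["expert_actions"]
    enc = tokenizer(text, return_tensors="pt")
    labels = enc["input_ids"].clone()
    # Mask prompt tokens so the loss covers only the expert action tokens.
    prompt_len = len(tokenizer(example["prompt"])["input_ids"])
    labels[:, :prompt_len] = -100
    return model(**enc, labels=labels).loss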

🗺️ Data Construction

First, run the following scripts to acquire step-level rewards:

# For ALFWorld environment

# Collect step rewards for expert step
bash sampling/alfworld/expert_generate_response.sh
bash sampling/alfworld/expert_mc_explore.sh

# Collect step rewards for explored step
bash sampling/alfworld/explore_generate_response.sh
bash sampling/alfworld/explore_mc_explore.sh

# For VirtualHome environment

# Collect step rewards for expert step
bash sampling/virtualhome/expert_generate_response.sh
bash sampling/virtualhome/expert_mc_explore.sh

# Collect step rewards for explored step
bash sampling/virtualhome/explore_generate_response.sh
bash sampling/virtualhome/explore_mc_explore.sh
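The *_mc_explore.sh scripts estimate step-level rewards via Monte Carlo rollouts: replay the trajectory up to the step being scored, take that step, then let the policy finish the episode and average the outcomes. A minimal sketch of the idea (`env_factory`, `task_succeeded`, and the step interface are assumptions for illustration, not the scripts' actual interface):

# Illustrative Monte Carlo step-reward estimate; the environment
# interface here is hypothetical.
def mc_step_reward(env_factory, prefix_actions, candidate_action,
                   policy, num_rollouts=5):
    """Estimate the reward of taking `candidate_action` after
    `prefix_actions` as the success rate of Monte Carlo rollouts."""
    successes = 0
    for _ in range(num_rollouts):
        env = env_factory()                      # fresh environment instance
        for action in prefix_actions:            # replay the trajectory prefix
            obs, done = env.step(action)
        obs, done = env.step(candidate_action)   # take the step being scored
        while not done:                          # let the policy finish the task
            obs, done = env.step(policy(obs))
        successes += int(env.task_succeeded())
    return successes / num_rollouts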

Then, run the following scripts to construct the calibration dataset, the explored successful trajectory dataset, and the expert sub-trajectory dataset.

# For ALFWorld environment
# Collecting expert sub-trajectory dataset
python data_collection/data_expert_alfworld.py
# Collecting calibration trajectory dataset
python data_collection/data_cali_alfworld.py
# Collecting explored successful trajectory dataset
python data_collection/data_explored_alfworld.py

# For VirtualHome environment
# Collecting expert sub-trajectory dataset
python data_collection/data_org_vh.py
# Collecting calibration trajectory dataset
python data_collection/data_cali_vh.py
# Collecting explored successful trajectory dataset
python data_collection/data_explored_vh.py
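Conceptually, the calibration scripts splice the trajectory prefix before a detected deviation together with an LLM-reflected correction to form a calibrated training example. A rough sketch of such a record (the field names are assumptions for illustration, not the scripts' actual output format):

# Hypothetical shape of one calibrated-trajectory training example.
def build_calibration_example(prefix_steps, suboptimal_action,
                              reflection, calibrated_actions):
    """Assemble one calibrated-trajectory example: the steps up to the
    detected deviation, followed by the LLM-reflected correction in
    place of the suboptimal action."""
    return {
        "context": prefix_steps,              # (observation, action) pairs so far
        "suboptimal_action": suboptimal_action,
        "reflection": reflection,             # LLM's analysis of the deviation
        "calibrated_actions": calibrated_actions,
    }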

💪 Reinforced Training

bash Reinfoced_train/steca_train.sh
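Reinforced training fine-tunes the SFT agent on the union of calibrated trajectories and explored successful trajectories. A minimal sketch of one possible weighting scheme, assuming a Hugging Face-style model (illustrative only; the actual objective is defined by Reinfoced_train/steca_train.sh and the paper):

# Weighted LM loss over the two trajectory sources; the weighting
# scheme here is an assumption, not the paper's exact objective.
def reinforced_loss(model, batch, calibrated_weight=1.0, success_weight=1.0):
    """Language-modeling loss, weighted by where the batch came from."""
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    weight = calibrated_weight if batch["source"] == "calibrated" else success_weight
    return weight * out.loss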

📊 Evaluation

# ALFWorld
bash eval/my_eval_alfworld.sh
# VirtualHome
bash eval/my_eval_vh.sh
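Both evaluation scripts roll the trained agent out in the environment and report task success rate. A minimal sketch of that metric (the `task.reset`/`task.step` interface and the `max_steps` budget are assumptions for illustration):

# Illustrative success-rate computation over evaluation tasks.
def success_rate(agent, tasks, max_steps=30):
    """Fraction of tasks the agent completes within the step budget."""
    solved = 0
    for task in tasks:
        obs, done, success, steps = task.reset(), False, False, 0
        while not done and steps < max_steps:
            obs, done, success = task.step(agent.act(obs))
            steps += 1
        solved += int(success)
    return solved / len(tasks)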

🙏 Acknowledgments

This codebase is built upon ETO and IPR.

📖 Citation

If you find this repo helpful, please cite our paper:

@article{wang2025steca,
  title={STeCa: Step-level Trajectory Calibration for LLM Agent Learning},
  author={Wang, Hanlin and Wang, Jian and Leong, Chak Tou and Li, Wenjie},
  journal={arXiv preprint arXiv:2502.14276},
  year={2025}
}
