This repository implements a Proximal Policy Optimization (PPO) agent to play Atari Breakout using PyTorch and Gymnasium. It includes training, evaluation, and video rendering utilities.
Features:

- PPO Algorithm: Stable policy-gradient method (`ppo.py`).
- Actor-Critic Model: CNN-based policy/value network (`actor_critic.py`).
- Frame Preprocessing: Cropping, resizing, and grayscale conversion (see the sketch after this list).
- Training Script: Train the agent and save checkpoints (`main.py`, `main.ipynb`).
- Evaluation Script: Run the trained agent interactively (`test.py`).
- Video Rendering: Generate crisp, squashed, slow-motion videos (`make_best_video.py`).
- Best Model: Pretrained weights at `models/best_model.pth`.
- Best Run Video: Example gameplay at `videos/best_video.mp4`.
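The preprocessing step is only named above, so here is a minimal sketch of a typical Breakout frame pipeline using OpenCV. The crop bounds and the 84x84 output size are assumptions for illustration, not necessarily what this repository uses.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Crop, grayscale, and resize a raw 210x160x3 Atari RGB frame.

    The crop region and 84x84 target are illustrative defaults, not
    necessarily the values used in main.py.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # drop color channels
    cropped = gray[34:194, :]                        # cut score bar and bottom border (assumed bounds)
    resized = cv2.resize(cropped, (84, 84), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0        # scale pixels to [0, 1]
```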
Requirements:

- Python 3.10+
- PyTorch
- Gymnasium
- ALE-py
- OpenCV
- imageio
Install dependencies:

```bash
pip install torch gymnasium[atari] ale-py opencv-python imageio
```

To train the agent:

```bash
python main.py
```

or use `main.ipynb` for interactive training.

Checkpoints are saved in `models/`.
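The checkpoint format isn't documented here; a common pattern, assuming the network from `actor_critic.py` is a standard `nn.Module`, is to save and restore its `state_dict`. The file names below are placeholders.

```python
import os
import torch
import torch.nn as nn

def save_checkpoint(model: nn.Module, path: str = "models/checkpoint.pth") -> None:
    """Save model weights; the filename is a placeholder, not the repo's naming scheme."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save(model.state_dict(), path)

def load_checkpoint(model: nn.Module, path: str = "models/best_model.pth") -> nn.Module:
    """Restore weights into an already-constructed network (e.g. the ActorCritic model)."""
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model
```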
To run the trained agent:

```bash
python test.py
```

This loads `models/best_model.pth` and plays Breakout.
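The exact interface of `test.py` isn't shown in this README; the sketch below only illustrates the surrounding Gymnasium/ALE evaluation loop. The policy lines are commented out because the `ActorCritic` constructor and action-selection method are not specified here, so a random action stands in.

```python
import ale_py
import gymnasium as gym
import torch  # needed once the policy-loading lines below are uncommented

gym.register_envs(ale_py)  # expose ALE/Breakout-v5 (newer gymnasium/ale-py versions)

env = gym.make("ALE/Breakout-v5", render_mode="human")
obs, info = env.reset()

# policy = ActorCritic(...)  # constructor arguments depend on actor_critic.py
# policy.load_state_dict(torch.load("models/best_model.pth", map_location="cpu"))
# policy.eval()

done = False
while not done:
    # Replace with the policy's action for the preprocessed observation.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()
```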
To render a video of the best run:

```bash
python make_best_video.py
```

The output is saved to `videos/` (see `videos/best_video.mp4`).
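How `make_best_video.py` builds its video isn't detailed here; writing collected RGB frames with imageio is one common route. A minimal sketch, where the output path and frame rate are placeholders (a low fps is one way to get the slow-motion effect mentioned above):

```python
import os
from typing import List

import imageio.v2 as imageio
import numpy as np

def write_video(frames: List[np.ndarray], path: str = "videos/example.mp4", fps: int = 30) -> None:
    """Write a sequence of HxWx3 uint8 RGB frames to an MP4 file (uses imageio's ffmpeg plugin)."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with imageio.get_writer(path, fps=fps) as writer:
        for frame in frames:
            writer.append_data(frame)
```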
Project structure:

- `ppo.py`: PPO algorithm implementation (`PPO`).
- `actor_critic.py`: CNN actor-critic model (`ActorCritic`).
- `main.py`: Training loop.
- `main.ipynb`: Notebook for training/experimentation.
- `test.py`: Evaluation script.
- `make_best_video.py`: Video rendering utility.
- `models/`: Saved model checkpoints.
- `videos/`: Rendered gameplay videos.
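For orientation, the heart of any PPO implementation is the clipped surrogate objective. The sketch below shows its standard form; the function signature and coefficient defaults are illustrative, not taken from `ppo.py`.

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropy,
             clip_eps: float = 0.2, value_coef: float = 0.5, entropy_coef: float = 0.01):
    """Standard PPO clipped surrogate loss; hyperparameter defaults are illustrative."""
    ratio = torch.exp(new_log_probs - old_log_probs)             # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()          # clipped surrogate objective
    value_loss = (returns - values).pow(2).mean()                # critic regression loss
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```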
Results:

- Best Model: `models/best_model.pth`
- Best Video: `videos/best_video.mp4`
MIT License