Python port of Ultimate Tapan Kaikki, with Gymnasium/PPO tooling for AI training and visible model playback.
- This project is based on the original repository: https://github.com/hkroger/ultimatetapankaikki
- All credits for the original game, assets, and core design belong to the original author(s) and contributors.
- This repository follows GPL-3.0 licensing terms (see `LICENSE`).
- This is not a 100% finished port; it differs from the original in many ways, for example in enemy AI behaviour.
- Playable runtime is available (headless, terminal, pygame).
- AI modules are implemented for Gymnasium environment usage, PPO training/evaluation, and saved-model pygame playback.
- For transparency, development phase plans and progress notes are kept in-repo under `docs/notes/` and `python_refactor.md`.
This project exposes the game as a Gymnasium environment for reinforcement learning experiments.
- Environment wrapper lives under `src/ultimatetk/ai/`.
- `reset()` starts a fresh headless gameplay episode (default flow starts from level 1) and returns the first observation.
- `step(action)` applies AI controls to the same core gameplay simulation used by the normal game runtime.
- Supported control dimensions include movement, turning, strafing, shooting, and weapon selection.
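The `reset()`/`step()` contract follows the standard Gymnasium episode loop. A minimal sketch with a stub environment standing in for the real wrapper (the observation size, reward value, and step limit here are illustrative placeholders, not the repository's values):

```python
# Sketch of the Gymnasium episode loop used by the AI tooling.
# StubEnv stands in for the real wrapper under src/ultimatetk/ai/.
import random

class StubEnv:
    """Tiny stand-in implementing the reset()/step() contract."""

    def reset(self, seed=None):
        random.seed(seed)
        self.steps = 0
        obs = [0.0] * 8          # placeholder state vector
        return obs, {}

    def step(self, action):
        self.steps += 1
        obs = [random.random() for _ in range(8)]
        reward = 1.0                  # placeholder shaped reward
        terminated = False            # e.g. player death / game completed
        truncated = self.steps >= 10  # configured step/time limit
        return obs, reward, terminated, truncated, {}

def rollout(env, policy):
    """Run one episode and return the total reward."""
    obs, _ = env.reset(seed=123)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total

print(rollout(StubEnv(), lambda obs: 0))  # -> 10.0
```

The same loop shape works against the real wrapper; only the environment construction differs.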
- The core spatial observation is a full 360° scan split into 32 equal angular segments around the player.
- Each segment encodes nearest directional context (for example obstacle/enemy/projectile presence and distance-style signals).
- Segment features are combined with player/runtime telemetry into a compact PPO-friendly state vector.
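For intuition, mapping a direction to one of the 32 segments is a simple angle-binning step. A sketch (the binning convention here, segment 0 starting at angle 0 and counting counterclockwise, is an assumption, not the repository's exact layout):

```python
import math

SEGMENTS = 32
SEGMENT_WIDTH = 2 * math.pi / SEGMENTS  # 11.25 degrees per segment

def segment_index(dx, dy):
    """Map a direction vector (relative to the player) to one of the
    32 equal angular segments of the 360-degree scan."""
    angle = math.atan2(dy, dx) % (2 * math.pi)  # normalize to [0, 2*pi)
    return int(angle // SEGMENT_WIDTH)

print(segment_index(1, 0))   # straight along +x -> segment 0
print(segment_index(0, 1))   # 90 degrees       -> segment 8
print(segment_index(-1, 0))  # 180 degrees      -> segment 16
```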
- Reward is shaped for learning-oriented behavior (survival, combat effectiveness, progression momentum).
- Reward shaping is still evolving and should be treated as an experiment surface, not a finalized benchmark setup.
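A toy version of such shaping might look like the following; the terms mirror the three components named above, but every weight is a made-up placeholder, not the repository's reward function:

```python
def shaped_reward(survived, kills, damage_taken, progress_delta):
    """Illustrative shaped reward: survival, combat effectiveness,
    progression momentum. Weights are placeholders for explanation only."""
    r = 0.0
    if survived:
        r += 0.01                 # small per-step survival bonus
    r += 1.0 * kills              # combat effectiveness
    r -= 0.5 * damage_taken       # discourage taking hits
    r += 0.1 * progress_delta     # progression momentum
    return r

print(shaped_reward(True, 1, 0.0, 2.0))  # -> approximately 1.21
```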
- Episodes terminate on player death (`death`), successful run completion (`game_completed`), or configured step/time limits.
- This repository does not ship guaranteed "win-the-game" hyperparameters.
- There are no official one-click training presets that reliably solve the game out of the box.
- The AI stack is a sandbox for experimentation, iteration, and learning-oriented RL workflows.
- PPO tools (`tools/ppo_train.py`, `tools/ppo_eval.py`, `tools/ppo_play_pygame.py`) are provided as practical baselines.
- macOS/Linux/Windows with Conda installed (Miniconda or Anaconda)
- Python 3.12 (default target)
Creates or updates an environment and installs editable project dependencies (dev, pygame):
```sh
./scripts/setup_conda_env.sh
```

Custom environment name:

```sh
./scripts/setup_conda_env.sh my-env-name
```

Activate:

```sh
conda activate ultimatetk
```

Manual environment (core install only):

```sh
conda create -y -n ultimatetk python=3.12 pip
conda activate ultimatetk
python -m pip install --upgrade pip
python -m pip install -e "."
```

Manual environment with dev and pygame extras:

```sh
conda create -y -n ultimatetk python=3.12 pip
conda activate ultimatetk
python -m pip install --upgrade pip
python -m pip install -e ".[dev,pygame]"
```

Start from the dev/pygame environment above (or the script setup), then install AI dependencies in the same env:

```sh
conda install -y -n ultimatetk -c conda-forge numpy gymnasium pytorch stable-baselines3 tensorboard "setuptools<81"
```

Optional editable extras (in active env):

```sh
python -m pip install -e ".[ai]"
python -m pip install -e ".[ai_train]"
```

All commands assume repository root.
Headless smoke run:

```sh
PYTHONPATH=src python3 -m ultimatetk --max-seconds 2 --autostart-gameplay --status-print-interval 40
```

Scripted input run:

```sh
PYTHONPATH=src python3 -m ultimatetk --max-seconds 1.2 --autostart-gameplay --status-print-interval 20 --input-script "5:+MOVE_FORWARD;25:-MOVE_FORWARD;30:+TURN_LEFT;36:-TURN_LEFT"
```

Terminal runtime:

```sh
PYTHONPATH=src python3 -m ultimatetk --platform terminal --autostart-gameplay --status-print-interval 20
```

Pygame runtime:

```sh
PYTHONPATH=src python3 -m ultimatetk --platform pygame --autostart-gameplay --window-scale 3
```

Window scale examples:

- `--window-scale 2` -> 640x400
- `--window-scale 3` -> 960x600
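Both examples imply a base resolution of 320x200 scaled up by an integer factor (the base resolution is inferred from the numbers above, not stated elsewhere in this README):

```python
# Window size as an integer multiple of the base resolution.
# 320x200 is inferred from the --window-scale examples above.
BASE_WIDTH, BASE_HEIGHT = 320, 200

def window_size(scale):
    return BASE_WIDTH * scale, BASE_HEIGHT * scale

print(window_size(2))  # -> (640, 400)
print(window_size(3))  # -> (960, 600)
```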
Gymnasium random-policy smoke test:

```sh
python3 tools/gym_random_policy_smoke.py --episodes 1 --max-steps 300
```

Train with defaults:

```sh
python3 tools/ppo_train.py
```

Uses default training settings from `tools/ppo_train.py`.
Baseline command with explicit common defaults:
```sh
python3 tools/ppo_train.py \
  --total-timesteps 5000000 \
  --n-envs 1 \
  --device auto \
  --seed 123 \
  --n-steps 4096 \
  --batch-size 512 \
  --gamma 0.99 \
  --gae-lambda 0.95 \
  --clip-range 0.2 \
  --learning-rate-start 0.0003 \
  --learning-rate 0.00005 \
  --decay-ratio 0.8 \
  --ent-coef-start 0.05 \
  --ent-coef 0.01 \
  --max-episode-steps 6000 \
  --target-tick-rate 40 \
  --randomize-level-on-reset \
  --level-index-pool 0,1,2,3,4,5,6,7,8,9 \
  --checkpoint-freq 1000000 \
  --eval-freq 25000 \
  --eval-episodes 5
```

Example:

```sh
python3 tools/ppo_train.py --device auto --total-timesteps 30000000 --batch-size 512
```

Common flags and defaults:
- `--total-timesteps 5000000`
- `--n-envs 1`
- `--device auto`
- `--seed 123`
- `--n-steps 4096`
- `--batch-size 512`
- `--gamma 0.99`
- `--gae-lambda 0.95`
- `--clip-range 0.2`
- `--learning-rate-start 0.0003`
- `--learning-rate 0.00005`
- `--decay-ratio 0.8`
- `--ent-coef-start 0.05`
- `--ent-coef 0.01`
- `--max-episode-steps 6000`
- `--target-tick-rate 40`
- `--randomize-level-on-reset` (off by default)
- `--level-index-pool 0,1,2,3,4,5,6,7,8,9`
- `--checkpoint-freq 1000000`
- `--eval-freq 25000`
- `--eval-episodes 5`
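The paired `--learning-rate-start`/`--learning-rate` and `--ent-coef-start`/`--ent-coef` flags, together with `--decay-ratio`, suggest a linear decay over the first part of training. A sketch in the stable-baselines3 schedule style (a function of remaining progress); the exact in-repo schedule may differ:

```python
def linear_decay(start, end, decay_ratio):
    """Build a schedule f(progress_remaining), where progress_remaining
    goes 1.0 -> 0.0 over training. The value decays linearly from
    `start` to `end` during the first `decay_ratio` fraction of
    training, then stays at `end`."""
    def schedule(progress_remaining):
        elapsed = 1.0 - progress_remaining
        if elapsed >= decay_ratio:
            return end
        return start + (elapsed / decay_ratio) * (end - start)
    return schedule

lr = linear_decay(0.0003, 0.00005, 0.8)
print(lr(1.0))  # start of training -> 0.0003
print(lr(0.2))  # 80% elapsed       -> 5e-05
print(lr(0.0))  # end of training   -> 5e-05
```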
Note:

- Run management flags: `--run-name`, `--runs-root`, `--resume-from`, `--disable-asset-manifest-check`, `--render-training-scenes`
- Scenario flag: `--weapon-mode`
- Level randomization flags: `--randomize-level-on-reset`, `--level-index-pool`
Level randomization behavior:
- With `--randomize-level-on-reset`, each training episode reset samples a start level from `--level-index-pool`.
- `--level-index-pool` accepts comma-separated non-negative level indices (duplicates are ignored).
- The evaluation callback env stays fixed at level index `0` for stable metric comparison across runs.
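The pool parsing and per-reset sampling described above can be illustrated in a few lines (function names are hypothetical, not the repository's):

```python
import random

def parse_level_pool(spec):
    """Parse a --level-index-pool style value: comma-separated
    non-negative level indices, duplicates ignored."""
    seen = []
    for part in spec.split(","):
        idx = int(part)
        if idx < 0:
            raise ValueError(f"negative level index: {idx}")
        if idx not in seen:
            seen.append(idx)
    return seen

def sample_start_level(pool, rng):
    """On each episode reset, pick a start level from the pool."""
    return rng.choice(pool)

pool = parse_level_pool("0,1,1,2,3")
print(pool)  # -> [0, 1, 2, 3]
print(sample_start_level(pool, random.Random(7)) in pool)  # -> True
```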
Weapon mode choices (`--weapon-mode` for both `ppo_train.py` and `ppo_eval.py`):

- `normal_mode` (default): keeps the normal weapon/ammo system, crates enabled, and standard level behavior
- `fist`, `pistola`, `shotgun`, `uzi`, `auto_rifle`, `grenade_launcher`, `auto_grenadier`, `heavy_launcher`, `auto_shotgun`, `c4_activator`, `flame_thrower`, `mine_dropper`
Behavior in non-`normal_mode` weapon modes:

- The selected weapon is forced/equipped from episode start
- Player ammo is truly infinite (no ammo consumption)
- Crates are disabled/removed
Train without eval/checkpoint callbacks:

```sh
python3 tools/ppo_train.py --eval-freq 0 --checkpoint-freq 0
```

Resume training from a checkpoint:

```sh
python3 tools/ppo_train.py --resume-from runs/ai/ppo/<run>/checkpoints/ppo_model_50000_steps.zip
```

Evaluate a saved model:

```sh
python3 tools/ppo_eval.py --model runs/ai/ppo/<run>/final_model.zip --episodes 5 --device auto
```

Play a saved model in pygame:

```sh
python3 tools/ppo_play_pygame.py --model runs/ai/ppo/<run>/final_model.zip --target-fps 40 --window-scale 3 --device auto --weapon-mode normal_mode
```

Useful playback flags:
- `--max-seconds 30` limits playback wall time
- `--max-steps 2000` limits simulation steps
- `--weapon-mode auto_rifle` matches the playback scenario to the training/eval mode
- `--allow-manual-input` mixes keyboard input with AI actions for debugging
- `--stochastic` enables sampling mode (default playback/eval is deterministic)
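The deterministic-versus-`--stochastic` distinction boils down to argmax versus sampling over the policy's action distribution. A sketch of the difference (not the tool's internals):

```python
import random

def select_action(probs, stochastic, rng):
    """Deterministic playback takes the most likely action (argmax);
    stochastic playback samples from the action distribution instead."""
    if stochastic:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda i: probs[i])

probs = [0.1, 0.7, 0.2]
print(select_action(probs, stochastic=False, rng=random.Random(0)))  # -> 1
print(select_action(probs, stochastic=True, rng=random.Random(0)))   # varies by seed
```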
Training writes logs under `runs/ai/ppo/<run>/tensorboard`.

```sh
tensorboard --logdir runs/ai/ppo/<run>/tensorboard --host 127.0.0.1 --port 6006
```

Open: http://127.0.0.1:6006/
- Apple Silicon: `--device auto` defaults to CPU for throughput; use `--device mps` explicitly if needed.
- CUDA hosts: `--device auto` prefers CUDA when available; use `--device cuda` to force it.
- CPU fallback: `--device cpu`.
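The selection behavior described above reduces to a small resolver (illustrative only; the tools' actual logic may differ):

```python
def resolve_device(requested, cuda_available):
    """Resolve a --device value: an explicit choice wins; 'auto' prefers
    CUDA when available and otherwise falls back to CPU (on Apple
    Silicon, MPS must be requested explicitly with --device mps)."""
    if requested != "auto":
        return requested
    return "cuda" if cuda_available else "cpu"

print(resolve_device("auto", cuda_available=True))   # -> cuda
print(resolve_device("auto", cuda_available=False))  # -> cpu
print(resolve_device("mps", cuda_available=False))   # -> mps
```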
- Main menu: `W`/`S` or `A`/`D` select, `Space`/`Enter`/`Tab` confirm
- Movement/turn: `WASD` or arrow keys
- Strafe: `Q`/`E`
- Shoot: `Space`
- Next weapon: `Tab` (pygame also supports mouse wheel + `PageUp`/`PageDown`)
- Toggle shop: `R` or `Enter`
- Shop controls: `W`/`S` rows, `A`/`D` columns, `Space` buy, `Tab` sell
- Direct weapon slot: `` ` ``, `1`..`0`, `-` (pygame also supports numpad `0`..`9` and `F1`..`F12`)
- Quit: `Esc`
Default release verification:
```sh
python3 tools/release_verification.py
```

Strict legacy parity against an archived root:

```sh
python3 tools/release_verification.py --legacy-compare-root /path/to/original/legacy-root
```

- Runtime assets: `game_data/`
- Runtime outputs and artifacts: `runs/`
- Phase notes: `docs/notes/`
- Refactor roadmap/progress log: `python_refactor.md`
Regenerate asset manifest and gap report:
```sh
python3 tools/asset_manifest_report.py
```

Copy archived legacy assets into `game_data/`:

```sh
python3 tools/migrate_legacy_data.py --legacy-root /path/to/original/legacy-root
```

Probe format loaders:

```sh
python3 tools/format_probe.py
```

Render a probe screenshot:

```sh
python3 tools/render_probe.py --output runs/screenshots/phase3_render_probe.ppm
```

Timo Heimonen <timo.heimonen@proton.me>
