
saturn-python-axolotl: document [deepspeed] requires the devel base #480

Open
hhuuggoo wants to merge 1 commit into release-2026.05.01 from hugo/axolotl-deepspeed-needs-devel

hhuuggoo (Collaborator) commented Jun 7, 2026

Comment-only clarification (no dependency change — the env already has [deepspeed]).

Pins the invariant discovered while getting axolotl training to work end-to-end: deepspeed requires nvcc, which only the -devel base ships. accelerate imports deepspeed unconditionally when installed, so on a runtime base every axolotl job crashes at Trainer init (FileNotFoundError: .../nvcc) — even single-GPU jobs.

The actual fix is in release-images#485 (switch this image's base to saturnbase-python-gpu-devel-12.9). This comment prevents a future revert to the runtime base from silently re-breaking training.
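As an illustration of the invariant described above, here is a minimal sketch of a preflight check that could run before training starts. It is not part of this PR: the function names `find_nvcc`, `deepspeed_installed`, and `check_image` are hypothetical, and the lookup order (`CUDA_HOME/bin` first, then `PATH`) is an assumption rather than a description of how deepspeed itself locates the compiler.

```python
import importlib.util
import os
import shutil
from typing import Optional


def find_nvcc(cuda_home: Optional[str] = None) -> Optional[str]:
    """Return a path to an executable nvcc, or None if none is found.

    Assumption: look under CUDA_HOME/bin first, then fall back to PATH.
    """
    cuda_home = cuda_home or os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which("nvcc")


def deepspeed_installed() -> bool:
    """Report whether deepspeed is importable, without importing it."""
    return importlib.util.find_spec("deepspeed") is not None


def check_image(has_deepspeed: bool, nvcc_path: Optional[str]) -> Optional[str]:
    """Return an error message if the invariant is violated, else None."""
    if has_deepspeed and nvcc_path is None:
        return (
            "deepspeed is installed but nvcc was not found; "
            "the image should build on the -devel base"
        )
    return None
```

Calling `check_image(deepspeed_installed(), find_nvcc())` at image build time or job start would turn the Trainer-init crash into an explicit message. Keeping the decision in `check_image` separate from the environment probes makes it testable without a GPU image.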

Supersedes #479 (which dropped deepspeed — we keep it for multi-GPU instead).

🤖 Generated with Claude Code

Comment-only. Pins the invariant that deepspeed (kept for multi-GPU training)
needs nvcc, which only the gpu-devel base ships — so the image must build on
saturnbase-python-gpu-devel-12.9 (set in release-images data_science.py,
PR #485). Prevents a future revert to the runtime base from silently re-breaking
every axolotl job at Trainer init.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>