Skip to content

Make DataLoader tuning knobs configurable via env (#138)#139

Draft
mihow wants to merge 1 commit into
feature/uv-migrationfrom
feature/tuning-knobs-138
Draft

Make DataLoader tuning knobs configurable via env (#138)#139
mihow wants to merge 1 commit into
feature/uv-migrationfrom
feature/tuning-knobs-138

Conversation

@mihow
Copy link
Copy Markdown
Collaborator

@mihow mihow commented Apr 10, 2026

Summary

Implements Direction 2 from issue #138 — exposes the previously-hardcoded DataLoader and worker loop values as pydantic Settings fields, with current values as defaults so there is no behavior change on upgrade.

Also documents `AMI_ANTENNA_API_BATCH_SIZE` (the existing but undocumented knob that appears to be the biggest lever on peak ingest memory) in `.env.example` and the README, and adds a request-and-memory flow diagram to the `datasets.py` module docstring.

This PR is deliberately no behavior change by default. Every new setting defaults to its current hardcoded value. Operators who want to tune just set env vars.

New settings

Env var Default Previously hardcoded at
`AMI_ANTENNA_API_DATALOADER_PIN_MEMORY` `true` `datasets.py::get_rest_dataloader`
`AMI_ANTENNA_API_DATALOADER_PREFETCH_FACTOR` `4` `datasets.py::get_rest_dataloader`
`AMI_ANTENNA_API_DATALOADER_DOWNLOAD_THREADS` `8` `datasets.py::RESTDataset._ensure_sessions`
`AMI_ANTENNA_MAX_PENDING_POSTS` `5` `worker.py::MAX_PENDING_POSTS`
`AMI_ANTENNA_IDLE_SLEEP_SECONDS` `5` `worker.py::SLEEP_TIME_SECONDS`

The existing module-level `MAX_PENDING_POSTS` and `SLEEP_TIME_SECONDS` constants in `worker.py` are kept as-is for backward compatibility with any external imports; the runtime code now reads from settings.

Documentation changes

  • `.env.example`: adds commented-out entries for every new knob with tradeoff notes. Adds a memory-scaling formula and rough per-worker-RAM guidance for `AMI_ANTENNA_API_BATCH_SIZE`. Calls out explicitly that `AMI_LOCALIZATION_BATCH_SIZE` / `AMI_CLASSIFICATION_BATCH_SIZE` do not limit ingest memory — a common confusion.
  • `README.md`: adds a "Tuning for memory-constrained or large-image workloads" section under "Running the Antenna Worker" with the peak-RAM formula, per-RAM-size starting points, and a deployment note about the supervisor/systemd soft FD limit (the default 1024 is too low for the worker's real FD working set and causes a confusing `OSError: [Errno 24]` at startup).
  • `datasets.py` module docstring: adds an ASCII flow diagram showing the per-subprocess request → download → tensor → prefetch queue path and where each env var applies. Also corrects a couple of stale values in the existing quick-reference (default `antenna_api_batch_size` is 24, not 16; `prefetch_factor` is explicitly set to 4, not "PyTorch default").

What's NOT in this PR

Intentionally scoped small:

Testing

  • `AMI_ANTENNA_API_DATALOADER_PIN_MEMORY=false`, `AMI_ANTENNA_API_DATALOADER_PREFETCH_FACTOR=1`, and `AMI_ANTENNA_API_BATCH_SIZE=4` were applied as a hot patch on a test worker processing a job with ~914 source images at ~24 MB per JPEG. Worker ran stably at ~13 GB RSS on a 68 GB instance, ~2 s/image, across multiple consecutive jobs. Without any of those changes the same worker OOM-killed at ~68 GB in ~40 seconds.
  • Pre-commit hooks pass (black, isort, flake8, autoflake, trailing whitespace, EOF).
  • Not yet done: isolated ablation — the one deployment had all three changes at once, so we haven't proven that `AMI_ANTENNA_API_BATCH_SIZE=4` alone is sufficient. See the "what we still need to verify" section of Clarify and document tuning parameters #138. Once this PR lands, that test is just setting one env var and becomes trivial.

Base branch

This PR targets `feature/uv-migration` because that's the branch currently used in our deployment. Once `feature/uv-migration` merges to main, GitHub will auto-retarget this PR.

Closes none of #138 (this is one direction of several — the issue stays open).

Follow-up TODOs (before merge or as separate issues)

  • Isolated ablation to prove the config-only fix is sufficient. With this PR deployed, set AMI_ANTENNA_API_BATCH_SIZE=8 only (leaving AMI_ANTENNA_API_DATALOADER_PIN_MEMORY at its default true and PREFETCH_FACTOR at 4) on a small-RAM worker, and run a large-image job. If it completes without OOM, the minimal fix is just the one env var and we can recommend that as the universal guidance. If it OOMs, we know pin_memory=false is actually load-bearing and should probably become the new default for HTTP-sourced workloads (separate PR). Cheap to run: one env edit and a worker restart.
  • Small-image regression test. Run a typical ~1.5 MB JPEG job with AMI_ANTENNA_API_DATALOADER_PIN_MEMORY=false and compare throughput against the current defaults. If throughput is within noise, pin_memory=false is probably the right new default for this worker.
  • Memray / tracemalloc profile of a large-image job to replace the estimated numbers in the tables with measured ones. Would also catch any residual leak on top of the steady-state peak.
  • Once feature/uv-migration merges to main, re-target this PR.

Expose the previously-hardcoded values in get_rest_dataloader() and
worker.py as pydantic Settings fields, with current values as defaults
so there is no behavior change on upgrade. Adds:

  AMI_ANTENNA_API_DATALOADER_PIN_MEMORY        (default: true)
  AMI_ANTENNA_API_DATALOADER_PREFETCH_FACTOR   (default: 4)
  AMI_ANTENNA_API_DATALOADER_DOWNLOAD_THREADS  (default: 8)
  AMI_ANTENNA_MAX_PENDING_POSTS                (default: 5)
  AMI_ANTENNA_IDLE_SLEEP_SECONDS               (default: 5)

Also:
- Document AMI_ANTENNA_API_BATCH_SIZE (the biggest lever on peak ingest
  RAM) in .env.example with the memory-scaling formula and guidance for
  picking a value per worker size.
- Add a "Tuning for memory-constrained or large-image workloads" section
  to the README, plus a deployment note about raising the supervisor /
  systemd soft FD limit to 65536.
- Add a request-and-memory flow diagram to the datasets.py module
  docstring so operators and future maintainers can see where peak RAM
  actually comes from.
- Correct stale values in the existing datasets.py docstring (default
  batch size is 24, not 16; prefetch_factor is 4, not PyTorch default).

This is Direction 2 from the discussion in #138 — a no-behavior-change
refactor that makes the tuning knobs reachable without source edits and
sets up the work in Direction 3 (a tuning command) and Direction 4
(probe-based per-job auto-tune).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 10, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7223179c-3f52-4df0-98a7-7005d0e6b90c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/tuning-knobs-138

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant