You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
process_nats_pipeline_result is the dominant task on the antenna celery queue during async_api ML jobs. Profiling on a live production backlog shows it spends most wall-clock time on:
One NATS ack call per task (async_to_sync(ack_task)() around nats-py)
All three are I/O. The task is almost never CPU-bound. That is the workload profile where gevent pays off: a single OS process running a few hundred greenlets can keep the DB/Redis/NATS pipelines saturated, where the same OS process with prefork only has ~one in-flight task per worker process.
Ballpark: prefork at 16 workers ≈ 16 in-flight DB round-trips. Gevent at 200 greenlets could hold 200 in-flight, bounded by pgbouncer pool size and server cores. Could be a 5–10× throughput lever on the ingest path specifically.
What needs verifying before we commit
This is an investigation ticket, not a "just do it" ticket. Switching a production pool class has a long tail of silent-bug risk; we need to confirm each of these:
Monkey-patching at startup. Gevent needs gevent.monkey.patch_all() as the first import in the worker entrypoint — before celery, django, psycopg2, redis-py. Confirm this interacts cleanly with our New Relic agent (newrelic-admin run-program), which wraps the celery entrypoint today.
Postgres driver safety. Django uses psycopg2 (or psycopg3 if we've migrated — verify). Under gevent, psycopg2 blocking calls freeze the greenlet pool unless psycogreen.gevent.patch_psycopg() is called after monkey-patching. Psycopg3's async mode also has specific guidance. Get this right or we'll be slower than prefork.
Pgbouncer pool sizing. At gevent concurrency 100–200, we'll hold that many persistent DB connections if Django's CONN_MAX_AGE is non-zero. Pgbouncer's default_pool_size needs to be large enough, or CONN_MAX_AGE needs to be 0 for the ingest worker specifically. Measure before deploying.
asyncio inside gevent tasks.process_nats_pipeline_result uses async_to_sync(ack_task)() which creates an asyncio event loop per call to run nats-py. asyncio + gevent is supported but the details matter: nats-py uses sockets that will be monkey-patched, which can change reconnect / timeout behaviour in non-obvious ways. Test this in isolation first.
PIL / numpy / torch.create_detection_images runs PIL crops. PIL is blocking C code that releases the GIL but is not explicitly gevent-aware. It works, but long per-image crops would block one greenlet for the full crop duration. For the ingest queue this is probably fine; flagging it for awareness.
Sentry, logging, thread-locals. Sentry's SDK and structlog both have gevent-mode configuration — verify both are set up correctly, or exceptions and logs will get crossed between greenlets.
Add an optional worker service celeryworker-ingest-gevent in docker-compose.worker.yml:
Same image, but entrypoint calls python -c 'from gevent import monkey; monkey.patch_all(); from psycogreen.gevent import patch_psycopg; patch_psycopg(); import celery.__main__; celery.__main__._main()' worker --pool=gevent --concurrency=100 --queues=antenna-ingest
Scale set to 0 by default.
In staging or against a synthesised backlog, scale it to 1 and scale the prefork version to 0. Capture:
Tasks/min throughput vs prefork baseline
p50 / p95 task duration
pgbouncer connection count, postgres active connection count
Memory per process (gevent single process should be flat-ish vs prefork's N processes)
Any new exceptions in Sentry
If throughput is meaningfully better and no new errors surface, decide whether to make gevent the default for ingest, or keep it as an opt-in pool class per deployment.
Why this might be worth the investigation cost
Separate from the pool-class choice, the measurement work above tells us whether the bottleneck is actually "not enough in-flight tasks" (in which case gevent wins) or "pgbouncer / postgres cap" (in which case no amount of celery concurrency helps and we need DB-side scaling instead). Either answer is useful.
Context
process_nats_pipeline_resultis the dominant task on theantennacelery queue duringasync_apiML jobs. Profiling on a live production backlog shows it spends most wall-clock time on:Jobprogress rows)AsyncJobStateManagerstate)async_to_sync(ack_task)()aroundnats-py)All three are I/O. The task is almost never CPU-bound. That is the workload profile where gevent pays off: a single OS process running a few hundred greenlets can keep the DB/Redis/NATS pipelines saturated, where the same OS process with prefork only has ~one in-flight task per worker process.
Ballpark: prefork at 16 workers ≈ 16 in-flight DB round-trips. Gevent at 200 greenlets could hold 200 in-flight, bounded by pgbouncer pool size and server cores. Could be a 5–10× throughput lever on the ingest path specifically.
What needs verifying before we commit
This is an investigation ticket, not a "just do it" ticket. Switching a production pool class has a long tail of silent-bug risk; we need to confirm each of these:
Monkey-patching at startup. Gevent needs
gevent.monkey.patch_all()as the first import in the worker entrypoint — before celery, django, psycopg2, redis-py. Confirm this interacts cleanly with our New Relic agent (newrelic-admin run-program), which wraps the celery entrypoint today.Postgres driver safety. Django uses psycopg2 (or psycopg3 if we've migrated — verify). Under gevent, psycopg2 blocking calls freeze the greenlet pool unless
psycogreen.gevent.patch_psycopg()is called after monkey-patching. Psycopg3's async mode also has specific guidance. Get this right or we'll be slower than prefork.Pgbouncer pool sizing. At gevent concurrency 100–200, we'll hold that many persistent DB connections if Django's
CONN_MAX_AGEis non-zero. Pgbouncer'sdefault_pool_sizeneeds to be large enough, orCONN_MAX_AGEneeds to be 0 for the ingest worker specifically. Measure before deploying.asyncio inside gevent tasks.
process_nats_pipeline_resultusesasync_to_sync(ack_task)()which creates an asyncio event loop per call to runnats-py. asyncio + gevent is supported but the details matter:nats-pyuses sockets that will be monkey-patched, which can change reconnect / timeout behaviour in non-obvious ways. Test this in isolation first.Non-I/O tasks on the same worker. If the ingest worker consumes
antenna-ingestonly (see Split the celeryantennaqueue into workload classes (ingest / housekeeping) #1229 for queue split) then we only need to audit ingest tasks. If not, every task on the shared queue needs gevent safety — an open-ended audit. This ticket depends on Split the celeryantennaqueue into workload classes (ingest / housekeeping) #1229 for that reason.PIL / numpy / torch.
create_detection_imagesruns PIL crops. PIL is blocking C code that releases the GIL but is not explicitly gevent-aware. It works, but long per-image crops would block one greenlet for the full crop duration. For the ingest queue this is probably fine; flagging it for awareness.Sentry, logging, thread-locals. Sentry's SDK and structlog both have gevent-mode configuration — verify both are set up correctly, or exceptions and logs will get crossed between greenlets.
Suggested proof-of-concept
Land Split the celery
antennaqueue into workload classes (ingest / housekeeping) #1229 (queue split) first. Gevent only needs to be viable on the ingest queue, not on the long tail.Add an optional worker service
celeryworker-ingest-geventindocker-compose.worker.yml:python -c 'from gevent import monkey; monkey.patch_all(); from psycogreen.gevent import patch_psycopg; patch_psycopg(); import celery.__main__; celery.__main__._main()' worker --pool=gevent --concurrency=100 --queues=antenna-ingestIn staging or against a synthesised backlog, scale it to 1 and scale the prefork version to 0. Capture:
If throughput is meaningfully better and no new errors surface, decide whether to make gevent the default for ingest, or keep it as an opt-in pool class per deployment.
Why this might be worth the investigation cost
Separate from the pool-class choice, the measurement work above tells us whether the bottleneck is actually "not enough in-flight tasks" (in which case gevent wins) or "pgbouncer / postgres cap" (in which case no amount of celery concurrency helps and we need DB-side scaling instead). Either answer is useful.
Related
antennaqueue into workload classes (ingest / housekeeping) #1229 — queue split. Prerequisite for a clean gevent rollout scoped to ingest only.