2026
💻 Zoom link: https://us06web.zoom.us/j/89601195963
📆 Meeting calendar invite. (First Thursday of each month, 10:00 a.m. central time)
Note
🎥 Please note that by joining and participating in these Working Group meetings, you acknowledge that your name will be visible to other attendees in the Zoom session, and this participation will be considered a public record. Furthermore, your verbal or written contributions may be included in the publicly accessible meeting notes and summary.
Please provide time estimates for each agenda item.
**Agenda items must be added at least 48 hours before the meeting, and they should be deleted or moved to the next meeting no later than 10 hours before the scheduled start time.**
- Facilitator/time-keeper: Gerd Heber
- Note-taker/Editor: AI/Lori
- 👉 Reminder: Please note the change to a monthly cadence. First Thursday of each month.
- Discussion covering RFC: String-Based Filter Configuration API (Scot)
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •
- Facilitator/time-keeper: Gerd Heber
- Note-taker/Editor: AI/Lori
- Debugging infinite loops / `H5close` (Elena, 2 min)
- Review threadsafe locking protocol (Quincey, 10 min)
- Discuss ideas for reviewing threadsafe branch(es) (Quincey, 20 min)
- Strategy, phases, boundaries, go/no-go criteria (Gerd, 20 min)
None of this is new, but before we get lost in technical details, can we restate a few things I believe (mistakenly?) we agree on?
- Multiple application threads can call HDF5 safely
- Multiple calls can make progress simultaneously inside HDF5
- One HDF5 call can fan out into internal worker threads
I think we agree we want all three eventually, but we need to agree on how we get there.
Is there agreement on these elements?
- Architectural first move: make the VOL boundary and its support packages concurrency-ready enough to move the serialization boundary down.
- Immediate implementation move: use the native VOL as the first consumer of that work and the first place to ship narrow wins.
- Do not try to finish either side in isolation.
- Treat internal native-VOL fan-out as a tactical accelerator, not the base model.
What's wrong with the phases outlined below? A lot, probably, but we need something like this before proceeding.
Committing resources would be madness otherwise.
- `H5TS` only, API entry/exit macro path, build/test infrastructure, and docs.
- Lock-rank annotations, contention telemetry, and a single component taxonomy: safe, serialized, unsupported
  - Add rank discipline to the existing `H5TS` substrate
  - Do not attempt a fine-grained locking redesign (yet)
- Exit criteria: a published concurrency matrix (see here), a mandatory fallback story for every unsupported stack, sanitizer-backed stress tests, and lifecycle/shutdown rules that are tighter than the current “join all HDF5-using threads before `H5close`” warning
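To make the "rank discipline" idea above concrete, here is a minimal sketch of a rank-checked mutex wrapper in C. Everything in it is an assumption for illustration: the rank names, the `demo_` identifiers, and the rule that a thread may only acquire a lock whose rank is strictly higher than the last one it took. It is not the `H5TS` API and does not reflect any decided design.

```c
#include <pthread.h>

/* Hypothetical ranks: a thread must acquire locks in strictly increasing order. */
typedef enum {
    DEMO_RANK_API    = 0,
    DEMO_RANK_FILE   = 1,
    DEMO_RANK_CACHE  = 2,
    DEMO_RANK_RAW_IO = 3
} demo_rank;

typedef struct {
    pthread_mutex_t mutex;
    demo_rank       rank;
} demo_ranked_lock;

#define DEMO_MAX_HELD 8

/* Per-thread stack of the ranks this thread currently holds. */
static _Thread_local demo_rank demo_held[DEMO_MAX_HELD];
static _Thread_local int       demo_n_held = 0;

int demo_lock_init(demo_ranked_lock *l, demo_rank rank)
{
    l->rank = rank;
    return pthread_mutex_init(&l->mutex, NULL);
}

/* Returns 0 on success, -1 on a rank violation (the mutex is NOT taken). */
int demo_lock_acquire(demo_ranked_lock *l)
{
    if (demo_n_held == DEMO_MAX_HELD)
        return -1;
    if (demo_n_held > 0 && demo_held[demo_n_held - 1] >= l->rank)
        return -1; /* would invert the documented lock order */
    if (pthread_mutex_lock(&l->mutex) != 0)
        return -1;
    demo_held[demo_n_held++] = l->rank;
    return 0;
}

/* Releases must happen in reverse (LIFO) order of acquisition. */
int demo_lock_release(demo_ranked_lock *l)
{
    if (demo_n_held == 0 || demo_held[demo_n_held - 1] != l->rank)
        return -1;
    demo_n_held--;
    return pthread_mutex_unlock(&l->mutex);
}
```

A checker of this kind reports an ordering violation on the first offending acquire instead of waiting for a rare deadlock, which is the property sanitizer-backed stress tests would want to exercise.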
- `H5E` and `H5CX`
- The lock boundary does not move yet; the goal is to make these packages concurrency-clean under the existing top lock.
- Exit criteria: concurrent error-stack tests, nested callback re-entry tests, white-box `H5CX` push/pop tests, and zero races with no dependence on lower native/VOL code.
- `H5I` and `H5P`
- Identifier lifetime/refcount rules
- Property-list sharing rules, including what “read-only sharing” versus “concurrent mutation” means
- Resolve the fact that `H5CX` uses non-default DXPL/LAPL objects directly, does not copy them, and can pass those references down to callbacks
- Exit criteria: documented object-sharing semantics, stress-tested create/copy/close/lookup races, and no refcount/ABA failures under sanitizers and fault injection
- `H5VL`
- Lock boundary moves from “above VOL” to “around unsafe connector stacks”
- VOL dispatch safe, but admit connector stacks only by capability: safe stacks run concurrently, unsafe stacks get a serialization wrapper, and unsupported stacks are rejected
- Coarse-grained flags, not per-API flags
- Exit criteria: native VOL plus at least one pass-through stack working, deterministic fallback for unsafe stacks, and no core locks held across user callbacks into connectors
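The capability-based admission rule above could look roughly like the following sketch. All names here (`demo_stack_class`, `demo_classify_stack`, `demo_dispatch`) are hypothetical and are not part of the HDF5 VOL interface. The sketch also assumes that a stack inherits the most restrictive class of any connector in it, which is one plausible reading of "coarse-grained flags" and not something the notes decide.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical coarse-grained capability classes, ordered by restrictiveness. */
typedef enum {
    DEMO_STACK_SAFE        = 0, /* may run concurrently */
    DEMO_STACK_SERIALIZED  = 1, /* must run under the serialization wrapper */
    DEMO_STACK_UNSUPPORTED = 2  /* rejected outright */
} demo_stack_class;

typedef int (*demo_stack_op)(void *ctx);

/* Dedicated wrapper lock for unsafe stacks (assumed distinct from core locks). */
static pthread_mutex_t demo_serial_lock = PTHREAD_MUTEX_INITIALIZER;

/* Assumption: the most restrictive connector determines the class of the stack. */
demo_stack_class demo_classify_stack(const demo_stack_class *connectors, size_t n)
{
    demo_stack_class result = DEMO_STACK_SAFE;
    for (size_t i = 0; i < n; i++)
        if (connectors[i] > result)
            result = connectors[i];
    return result;
}

/* Runs op directly, under the wrapper lock, or not at all, depending on class. */
int demo_dispatch(demo_stack_class cls, demo_stack_op op, void *ctx)
{
    int rc;

    switch (cls) {
    case DEMO_STACK_SAFE:
        return op(ctx);
    case DEMO_STACK_SERIALIZED:
        pthread_mutex_lock(&demo_serial_lock);
        rc = op(ctx);
        pthread_mutex_unlock(&demo_serial_lock);
        return rc;
    default:
        return -1; /* unsupported stacks are rejected deterministically */
    }
}

/* Example operation used for illustration: counts how often it was invoked. */
int demo_count_call(void *ctx)
{
    (*(int *)ctx)++;
    return 0;
}
```

Classifying once per stack keeps the decision out of the per-call hot path, which is the main argument for coarse-grained flags over per-API flags.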
- Start with `sec2` and `core`; everything else begins life as serialized or unsupported
- Lock boundary should become per-file/VFD-instance for open/close/flush state, with raw positioned I/O happening outside file-metadata locks.
- Exit criteria: same-file and different-file read/read correctness on the supported VFDs, plus a clean safe/serialized/unsupported story for plugin-loaded VFDs
- Scope is narrow around the native open/read/close path for `H5F`/`H5O`/`H5G`/`H5A`/`H5D`, with only the `H5S`/`H5T` support needed for simple hyperslabs and fixed-size/no-conversion reads.
- The lock boundary should be: metadata/object traversal and chunk discovery under a file/metadata lock, then raw I/O, filter application, and memory copies outside that lock, operating only on pre-materialized work units and private buffers.
- Keep the metadata cache serialized and bypass the chunk cache/page buffer on this path initially.
- Exit criteria: supported different-file and same-file read/read concurrency with automatic fallback to the serial path for anything outside that envelope
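The proposed lock boundary for the read path can be illustrated with a small sketch: look up the chunk location under a metadata lock, copy it into a private work unit, then perform the positioned read with no lock held. The `demo_file` and `demo_work_unit` types and the flat chunk index are assumptions made for the example; the real native VOL metadata structures are far more involved.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() */
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* A pre-materialized work unit: everything raw I/O needs, copied out of metadata. */
typedef struct {
    off_t  offset;
    size_t length;
} demo_work_unit;

/* Hypothetical per-file state: a flat chunk index stands in for file metadata. */
typedef struct {
    pthread_mutex_t       meta_lock;
    int                   fd;
    const demo_work_unit *chunk_index;
    size_t                n_chunks;
} demo_file;

int demo_file_init(demo_file *f, int fd, const demo_work_unit *index, size_t n)
{
    f->fd          = fd;
    f->chunk_index = index;
    f->n_chunks    = n;
    return pthread_mutex_init(&f->meta_lock, NULL);
}

/* Step 1: chunk discovery under the metadata lock, result copied to a private unit. */
static int demo_discover_chunk(demo_file *f, size_t chunk_no, demo_work_unit *out)
{
    int rc = -1;

    pthread_mutex_lock(&f->meta_lock);
    if (chunk_no < f->n_chunks) {
        *out = f->chunk_index[chunk_no];
        rc   = 0;
    }
    pthread_mutex_unlock(&f->meta_lock);
    return rc;
}

/* Step 2: positioned raw I/O into a caller-owned buffer, with no lock held. */
ssize_t demo_read_chunk(demo_file *f, size_t chunk_no, void *buf, size_t buf_len)
{
    demo_work_unit wu;

    if (demo_discover_chunk(f, chunk_no, &wu) != 0 || wu.length > buf_len)
        return -1;
    return pread(f->fd, buf, wu.length, wu.offset);
}
```

Because `pread` takes an explicit offset and does not move the shared file position, several threads can read from the same descriptor at once, so the only serialized section is the short index lookup.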
- Expand `H5S` and `H5T`: point/irregular selections, datatype conversion/xform, and only then selected reference/vlen cases.
- Selection-iteration and conversion state must be per-operation rather than shared global state, and user conversion/filter code must never run under file/cache locks.
- Exit criteria: point-selection and conversion stress tests, an explicit allow-list for thread-safe filters/plugins, and no unacceptable single-thread regressions; everything not on the allow-list still falls back to the serial path
- Requires significant re-architecture(?)
- Split the monolithic file lock into lock domains—file handle/namespace, cache manager, cache entry, raw I/O
- Forbid callbacks or cross-cache re-entry while cache-entry locks are held; if an operation can re-enter, make it restartable
- Exit criteria: a stable documented lock order, deadlock-free eviction/flush/fault-injection runs, sanitizer-clean overnight stress, and clear evidence that the cache-enabled concurrent path is better than the earlier bypass path
- Begin with different files, then different datasets in one file when metadata side effects are isolated.
- Keep same-dataset writes serialized until there is a compelling design for disjoint-chunk writes that does not turn the metadata path into a maintenance nightmare.
- Exit criteria: explicit visibility rules for readers, crash/fuzz coverage for concurrent mutation, and shutdown/cancellation semantics that are stronger than today’s documented requirement that applications join all HDF5-using threads before `H5close` or exit
- TBD
- Exit criteria: no hidden oversubscription disasters, wrapper layers that pass the same stress suite as the C core, and a decision to treat MPI+threads as its own program rather than a side effect of serial-library concurrency work
- First public concurrency release after Phase 5
- First serious performance release after Phase 7
- First polished mixed model after Phase 9.
This sequence is maintainable because it separates three risks instead of compounding them: boundary semantics first, narrow end-to-end value second, and the hard cache/mutation rearchitecture third.
- Correct/improve the plan (phases, matrices, etc.)
- Turn Phases 0–5(?) into a milestone sheet that includes package owners, lock-rank definitions, and exact fallback behavior for each connector/VFD/filter class.
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •
- Facilitator/time-keeper: Scot Breitenfeld
- Note-taker/Editor: AI/Scot Breitenfeld
- Add items here
None
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •