Releases: facebookresearch/spdl
v0.1.6
What's New
- Support custom merge operation by @mthrok, @moto-meta in #1012, #1047, #1045
Please check the documentation for usage: https://facebookresearch.github.io/spdl/main/generated/spdl.pipeline.defs.Merge.html#spdl.pipeline.defs.Merge
- Introduce config system to change default queue/hook/callbacks by @moto-meta in #1018, #1042, #1033
This allows changing the default Queue/TaskHook/ProfileHook so that teams/projects can use a standard one. This is particularly helpful for logging performance across an organization without manually configuring each pipeline. https://facebookresearch.github.io/spdl/main/generated/spdl.pipeline.config.html
Bugfix
- Fix the return values of ordered pipe by @mthrok in #1024 (Thanks @EthanRosenthal for reporting #1023)
- Fix type annotation by @mthrok in #1062
What's Changed
- Remove type parameter from `AsyncQueue` by @mthrok in #1021
- Removes source annotation from `PipelineConfig` by @moto-meta in #1043
- Update Pipeline Queue annotation by @mthrok in #1025
- Throw if no input in profiling by @moto-meta in #1038
- Use `weakref.finalize` to clean up the Pipeline by @moto-meta in #1039
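For readers unfamiliar with `weakref.finalize`, here is a minimal, standard-library-only sketch of the cleanup pattern. The `Resource` class and the `_release` helper are hypothetical names used for illustration; they are not taken from the SPDL code base.

```python
import weakref


def _release(log: list, name: str) -> None:
    # Hypothetical cleanup routine. It must not hold a reference to the owner,
    # otherwise the owner could never be garbage collected.
    log.append(f"released {name}")


class Resource:
    """Hypothetical object that owns something needing explicit cleanup."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        # The callback runs at most once: when the object is garbage collected,
        # when `close()` is called, or at interpreter exit, whichever is first.
        self._finalizer = weakref.finalize(self, _release, log, name)

    def close(self) -> None:
        # Calling the finalizer explicitly is safe; later calls are no-ops.
        self._finalizer()


events: list = []
res = Resource("pipeline-0", events)
del res  # On CPython the finalizer fires once the last reference is gone.
```

Compared with `__del__`, this pattern keeps the cleanup logic outside the object, so it still runs at interpreter exit if the object is alive at that point.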
Examples
- Update benchmark scripts to save result to CSV by @moto-meta and @mthrok in #1017, #1019
Packaging
- Add missing 3.13t to Linux `aarch64` build by @mthrok in #1014
- Lower to `manylinux_2_26` for `aarch64` by @mthrok in #1015
Documentation Update
Internal / Refactor
- Move the type alias of callable to `defs` by @mthrok in #1004
- Refactor 1 - Move queue and hook by @moto-meta in #1026
- Refactor 2 - Move node by @moto-meta in #1027
- Refactor 3 - Build logic by @moto-meta in #1031
- Refactor 4 - Make agg and disagg specialized config by @moto-meta in #1028
- Refactor 5 - Add `_components/__init__.py` by @moto-meta in #1029
- Refactor 6 - Use `defs/__init__.py` by @moto-meta in #1032
- Refactor 7 - Refactor utilities by @moto-meta in #1030
- Update import path by @moto-meta in #1037
- Remove `op_require_eof` from _PipeArgs by @moto-meta in #1040
- Set `pyre-strict` in _node.py by @moto-meta in #1046
- Migrate pipeline_profiling_test to Python unittest by @moto-meta in #1044
- Migrate configs_test.py from pytest to unittest by @vbourgin in #1057
- Migrate video_frame_slice_test.py from pytest to unittest by @vbourgin in #1053
- Migrate zero_copy_bytes_passing_test.py from pytest to unittest by @vbourgin in #1050
- Migrate video_encoding_test.py from pytest to unittest by @vbourgin in #1049
- Migrate frames_test.py from pytest to unittest by @vbourgin in #1058
- Migrate audio_decoding_test.py from pytest to unittest by @vbourgin in #1055
- Migrate tar_test.py from pytest to unittest by @vbourgin in #1056
- Migrate demuxer_test.py from pytest to unittest by @vbourgin in #1052
- Migrate image_decoding_test.py from pytest to unittest by @vbourgin in #1048
- Migrate async_test.py from pytest to unittest by @vbourgin in #1051
- Migrate audio_encoding_test.py from pytest to unittest by @vbourgin in #1054
- Migrate buffer_conversion_refcount_test.py from pytest to unittest by @vbourgin in #1059
- Migrate streaming_decoding_test.py from pytest to unittest by @vbourgin in #1066
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's new
New Python version support
The spdl-io packages now include binaries for Python 3.13t, 3.14 and 3.14t on all supported platforms (macOS, Linux aarch64, Linux x86_64 and Windows).
They are now compiled with CUDA 12.8.1.
(spdl-core remains a pure Python package without any dependencies.)
Pipeline profiling and self-diagnostic mode
Pipeline profiling function
The spdl.pipeline.profile_pipeline function is added.
This function executes each stage separately with different multi-threading concurrency and reports the speed of executions. This helps find how each stage function scales with multi-threading.
Please check out the example for the usage and what you can infer from the profiling result.
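The idea of measuring how one stage scales with thread concurrency can be sketched with the standard library alone. This is a conceptual illustration, not the implementation of `spdl.pipeline.profile_pipeline`; `measure_throughput` and `fake_io` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Sequence


def measure_throughput(
    func: Callable,
    inputs: Sequence,
    concurrencies: Iterable[int] = (1, 2, 4),
) -> dict[int, float]:
    """Run `func` over `inputs` with several thread counts; return items/sec."""
    results = {}
    for n in concurrencies:
        with ThreadPoolExecutor(max_workers=n) as pool:
            t0 = time.perf_counter()
            # Consume the iterator so that all tasks finish before timing stops.
            for _ in pool.map(func, inputs):
                pass
            elapsed = time.perf_counter() - t0
        results[n] = len(inputs) / elapsed
    return results


def fake_io(_: int) -> None:
    # Stand-in for a stage that releases the GIL (e.g. I/O or native decoding).
    time.sleep(0.01)


qps = measure_throughput(fake_io, range(16))
for n, value in qps.items():
    print(f"concurrency={n}: {value:.1f} items/sec")
```

A stage whose throughput stops growing as concurrency increases is likely bound by the GIL or by a shared resource, which is the kind of signal a per-stage profile is meant to surface.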
Self-diagnostic mode
Setting the environment variable SPDL_PIPELINE_DIAGNOSTIC_MODE=1 builds the Pipeline in self-diagnostic mode.
In this mode, the Pipeline internally calls profile_pipeline and exits.
This helps profile the pipeline in a production environment without changing the code.
(contributed by @ncorriveau and @mthrok)
Support for sub-pipelines
A new pipeline definition component, spdl.pipeline.defs.MergeConfig, and its factory function, spdl.pipeline.defs.Merge, have been added.
This allows merging multiple pipelines and attaching downstream stages.
Please check out the example for the usage.
spdl.io.load_wav for loading WAV from memory without copy
The spdl.io.load_wav function is added. This function is specialized for loading WAV audio from memory. It does not make any copy, so it is very fast. Please refer to the benchmark.
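To illustrate what "loading WAV from memory without copy" can mean, here is a standard-library sketch that exposes the samples of an in-memory WAV as a `memoryview` sharing the original buffer. It is not how `spdl.io.load_wav` is implemented. It assumes 16-bit PCM audio and a little-endian host, and `make_wav` and `view_pcm16` are hypothetical helpers.

```python
import io
import struct
import wave


def make_wav(samples: list[int], sample_rate: int = 8000) -> bytes:
    """Build a mono 16-bit PCM WAV in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def view_pcm16(data: bytes) -> memoryview:
    """Return a view of the `data` chunk as int16 samples, without copying."""
    mv = memoryview(data)
    if bytes(mv[:4]) != b"RIFF" or bytes(mv[8:12]) != b"WAVE":
        raise ValueError("not a RIFF/WAVE buffer")
    pos = 12
    while pos + 8 <= len(mv):
        chunk_id = bytes(mv[pos : pos + 4])
        (size,) = struct.unpack_from("<I", mv, pos + 4)
        if chunk_id == b"data":
            # Slicing and casting a memoryview shares the original buffer.
            # "h" uses the native byte order, hence the little-endian assumption.
            return mv[pos + 8 : pos + 8 + size].cast("h")
        pos += 8 + size + (size & 1)  # chunks are padded to an even size
    raise ValueError("no data chunk found")


wav = make_wav([0, 1, -1, 32767])
samples = view_pcm16(wav)
```

Only the small header fields are copied while scanning; the sample payload itself is never duplicated, which is why this style of loading is cheap for large buffers.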
spdl.io.iter_tarfile for iterating TAR files
The spdl.io.iter_tarfile function is added. This function can iterate over a TAR file held in memory or exposed as a file-like object. It is at least as fast as Python's built-in tarfile module. If the input is of bytes type, it can perform zero-copy parsing. (Proposed by @nicolas-dufour, @npuichigo [discussion])
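For reference, the baseline mentioned above (Python's built-in `tarfile` module) can iterate over an in-memory archive as sketched below. This shows only the standard-library approach and does not call `spdl.io.iter_tarfile`, whose exact signature is not given here. `make_tar` and `iter_tar_members` are hypothetical helpers.

```python
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed TAR archive in memory (test data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def iter_tar_members(data: bytes):
    """Yield (name, content) pairs for regular files in an in-memory TAR."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                # `extractfile` returns a file-like object; `read` copies the payload.
                yield member.name, tar.extractfile(member).read()


archive = make_tar({"a.txt": b"hello", "b.bin": b"\x00\x01"})
items = list(iter_tar_members(archive))
```

Note that this baseline copies each payload on `read`, which is the cost a zero-copy parser over `bytes` input avoids.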
Updates on failure monitoring mechanism
- You can now override `max_failures` at each stage. (Proposed by @EthanRosenthal #942)
Pass max_failures to the spdl.pipeline.PipelineBuilder.pipe method or the spdl.pipeline.defs.Pipe function.
- Fix the way failed stages are reported (#1006)
When a pipeline stage fails more often than allowed, the pipeline fails with an error message. Previously, all stages were included in the message. Now only the stages that actually failed are included.
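The notion of a per-stage failure budget can be made concrete with a small plain-Python sketch. It is a conceptual illustration only, not SPDL's implementation. `run_stage` and `StageFailed` are hypothetical names, and treating a negative budget as "unlimited" is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator


class StageFailed(RuntimeError):
    """Raised when a stage exceeds its failure budget (hypothetical)."""


def run_stage(
    name: str,
    func: Callable,
    items: Iterable,
    max_failures: int = -1,
) -> Iterator:
    """Apply `func` to each item, skipping failures until the budget is exceeded."""
    failures = 0
    for item in items:
        try:
            result = func(item)
        except Exception:
            failures += 1
            # A negative budget means "no limit" in this sketch.
            if 0 <= max_failures < failures:
                raise StageFailed(
                    f"stage '{name}' failed {failures} times "
                    f"(allowed: {max_failures})"
                )
        else:
            yield result


# One bad item is tolerated because the budget for this stage is 1.
out = list(run_stage("parse", int, ["1", "x", "3"], max_failures=1))
```

Because the counter lives inside each stage, the error message can name exactly the stage that ran out of budget rather than listing every stage.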
What's Changed
- zlib is statically linked in Linux/macOS binaries
Previously, the `spdl_io` packages required an external installation of `zlib` for Linux and macOS, while it was statically linked in Windows packages. Now `zlib` is also statically linked in Linux/macOS, making NumPy the only required dependency of `spdl-io`.
New Contributors
- @ncorriveau made their first contribution in #957
Full Changelog: v0.1.4...v0.1.5
v0.1.4
New features
Config-based Pipeline construction
The newly introduced spdl.pipeline.defs module contains the definitions and helper functions you can use to build a Pipeline object in a declarative manner. For details, please refer to #902.
You can now construct a Pipeline in the following ways. Please also check out the Hydra Integration Example.
PipelineBuilder (existing):

    builder = (
        PipelineBuilder()
        .add_source(Sampler())
        .pipe(
            funcA,
            concurrency=...)
        .pipe(
            funcB,
            concurrency=...,
            executor=...)
        .add_sink(...)
    )
    pipeline = builder.build(
        num_threads=...)

PipelineDefinition (new):

    pdef = PipelineConfig(
        src=Sampler(),
        stages=[
            PipeConfig(
                funcA,
                concurrency=...),
            PipeConfig(
                funcB,
                concurrency=...,
                executor=...),
        ],
        sink=...
    )
    pipeline = build_pipeline(
        pdef, num_threads=...)

Hydra:

    _target_: build_pipeline
    num_threads: ...
    definition:
      _target_: PipelineConfig
      src:
        _target_: Sampler
      stages:
        - _target_: PipeConfig
          op: funcA
          concurrency: ...
        - _target_: PipeConfig
          op: funcB
          concurrency: ...
          executor: ...
      sink:
        ...
[Experimental] Windows Support
SPDL I/O now supports building on Windows. The binary distribution contains CUDA integration and NVDEC support.
Logging change
Previously, if a function passed to a pipe failed, the error was logged in one line. Now the full stack trace is printed.
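The difference between a one-line error log and a full stack trace can be reproduced with Python's `logging` module. This is a generic illustration and not SPDL's logging code; the logger name `pipe_demo` and the function `failing_op` are hypothetical.

```python
import io
import logging

# Capture log output in memory so the two styles can be compared.
stream = io.StringIO()
logger = logging.getLogger("pipe_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def failing_op(x: int) -> float:
    return 1 / x


try:
    failing_op(0)
except Exception as err:
    # One-line style: only the message of the exception is recorded.
    logger.error("pipe failed: %s", err)
    # Full style: `exception` appends the traceback of the active exception.
    logger.exception("pipe failed")

output = stream.getvalue()
print(output)
```

The traceback identifies the failing line inside the user function, which the one-line form cannot do.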
Bug fix
What's Changed
v0.1.3
New features
- Support classes with `__getitem__` method in pipe by @mthrok in #872
- Add VideoPackets.get_timestamps method by @mthrok in #875
- Add VideoFrames::get_timestamp method by @mthrok in #877
- Add VideoFrames.get_pts / time_base attributes to VideoFrames by @mthrok in #883
- Add support for FFmpeg 8 #869
Bug fix
- Fix the handling of StopAsyncIterator for FailCounter by @moto-meta in #862
- Fix include by @mthrok in #874
- Correctly detect that callable is a generator when it is a class's `__call__` method by @yit-b in #870
- Fix type stub by @mthrok in #884
What's Changed
- Update imagenet example by @mthrok in #860
- Add debug print for pipeline structure by @mthrok in #868
- Define a dedicated type alias for pipe input by @mthrok in #885
Full Changelog: v0.1.2...v0.1.3
v0.1.2
New features
- Support `__len__` for SPDL dataloader by @vbourgin in #839
- Add `SizedIterable[WithShuffle]` protocol by @mthrok in #840
- Move DistributedSampler to SPDL core by @vbourgin in #841
- Move `load_npy` to C++ by @mthrok in #849
- Use libdeflate for loading compressed NumPy Z file. by @mthrok in #855
- Add `transfer_tensor` function by @mthrok in #853
What's Changed
- Release the GIL when flushing decoder by @mthrok in #828
- Fix `embed_shuffle` by @moto-meta in #857
Documentation updates
BC-breaking changes
Full Changelog: v0.1.1...v0.1.2
v0.1.1
BC-breaking changes
- Replace `run_pipeline_in_subprocess` impl by @moto-meta in #780
Previously, `run_pipeline_in_subprocess` returned an iterator and was not reusable.
Now it returns an iterable, which can be iterated multiple times.
Following this change, the object returned from `run_pipeline_in_subprocess` does not support iterator-specific operations such as `next`.
Example:

    import spdl.pipeline
    from spdl.pipeline import run_pipeline_in_subprocess

    # Construct a builder
    builder = (
        spdl.pipeline.PipelineBuilder()
        .add_source(...)
        .pipe(...)
        # ... more stages ...
        .add_sink(...)
    )

    # Move it to the subprocess, build the Pipeline
    iterable = run_pipeline_in_subprocess(builder, ...)

    # Iterate - epoch 0
    for item in iterable:
        ...

    # Iterate - epoch 1
    for item in iterable:
        ...
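The practical difference between a one-shot iterator and a reusable iterable can be shown in plain Python, independent of SPDL. The `Epochs` class below is a hypothetical stand-in for any object that only implements `__iter__`.

```python
from typing import Iterator


class Epochs:
    """Hypothetical re-iterable: every `iter()` call starts a fresh pass."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.n))


# An iterator supports `next()` but is exhausted after a single pass.
one_shot = iter(range(3))
first = list(one_shot)
second = list(one_shot)  # nothing left to yield

# An iterable has no `__next__`, but can be looped over repeatedly.
reusable = Epochs(3)
epoch0 = list(reusable)
epoch1 = list(reusable)

# To pull a single item from an iterable, create an iterator explicitly.
head = next(iter(reusable))
```

Code that previously called `next` directly on the returned object therefore needs an explicit `iter(...)` call first.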
What's Changed
- Support multiple initializers in `iterate_in_subprocess` and `run_pipeline_in_subprocess` by @mthrok in #783
Now you can pass multiple initializer functions to `iterate_in_subprocess` and `run_pipeline_in_subprocess`, which are called in the subprocess before the first iteration starts.
- Allow Zero Weights in MergeIterator by @vbourgin in #784
Previously, the `MergeIterator` did not allow 0 as a weight. Now this is relaxed and you can pass 0.
The iterators with 0 weight are not iterated.
- Shuffle Before Iterating by Default for `embed_shuffle` by @vbourgin in #785
Previously, `embed_shuffle` defaulted to shuffling after each iteration, but this was counter-intuitive.
Now the default behavior is to shuffle before each iteration.
- Adhere to PEP 561 (Fixes a mypy bug) by @alxmrs in #790
Now the SPDL package contains a `py.typed` file so that type-checkers can analyze the code.
- Ensure core bindings is loaded when loading CUDA binding by @mthrok in #792
When loading the CUDA extension in the `spdl.io` module, it ensures that the CPU extension is loaded.
- Support batch loading images with NVJPEG by @mthrok in #794
The `spdl.io.decode_image_nvjpeg` function supports loading multiple images. (Note that the function is still experimental.)
- Enable compilation warnings by @mthrok in #806, #821
The C++ code of `spdl.io` is hardened by turning a few selected compiler warnings into errors.
- Fix nanobind option for archive module by @mthrok in #814
The extension module for archive (zip) parsing was not compiled with free-threading support.
- Mypy type-checking feedback for all "pyre safe" sources. by @alxmrs in #801
Now CI runs `mypy` type checking. The `mypy` compatibility is not enforced at the moment as `pyre` has been (and still is) used. We plan to gradually make the codebase compatible with `mypy`.
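The zero-weight behavior described for `MergeIterator` above can be pictured with a small weighted-merge sketch in plain Python. This is not SPDL's `MergeIterator`. `weighted_merge` is a hypothetical function, and its policy of continuing until every source is exhausted is an assumption of this sketch.

```python
import random
from typing import Iterable, Iterator, Sequence


def weighted_merge(
    iterables: Sequence[Iterable],
    weights: Sequence[float],
    seed: int = 0,
) -> Iterator:
    """Randomly interleave `iterables`, sampling sources by weight."""
    if len(iterables) != len(weights):
        raise ValueError("iterables and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Zero-weight sources are dropped up front, so they are never iterated.
    active = [(iter(it), w) for it, w in zip(iterables, weights) if w > 0]
    rng = random.Random(seed)
    while active:
        sources, active_weights = zip(*active)
        idx = rng.choices(range(len(active)), weights=active_weights, k=1)[0]
        try:
            yield next(sources[idx])
        except StopIteration:
            # Drop the exhausted source and keep drawing from the rest.
            del active[idx]


merged = list(weighted_merge([[1, 2, 3], ["a", "b"], [10, 20]], [1, 0, 2]))
```

Passing a weight of 0 is a convenient way to disable a source temporarily without restructuring the list of inputs.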
New Contributors
Full Changelog: v0.1.0...v0.1.1
v0.1.0
What's Changed
- [BC-breaking] Replace load_npz implementation by @mthrok in #739
- [BC-breaking] Remove fallback for API transition by @mthrok in #770
- [BC-breaking] Replace iterate_in_subprocess implementation by #776 @mthrok in #779
- Add support for DEFLATE algorithm by @mthrok in #744
- Name threads by @mthrok in #728
- Update `load_npy` to accept memoryview by @mthrok in #738
- Remove libzip dependency by @mthrok in #746
- Default to the default context by @mthrok in #762
- Revise the way periodic task is terminated by @mthrok in #768
- Add the functionality to suppress repeated errors in spdl by @gregorpm in #773
- Add function to convert IterableWithShuffle to Iterable by @mthrok in #775, #777
Examples
Docs
- Add section on data format by @mthrok in #729
- Introduce Case Studies by @mthrok in #731, #741, #754, #755
- Add caveats and update links by @mthrok in #752, #751
Full Changelog: v0.0.14...v0.1.0
v0.0.14
Documentation Update
- Improved the documentation.
SPDL started as an experiment in multi-threading, but it now covers multi-processing as well.
Its focus has shifted to production readiness and to iterative optimization enabled by observability and flexibility.
The Overview and Getting Started pages have been updated to reflect this.
- Optimization Guide
https://facebookresearch.github.io/spdl/0.0.14/performance_analysis/index.html
In the new Optimization Guide section, we share the methodologies we developed while optimizing various production pipelines.
It illustrates how we estimate data loading inefficiency, identify the bottleneck, and approach resolving it.
BC-breaking Changes
- Propagate severe Pipeline errors to front-end by @mthrok in #699
When `Pipeline` encounters an unexpected error, in addition to shutting down the pipeline, the error is now propagated to the foreground.
Unexpected errors include errors raised in the source iterator.
Errors raised in user-provided functions passed to `Pipeline.pipe` are expected, so they are not propagated this way.
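Propagating an error from background work to the foreground can be sketched with a thread and a queue. This is a generic pattern, not SPDL's implementation. `iterate_in_background` and `broken_source` are hypothetical names.

```python
import queue
import threading
from typing import Iterable, Iterator


def iterate_in_background(source: Iterable) -> Iterator:
    """Consume `source` in a thread; re-raise its errors in the caller."""
    q: queue.Queue = queue.Queue(maxsize=2)

    def worker() -> None:
        try:
            for item in source:
                q.put(("item", item))
        except Exception as err:
            # Hand the exception to the foreground instead of only logging it.
            q.put(("error", err))
        else:
            q.put(("eof", None))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while True:
        kind, payload = q.get()
        if kind == "item":
            yield payload
            continue
        thread.join()
        if kind == "error":
            raise payload
        return


def broken_source() -> Iterator[int]:
    yield 1
    yield 2
    raise RuntimeError("source iterator failed")


received = []
message = None
try:
    for x in iterate_in_background(broken_source()):
        received.append(x)
except RuntimeError as err:
    message = str(err)
```

The consumer sees the items produced before the failure and then the original exception, rather than a silent early end of iteration.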
Bugfixes
Other Changes
- Increase the pipe output buffer size to 2 by @mthrok in #700
- Do not auto-populate `stop_after` by @mthrok in #702
- Make disaggregate to async by @mthrok in #703
Full Changelog: v0.0.13...v0.0.14
v0.0.13 - Observability Improvement
Better observability towards bottleneck detection
In this release, we reorganized the pipeline hooks and exposed the periodic performance measurement mechanism as a public interface.
This lets you tap into the performance statistics of a data processing pipeline, which makes it easier to analyze and optimize pipeline performance.
Please check out the renewed Performance Analysis documentation and the example to learn how to log runtime performance statistics.
BC-breaking Changes
- Merge QueueClass and hook creation interface by @moto-meta in #691
New Features
- Expose stats logging as public API by @mthrok in #686
- Add performance analysis example by @mthrok in #694
- Add performance analysis docs by @mthrok in #693
Full Changelog: v0.0.12...v0.0.13
v0.0.12
New features
- Improved streaming media processing capability, including multi-stream demuxing, remuxing, encoding and complex filter graph capability.
- You can demux multiple streams from the source. (such as audio and video packets.)
- With the new `Muxer` class, you can encode media or remux packets.
- Please check out `streaming_video_processing.py` for audio remuxing, streaming video decoding, and encoding.
- The new `FilterGraph` class can perform complex filtering, including multiple-input, multiple-output, and multi-media processing. One popular example is `overlay`, which adds a watermark to videos processed by AI. Please check out the documentation.
What's Changed
- Rename BSF implementation by @mthrok in #631
- Add tests for codec by @mthrok in #632
- Add BSF class by @mthrok in #634
- Replace the implementation of BSF for nvdec by @mthrok in #635
- Add method to query stream index by @mthrok in #636
- Support multi-stream demuxing by @mthrok in #637
- Add duration option to multi-stream demuxing by @mthrok in #638
- Do not run packaging jobs on main branch after merge by @mthrok in #644
- Add channel_layout/sample_aspect_ratio attributes by @mthrok in #640
- Tweak test util for consistency by @mthrok in #641
- Add default stream by @mthrok in #642
- Returns None if there is no packet/frames by @mthrok in #643
- Add filter graph public class by @mthrok in #639
- Fix issues on encoding by @mthrok in #646
- Improve error message by @mthrok in #647
- Add audio encode config by @mthrok in #648
- Add video encode config by @mthrok in #649
- Add muxer by @mthrok in #645
- Add audio encoding by @mthrok in #650
- Add video encoding by @mthrok in #651
- Deprecate streaming_demux_video by @mthrok in #653
- Remove obsolete function by @mthrok in #652
- Return packets directly when stream is specified with single int by @mthrok in #654
- Add streaming video processing example by @mthrok in #655
- Update mermaid js version by @mthrok in #658
- Do not run lint job on master branch by @mthrok in #659
- Set cancellation by @mthrok in #660
- Fix doc by @mthrok in #661
- Bump version by @mthrok in #662
- Update doc by @mthrok in #663
- [BC-Breaking] Replace encode_image with save_image by @mthrok in #656
- Merge the internal implementation of FilterGraph by @mthrok in #657
- Get rid of custom source and use existing url attr by @mthrok in #664
- Remove wheel from build dep by @mthrok in #665
- Retrieve proper time base from BSF by @mthrok in #666
- Ensure no new Python download by @mthrok in #667
- Merge frame_rate impl by @mthrok in #668
- Improve the error message when using subprocess by @mthrok in #669
- Match the signature default value to usecase by @mthrok in #670
- Add pipeline ID to stage names by @mthrok in #671
- Add queue occupation rate log by @moto-meta in #672
- Update docs by @mthrok in #673
- Default to build nvdec by @mthrok in #677
- Set CMAKE_BUILD_TYPE=Release by @mthrok in #676
Full Changelog: v0.0.11...v0.0.12