Releases: facebookresearch/spdl

v0.1.6

28 Oct 18:43
b10f186

What's New

Bugfix

What's Changed

Examples

Packaging

Documentation Update

  • Mention diagnostic mode in profile_pipeline function by @mthrok in #1013

Internal / Refactor

Full Changelog: v0.1.5...v0.1.6

v0.1.5

15 Oct 14:24
09a52e9

What's New

New Python version support

The spdl-io packages now include binaries for Python 3.13t, 3.14 and 3.14t on all supported platforms (macOS, Linux aarch64, Linux x86_64 and Windows).
They are now compiled with CUDA 12.8.1.

(spdl-core remains a pure-Python package with no dependencies.)

Pipeline profiling and self-diagnostic mode

Pipeline profiling function
The spdl.pipeline.profile_pipeline function has been added.
This function executes each stage separately at different multi-threading concurrencies and reports the execution speed. This helps you see how each stage function scales with multi-threading.
Please check out the example for the usage and what you can infer from the profiling result.
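A minimal sketch of the usage, assuming profile_pipeline accepts a pipeline definition built with spdl.pipeline.defs (the config components are described under v0.1.4 below); see the linked example for the actual signature and output:

    # Hypothetical sketch -- the exact signature of profile_pipeline is an assumption.
    from spdl.pipeline import profile_pipeline
    from spdl.pipeline.defs import PipeConfig, PipelineConfig

    def decode(item):
        ...  # the stage function to be profiled

    cfg = PipelineConfig(
        src=range(1000),              # any iterable source
        stages=[PipeConfig(decode)],
        sink=...,                     # elided here, as in the examples below
    )

    # Runs each stage separately at several thread concurrencies
    # and reports the measured execution speed.
    profile_pipeline(cfg)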

Self-diagnostic mode
By setting the environment variable SPDL_PIPELINE_DIAGNOSTIC_MODE=1, the Pipeline is built in self-diagnostic mode.
In this mode, the pipeline internally calls profile_pipeline and exits.
This lets you profile a pipeline in a production environment without changing the code.
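For example (a sketch; it is assumed the flag is read when the Pipeline is built, so it can also be set in the shell that launches the job):

    import os

    # Must be set before the Pipeline is built. (Assumption: the flag is
    # read at construction time; it can also be set in the launching shell.)
    os.environ["SPDL_PIPELINE_DIAGNOSTIC_MODE"] = "1"

    # builder: a spdl.pipeline.PipelineBuilder configured elsewhere.
    pipeline = builder.build(num_threads=4)  # profiles each stage, then exits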

(contributed by @ncorriveau and @mthrok)

Support for sub-pipelines

A new pipeline definition component, spdl.pipeline.defs.MergeConfig, and its factory function spdl.pipeline.defs.Merge have been added.
They make it possible to merge multiple pipelines and attach downstream stages.
Please check out the example for the usage.
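A rough sketch of what a merged definition might look like (the exact signature of Merge is an assumption; refer to the linked example):

    # Hypothetical sketch -- the exact signature of Merge is an assumption.
    from spdl.pipeline.defs import Merge, PipeConfig, PipelineConfig

    def postprocess(item):
        ...  # a stage shared by the merged pipelines

    # pipeline_cfg_a / pipeline_cfg_b: PipelineConfig objects defined elsewhere.
    cfg = PipelineConfig(
        src=Merge([pipeline_cfg_a, pipeline_cfg_b]),  # merge the two upstreams
        stages=[PipeConfig(postprocess)],             # attach a downstream stage
        sink=...,
    )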

spdl.io.load_wav for loading WAV from memory without copy

The spdl.io.load_wav function has been added. This function is specialized for loading WAV audio from memory. It makes no copy, so it is very fast. Please refer to the benchmark.
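For illustration (a sketch; the exact return type and the conversion shown are assumptions; see the API reference):

    # Sketch: load WAV audio that is already in memory, without copying.
    import spdl.io

    with open("sample.wav", "rb") as f:
        data = f.read()

    # Parses the WAV header and wraps the sample data in place (no copy).
    buffer = spdl.io.load_wav(data)

    # Assumption: the result can be converted with spdl.io.to_numpy.
    array = spdl.io.to_numpy(buffer)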

spdl.io.iter_tarfile for iterating TAR files

The spdl.io.iter_tarfile function has been added. This function can iterate over a TAR file held in memory or in a file-like object. It is at least as fast as Python's built-in tarfile module. If the input is of bytes type, it performs zero-copy parsing. (Proposed by @nicolas-dufour and @npuichigo [discussion])
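For illustration (a sketch; the exact item type yielded is an assumption; see the API reference):

    # Sketch: iterate over a TAR archive held in memory.
    import spdl.io

    with open("shard-00000.tar", "rb") as f:
        data = f.read()  # a bytes input enables zero-copy parsing

    # Assumption: each item is a (name, data) pair.
    for name, item in spdl.io.iter_tarfile(data):
        ...  # process each archived file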

Updates on failure monitoring mechanism

Pass max_failures to the spdl.pipeline.PipelineBuilder.pipe method or the spdl.pipeline.defs.Pipe function to control how many stage failures are tolerated.
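For example (a sketch; unrelated arguments are elided):

    # Sketch: tolerate up to 3 failures in the decode stage before
    # the pipeline aborts.
    import spdl.pipeline

    # src / decode: a source iterable and a stage function defined elsewhere.
    builder = (
        spdl.pipeline.PipelineBuilder()
        .add_source(src)
        .pipe(decode, concurrency=4, max_failures=3)
        .add_sink(...)
    )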

  • Fix the way failed stages are reported (#1006)
    When a pipeline stage fails more times than allowed, the pipeline fails with an error message. Previously, all stages were included in the message; now only the stages that actually failed are included.

What's Changed

  • zlib is statically linked in Linux/macOS binaries
    Previously, the spdl-io packages required an external installation of zlib on Linux and macOS, while it was statically linked in the Windows packages. Now zlib is statically linked on Linux/macOS as well, making NumPy the only required dependency of spdl-io.

  • Pipeline definitions are frozen by @mthrok in #966

New Contributors

Full Changelog: v0.1.4...v0.1.5

v0.1.4

10 Sep 19:37
a7cb787

New features

Config-based Pipeline construction

The newly introduced spdl.pipeline.defs module contains the definitions and helper functions you can use to build a Pipeline object in a declarative manner. For details, please refer to #902.

You can now construct a Pipeline in the following ways. Also, please check out the Hydra Integration Example.

PipelineBuilder (existing):

    builder = (
        PipelineBuilder()
        .add_source(Sampler())
        .pipe(
            funcA,
            concurrency=...)
        .pipe(
            funcB,
            concurrency=...,
            executor=...)
        .sink(...)
    )
    pipeline = builder.build(
        num_threads=...)

PipelineDefinition (new):

    pdef = PipelineConfig(
        src=Sampler(),
        stages=[
            PipeConfig(
                funcA,
                concurrency=...),
            PipeConfig(
                funcB,
                concurrency=...,
                executor=...),
        ],
        sink=...,
    )
    pipeline = build_pipeline(
        pdef, num_threads=...)

Hydra:

    _target_: build_pipeline
    num_threads: ...
    definition:
      _target_: PipelineConfig
      src:
        _target_: Sampler
      stages:
      - _target_: PipeConfig
        op: funcA
        concurrency: ...
      - _target_: PipeConfig
        op: funcB
        concurrency: ...
        executor: ...
      sink:
        ...

[Experimental] Windows Support

SPDL I/O now supports building on Windows. The binary distribution contains CUDA integration and NVDEC support.

Logging change

Previously, if a function passed to pipe failed, the error was logged as a single line. Now the full stack trace is printed.

Bug fix

What's Changed

v0.1.3

01 Sep 15:21
0479641

New features

  • Support classes with __getitem__ method in pipe by @mthrok in #872
  • Add VideoPackets.get_timestamps method by @mthrok in #875
  • Add VideoFrames::get_timestamp method by @mthrok in #877
  • Add VideoFrames.get_pts / time_base attributes to VideoFrames by @mthrok in #883
  • Add support for FFmpeg 8 #869

Bug fix

  • Fix the handling of StopAsyncIterator for FailCounter by @moto-meta in #862
  • Fix include by @mthrok in #874
  • Correctly detect that a callable is a generator when it is a class's __call__ method by @yit-b in #870
  • Fix type stub by @mthrok in #884

What's Changed

  • Update imagenet example by @mthrok in #860
  • Add debug print for pipeline structure by @mthrok in #868
  • Define a dedicated type alias for pipe input by @mthrok in #885

Full Changelog: v0.1.2...v0.1.3

v0.1.2

21 Aug 14:27
55b6156

New features

What's Changed

Documentation updates

BC-breaking changes

Full Changelog: v0.1.1...v0.1.2

v0.1.1

01 Jul 10:43
e3286d2

BC-breaking changes

  • Replace run_pipeline_in_subprocess impl by @moto-meta in #780

    Previously, run_pipeline_in_subprocess returned an iterator and was not reusable.
    Now it returns an iterable, which can be iterated multiple times.
    Following this change, the object returned from run_pipeline_in_subprocess no longer provides iterator-specific methods such as __next__.

    Example
    # Construct a builder
    builder = (
        spdl.pipeline.PipelineBuilder()
        .add_source(...)
        .pipe(...)
        ...
        .add_sink(...)
    )
    
    # Move it to the subprocess, build the Pipeline
    iterable = run_pipeline_in_subprocess(builder, ...)
    
    # Iterate - epoch 0
    for item in iterable:
        ...
    
    # Iterate - epoch 1
    for item in iterable:
        ...

What's Changed

  • Support multiple initializers in iterate_in_subprocess and run_pipeline_in_subprocess by @mthrok in #783
    Now you can pass multiple initializer functions to iterate_in_subprocess and run_pipeline_in_subprocess, which are called in the subprocess before the first iteration starts.
  • Allow Zero Weights in MergeIterator by @vbourgin in #784
    Previously, the MergeIterator did not allow 0 as a weight. This is now relaxed, and you can pass 0.
    Iterators with a weight of 0 are not iterated.
  • Shuffle Before Iterating by Default for embed_shuffle by @vbourgin in #785
    Previously, embed_shuffle defaulted to shuffling after each iteration, but this was counter-intuitive.
    Now the default behavior is to shuffle before each iteration.
  • Adhere to PEP 561 (Fixes a mypy bug) by @alxmrs in #790
    The SPDL packages now contain a py.typed file so that type-checkers can analyze the code.
  • Ensure core bindings is loaded when loading CUDA binding by @mthrok in #792
    When the CUDA extension is loaded in the spdl.io module, it is now ensured that the CPU extension is loaded as well.
  • Support batch loading images with NVJPEG by @mthrok in #794
    The spdl.io.decode_image_nvjpeg function now supports loading multiple images; see the sketch after this list. (Note that the function is still experimental.)
  • Enable compilation warnings by @mthrok in #806, #821
    The C++ code of spdl.io is hardened by turning a few selected compiler warnings into errors.
  • Fix nanobind option for archive module by @mthrok in #814
    The extension module for archive (zip) parsing was not compiled with free-threading support.
  • Mypy type-checking feedback for all "pyre safe" sources. by @alxmrs in #801
    CI now runs mypy type checking. Mypy compatibility is not enforced at the moment, as Pyre has been (and still is) used. We plan to gradually make the codebase compatible with mypy.
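As mentioned above, decode_image_nvjpeg now accepts multiple images. A sketch of the batch usage (the device-configuration keyword and the cuda_config helper are assumptions; the function is experimental):

    # Hypothetical sketch: decode multiple JPEGs at once with NVJPEG.
    import spdl.io

    with open("a.jpg", "rb") as f:
        jpeg_a = f.read()
    with open("b.jpg", "rb") as f:
        jpeg_b = f.read()

    buffer = spdl.io.decode_image_nvjpeg(
        [jpeg_a, jpeg_b],  # now accepts multiple encoded images
        device_config=spdl.io.cuda_config(device_index=0),  # assumed keyword
    )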

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0

10 Jun 22:50
7e0f36f

What's Changed

  • [BC-breaking] Replace load_npz implementation by @mthrok in #739
  • [BC-breaking] Remove fallback for API transition by @mthrok in #770
  • [BC-breaking] Replace iterate_in_subprocess implementation (#776) by @mthrok in #779
  • Add support for DEFLATE algorithm by @mthrok in #744
  • Name threads by @mthrok in #728
  • Update load_npy to accept memoryview by @mthrok in #738
  • Remove libzip dependency by @mthrok in #746
  • Default to the default context by @mthrok in #762
  • Revise the way periodic task is terminated by @mthrok in #768
  • Add the functionality to suppress repeated errors in spdl by @gregorpm in #773
  • Add function to convert IterableWithShuffle to Iterable by @mthrok in #775, #777

Examples

  • Add example to benchmark IO functions by @mthrok in #730

Docs

Full Changelog: v0.0.14...v0.1.0

v0.0.14

12 May 15:32
b9a01c7

Documentation Update

  • Improved the documentation.
    SPDL started as an experiment in multi-threading, but it now covers multi-processing as well.
    Its focus is on production readiness and on iterative optimization enabled by observability and flexibility.
    The Overview and Getting Started pages have been updated to reflect this.
  • Optimization Guide
    https://facebookresearch.github.io/spdl/0.0.14/performance_analysis/index.html
    In the new Optimization Guide section, we share the methodologies we developed while optimizing various production pipelines.
    It illustrates how we estimate data loading inefficiency, identify bottlenecks, and approach resolving them.

BC-breaking Changes

  • Propagate severe Pipeline errors to front-end by @mthrok in #699
    When the Pipeline encounters an unexpected error, in addition to shutting down the pipeline,
    the error is now propagated to the foreground.

    Unexpected errors include errors raised in the source iterator.
    Errors raised in user-provided functions passed to Pipeline.pipe are expected, so they are not propagated this way.
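    In practice, this means the foreground iteration can observe such failures (a sketch; auto_stop is assumed from the Pipeline API):

        # Sketch: a severe error (e.g. one raised in the source iterator) now
        # surfaces in the foreground loop instead of only stopping the pipeline.
        # pipeline: a built spdl.pipeline.Pipeline.
        try:
            with pipeline.auto_stop():
                for item in pipeline:
                    ...
        except Exception:
            ...  # the pipeline has already shut down; handle or re-raise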

Bugfixes

Other Changes

Full Changelog: v0.0.13...v0.0.14

v0.0.13 - Observability Improvement

02 May 14:32
fe75c39

Better observability for bottleneck detection

In this release, we re-organized the pipeline hooks and exposed the periodic performance measurement mechanism as a public interface.
This lets you tap into the performance statistics of a data processing pipeline, making it easy to analyze and optimize its performance.

Please check out the renewed Performance Analysis documentation and the example to learn how to log runtime performance statistics.

BC-breaking Changes

New Features

Full Changelog: v0.0.12...v0.0.13

v0.0.12

23 Apr 15:40
a4fe619

New features

  • Improved streaming media processing capabilities, including multi-stream demuxing, remuxing, encoding, and complex filter graphs. (See the sketch after this list.)
    • You can demux multiple streams (such as audio and video packets) from a single source.
    • With the new Muxer class, you can encode media or remux packets.
    • Please check out streaming_video_processing.py for audio remuxing, streaming video decoding, and encoding.
  • The new FilterGraph class can perform complex filtering: multiple-input, multiple-output, and multi-media processing. One popular example is the overlay filter, which adds a watermark to videos processed by AI. Please check out the documentation.
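A sketch of multi-stream demuxing with the spdl.io.Demuxer class (the method names here are assumptions; see streaming_video_processing.py for the actual usage):

    # Hypothetical sketch: demux audio and video packets from one source.
    import spdl.io

    demuxer = spdl.io.Demuxer("input.mp4")
    video_packets = demuxer.demux_video()
    audio_packets = demuxer.demux_audio()
    # The packets can then be decoded, filtered with FilterGraph,
    # or written back out with Muxer.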

What's Changed

Full Changelog: v0.0.11...v0.0.12