Releases: facebookresearch/spdl

v0.1.6

28 Oct 18:43
b10f186

What's New

Bugfix

What's Changed

Examples

Packaging

Documentation Update

  • Mention diagnostic mode in profile_pipeline function by @mthrok in #1013

Internal / Refactor

Full Changelog: v0.1.5...v0.1.6

v0.1.5

15 Oct 14:24
09a52e9

What's New

New Python version support

The spdl-io packages now include binaries for Python 3.13t, 3.14 and 3.14t on all supported platforms (macOS, Linux aarch64, Linux x86_64 and Windows).
They are now compiled with CUDA 12.8.1.

(spdl-core remains a pure-Python package with no dependencies.)

Pipeline profiling and self-diagnostic mode

Pipeline profiling function
The spdl.pipeline.profile_pipeline function has been added.
This function executes each stage separately at different multi-threading concurrencies and reports the execution speed. This helps you see how each stage function scales with multi-threading.
Please check out the example for the usage and what you can infer from the profiling result.
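A minimal sketch of the usage, assuming profile_pipeline accepts a pipeline definition built with spdl.pipeline.defs (the config components are described under v0.1.4 below); see the linked example for the actual signature and output:

    # Hypothetical sketch -- the exact signature of profile_pipeline is an assumption.
    from spdl.pipeline import profile_pipeline
    from spdl.pipeline.defs import PipeConfig, PipelineConfig

    def decode(item):
        ...  # the stage function to be profiled

    cfg = PipelineConfig(
        src=range(1000),              # any iterable source
        stages=[PipeConfig(decode)],
        sink=...,                     # elided here, as in the examples below
    )

    # Runs each stage separately at several thread concurrencies
    # and reports the measured execution speed.
    profile_pipeline(cfg)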

Self-diagnostic mode
By setting the environment variable SPDL_PIPELINE_DIAGNOSTIC_MODE=1, the Pipeline is built in self-diagnostic mode.
In this mode, the pipeline internally calls profile_pipeline and exits.
This lets you profile a pipeline in a production environment without changing the code.
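For example (a sketch; it is assumed the flag is read when the Pipeline is built, so it can also be set in the shell that launches the job):

    import os

    # Must be set before the Pipeline is built. (Assumption: the flag is
    # read at construction time; it can also be set in the launching shell.)
    os.environ["SPDL_PIPELINE_DIAGNOSTIC_MODE"] = "1"

    # builder: a spdl.pipeline.PipelineBuilder configured elsewhere.
    pipeline = builder.build(num_threads=4)  # profiles each stage, then exits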

(contributed by @ncorriveau and @mthrok)

Support for sub-pipelines

A new pipeline definition component, spdl.pipeline.defs.MergeConfig, and its factory function spdl.pipeline.defs.Merge have been added.
They make it possible to merge multiple pipelines and attach downstream stages.
Please check out the example for the usage.
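A rough sketch of what a merged definition might look like (the exact signature of Merge is an assumption; refer to the linked example):

    # Hypothetical sketch -- the exact signature of Merge is an assumption.
    from spdl.pipeline.defs import Merge, PipeConfig, PipelineConfig

    def postprocess(item):
        ...  # a stage shared by the merged pipelines

    # pipeline_cfg_a / pipeline_cfg_b: PipelineConfig objects defined elsewhere.
    cfg = PipelineConfig(
        src=Merge([pipeline_cfg_a, pipeline_cfg_b]),  # merge the two upstreams
        stages=[PipeConfig(postprocess)],             # attach a downstream stage
        sink=...,
    )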

spdl.io.load_wav for loading WAV from memory without copy

The spdl.io.load_wav function has been added. This function is specialized for loading WAV audio from memory. It makes no copy, so it is very fast. Please refer to the benchmark.
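For illustration (a sketch; the exact return type and the conversion shown are assumptions; see the API reference):

    # Sketch: load WAV audio that is already in memory, without copying.
    import spdl.io

    with open("sample.wav", "rb") as f:
        data = f.read()

    # Parses the WAV header and wraps the sample data in place (no copy).
    buffer = spdl.io.load_wav(data)

    # Assumption: the result can be converted with spdl.io.to_numpy.
    array = spdl.io.to_numpy(buffer)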

spdl.io.iter_tarfile for iterating TAR files

The spdl.io.iter_tarfile function has been added. This function can iterate over a TAR file held in memory or in a file-like object. It is at least as fast as Python's built-in tarfile module. If the input is of bytes type, it performs zero-copy parsing. (Proposed by @nicolas-dufour and @npuichigo [discussion])
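For illustration (a sketch; the exact item type yielded is an assumption; see the API reference):

    # Sketch: iterate over a TAR archive held in memory.
    import spdl.io

    with open("shard-00000.tar", "rb") as f:
        data = f.read()  # a bytes input enables zero-copy parsing

    # Assumption: each item is a (name, data) pair.
    for name, item in spdl.io.iter_tarfile(data):
        ...  # process each archived file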

Updates on failure monitoring mechanism

Pass max_failures to the spdl.pipeline.PipelineBuilder.pipe method or the spdl.pipeline.defs.Pipe function to control how many stage failures are tolerated.
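For example (a sketch; unrelated arguments are elided):

    # Sketch: tolerate up to 3 failures in the decode stage before
    # the pipeline aborts.
    import spdl.pipeline

    # src / decode: a source iterable and a stage function defined elsewhere.
    builder = (
        spdl.pipeline.PipelineBuilder()
        .add_source(src)
        .pipe(decode, concurrency=4, max_failures=3)
        .add_sink(...)
    )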

  • Fix the way failed stages are reported (#1006)
    When a pipeline stage fails more times than allowed, the pipeline fails with an error message. Previously, all stages were included in the message; now only the stages that actually failed are included.

What's Changed

  • zlib is statically linked in Linux/macOS binaries
    Previously, the spdl-io packages required an external installation of zlib on Linux and macOS, while it was statically linked in the Windows packages. Now zlib is statically linked on Linux/macOS as well, making NumPy the only required dependency of spdl-io.

  • Pipeline definitions are frozen by @mthrok in #966

New Contributors

Full Changelog: v0.1.4...v0.1.5

v0.1.4

10 Sep 19:37
a7cb787

New features

Config-based Pipeline construction

The newly introduced spdl.pipeline.defs module contains the definitions and helper functions you can use to build a Pipeline object in a declarative manner. For details, please refer to #902.

You can now construct a Pipeline in the following ways. Also, please check out the Hydra Integration Example.

PipelineBuilder (existing):

    builder = (
        PipelineBuilder()
        .add_source(Sampler())
        .pipe(
            funcA,
            concurrency=...)
        .pipe(
            funcB,
            concurrency=...,
            executor=...)
        .sink(...)
    )
    pipeline = builder.build(
        num_threads=...)

PipelineDefinition (new):

    pdef = PipelineConfig(
        src=Sampler(),
        stages=[
            PipeConfig(
                funcA,
                concurrency=...),
            PipeConfig(
                funcB,
                concurrency=...,
                executor=...),
        ],
        sink=...,
    )
    pipeline = build_pipeline(
        pdef, num_threads=...)

Hydra:

    _target_: build_pipeline
    num_threads: ...
    definition:
      _target_: PipelineConfig
      src:
        _target_: Sampler
      stages:
      - _target_: PipeConfig
        op: funcA
        concurrency: ...
      - _target_: PipeConfig
        op: funcB
        concurrency: ...
        executor: ...
      sink:
        ...

[Experimental] Windows Support

SPDL I/O now supports building on Windows. The binary distribution contains CUDA integration and NVDEC support.

Logging change

Previously, if a function passed to pipe failed, the error was logged as a single line. Now the full stack trace is printed.

Bug fix

What's Changed

v0.1.3

01 Sep 15:21
0479641

New features

  • Support classes with __getitem__ method in pipe by @mthrok in #872
  • Add VideoPackets.get_timestamps method by @mthrok in #875
  • Add VideoFrames::get_timestamp method by @mthrok in #877
  • Add VideoFrames.get_pts / time_base attributes to VideoFrames by @mthrok in #883
  • Add support for FFmpeg 8 #869

Bug fix

  • Fix the handling of StopAsyncIterator for FailCounter by @moto-meta in #862
  • Fix include by @mthrok in #874
  • Correctly detect that a callable is a generator when it is a class's __call__ method by @yit-b in #870
  • Fix type stub by @mthrok in #884

What's Changed

  • Update imagenet example by @mthrok in #860
  • Add debug print for pipeline structure by @mthrok in #868
  • Define a dedicated type alias for pipe input by @mthrok in #885

Full Changelog: v0.1.2...v0.1.3

v0.1.2

21 Aug 14:27
55b6156

New features

What's Changed

Documentation updates

BC-breaking changes

Full Changelog: v0.1.1...v0.1.2

v0.1.1

01 Jul 10:43
e3286d2

BC-breaking changes

  • Replace run_pipeline_in_subprocess impl by @moto-meta in #780

    Previously, run_pipeline_in_subprocess returned an iterator and was not reusable.
    Now it returns an iterable, which can be iterated multiple times.
    Following this change, the object returned from run_pipeline_in_subprocess no longer provides iterator-specific methods such as __next__.

    Example
    # Construct a builder
    builder = (
        spdl.pipeline.PipelineBuilder()
        .add_source(...)
        .pipe(...)
        ...
        .add_sink(...)
    )
    
    # Move it to the subprocess, build the Pipeline
    iterable = run_pipeline_in_subprocess(builder, ...)
    
    # Iterate - epoch 0
    for item in iterable:
        ...
    
    # Iterate - epoch 1
    for item in iterable:
        ...

What's Changed

  • Support multiple initializers in iterate_in_subprocess and run_pipeline_in_subprocess by @mthrok in #783
    Now you can pass multiple initializer functions to iterate_in_subprocess and run_pipeline_in_subprocess, which are called in the subprocess before the first iteration starts.
  • Allow Zero Weights in MergeIterator by @vbourgin in #784
    Previously, the MergeIterator did not allow 0 as a weight. This is now relaxed, and you can pass 0.
    Iterators with a weight of 0 are not iterated.
  • Shuffle Before Iterating by Default for embed_shuffle by @vbourgin in #785
    Previously, embed_shuffle defaulted to shuffling after each iteration, but this was counter-intuitive.
    Now the default behavior is to shuffle before each iteration.
  • Adhere to PEP 561 (Fixes a mypy bug) by @alxmrs in #790
    The SPDL packages now contain a py.typed file so that type-checkers can analyze the code.
  • Ensure core bindings is loaded when loading CUDA binding by @mthrok in #792
    When the CUDA extension is loaded in the spdl.io module, it is now ensured that the CPU extension is loaded as well.
  • Support batch loading images with NVJPEG by @mthrok in #794
    The spdl.io.decode_image_nvjpeg function now supports loading multiple images; see the sketch after this list. (Note that the function is still experimental.)
  • Enable compilation warnings by @mthrok in #806, #821
    The C++ code of spdl.io is hardened by turning a few selected compiler warnings into errors.
  • Fix nanobind option for archive module by @mthrok in #814
    The extension module for archive (zip) parsing was not compiled with free-threading support.
  • Mypy type-checking feedback for all "pyre safe" sources. by @alxmrs in #801
    CI now runs mypy type checking. Mypy compatibility is not enforced at the moment, as Pyre has been (and still is) used. We plan to gradually make the codebase compatible with mypy.
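As mentioned above, decode_image_nvjpeg now accepts multiple images. A sketch of the batch usage (the device-configuration keyword and the cuda_config helper are assumptions; the function is experimental):

    # Hypothetical sketch: decode multiple JPEGs at once with NVJPEG.
    import spdl.io

    with open("a.jpg", "rb") as f:
        jpeg_a = f.read()
    with open("b.jpg", "rb") as f:
        jpeg_b = f.read()

    buffer = spdl.io.decode_image_nvjpeg(
        [jpeg_a, jpeg_b],  # now accepts multiple encoded images
        device_config=spdl.io.cuda_config(device_index=0),  # assumed keyword
    )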

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0

10 Jun 22:50
7e0f36f

What's Changed

  • [BC-breaking] Replace load_npz implementation by @mthrok in #739
  • [BC-breaking] Remove fallback for API transition by @mthrok in #770
  • [BC-breaking] Replace iterate_in_subprocess implementation (#776) by @mthrok in #779
  • Add support for DEFLATE algorithm by @mthrok in #744
  • Name threads by @mthrok in #728
  • Update load_npy to accept memoryview by @mthrok in #738
  • Remove libzip dependency by @mthrok in #746
  • Default to the default context by @mthrok in #762
  • Revise the way periodic task is terminated by @mthrok in #768
  • Add the functionality to suppress repeated errors in spdl by @gregorpm in #773
  • Add function to convert IterableWithShuffle to Iterable by @mthrok in #775, #777

Examples

  • Add example to benchmark IO functions by @mthrok in #730

Docs

Full Changelog: v0.0.14...v0.1.0

v0.0.14

12 May 15:32
b9a01c7

Documentation Update

  • Improved the documentation.
    SPDL started as an experiment in multi-threading, but it now covers multi-processing as well.
    Its focus is on production readiness and on iterative optimization enabled by observability and flexibility.
    The Overview and Getting Started pages have been updated to reflect this.
  • Optimization Guide
    https://facebookresearch.github.io/spdl/0.0.14/performance_analysis/index.html
    In the new Optimization Guide section, we share the methodologies we developed while optimizing various production pipelines.
    It illustrates how we estimate data loading inefficiency, identify bottlenecks, and approach resolving them.

BC-breaking Changes

  • Propagate severe Pipeline errors to front-end by @mthrok in #699
    When the Pipeline encounters an unexpected error, in addition to shutting down the pipeline,
    the error is now propagated to the foreground.

    Unexpected errors include errors raised in the source iterator.
    Errors raised in user-provided functions passed to Pipeline.pipe are expected, so they are not propagated this way.
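    In practice, this means the foreground iteration can observe such failures (a sketch; auto_stop is assumed from the Pipeline API):

        # Sketch: a severe error (e.g. one raised in the source iterator) now
        # surfaces in the foreground loop instead of only stopping the pipeline.
        # pipeline: a built spdl.pipeline.Pipeline.
        try:
            with pipeline.auto_stop():
                for item in pipeline:
                    ...
        except Exception:
            ...  # the pipeline has already shut down; handle or re-raise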

Bugfixes

Other Changes

Full Changelog: v0.0.13...v0.0.14

v0.0.13 - Observability Improvement

02 May 14:32
fe75c39

Better observability for bottleneck detection

In this release, we re-organized the pipeline hooks and exposed the periodic performance measurement mechanism as a public interface.
This lets you tap into the performance statistics of a data processing pipeline, making it easy to analyze and optimize its performance.

Please check out the renewed Performance Analysis documentation and the example to learn how to log runtime performance statistics.

BC-breaking Changes

New Features

Full Changelog: v0.0.12...v0.0.13

v0.0.12

23 Apr 15:40
a4fe619

New features

  • Improved streaming media processing capabilities, including multi-stream demuxing, remuxing, encoding, and complex filter graphs. (See the sketch after this list.)
    • You can demux multiple streams (such as audio and video packets) from a single source.
    • With the new Muxer class, you can encode media or remux packets.
    • Please check out streaming_video_processing.py for audio remuxing, streaming video decoding, and encoding.
  • The new FilterGraph class can perform complex filtering: multiple-input, multiple-output, and multi-media processing. One popular example is the overlay filter, which adds a watermark to videos processed by AI. Please check out the documentation.
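A sketch of multi-stream demuxing with the spdl.io.Demuxer class (the method names here are assumptions; see streaming_video_processing.py for the actual usage):

    # Hypothetical sketch: demux audio and video packets from one source.
    import spdl.io

    demuxer = spdl.io.Demuxer("input.mp4")
    video_packets = demuxer.demux_video()
    audio_packets = demuxer.demux_audio()
    # The packets can then be decoded, filtered with FilterGraph,
    # or written back out with Muxer.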

What's Changed

Full Changelog: v0.0.11...v0.0.12