
Conversation

@jlamypoirier
Collaborator

✨ Description

See #427

@jlamypoirier changed the base branch from main to jlp_subtest on January 7, 2026, 04:16
assert stream_key == REDIS_DATA_STREAM.encode()
for msg_id, msg_data in msgs:
    processed += 1
    # TODO: or do it after processing all received messages, then count > 1?
Contributor

We do not need this for now if we stick with consumer groups only. The producer can rely on the group lag to control the production rate and on last-delivered-id to safely trim the stream.
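
A minimal sketch of that producer-side logic, assuming redis-py and Redis >= 7.0 (which reports the group lag in XINFO GROUPS); the stream name, group name, and lag threshold are placeholders, not values from this PR:

import redis

r = redis.Redis(decode_responses=True)
STREAM = "fast_llm_data"  # placeholder; the PR uses REDIS_DATA_STREAM
GROUP = "trainers"        # placeholder consumer group name
MAX_LAG = 1024            # placeholder back-pressure threshold

def try_produce(sample: dict[str, str]) -> bool:
    # Read the consumer group's lag and last-delivered-id (Redis >= 7.0).
    (group,) = (g for g in r.xinfo_groups(STREAM) if g["name"] == GROUP)
    if (group.get("lag") or 0) >= MAX_LAG:
        return False  # consumers are behind; back off instead of growing the stream
    r.xadd(STREAM, sample)
    # Entries up to last-delivered-id have already been handed to the group,
    # so trimming at that id never drops undelivered data.
    r.xtrim(STREAM, minid=group["last-delivered-id"], approximate=True)
    return True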

backend="nccl",
init_method=init_method,
world_size=config.broadcast.external_world_size + 1,
rank=0,
Contributor

Maybe this should be configurable, since the external system may not treat us as rank 0.
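
A hypothetical version of the hunk above with the rank exposed through the config; external_rank is an assumed field name, not something this PR defines, and init_method and config are reused from the surrounding code:

torch.distributed.init_process_group(
    backend="nccl",
    init_method=init_method,
    world_size=config.broadcast.external_world_size + 1,
    # Assumed config field, defaulting to 0 to keep the current behavior.
    rank=config.broadcast.external_rank,
)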

Collaborator Author

We also have control over the external system, so we can make it so. It's easier to keep programs in sync with hard-coded values than with config files.

Contributor
@bigximik Jan 12, 2026

Maybe name it tests/models/test_streaming_training_callbacks.py or something similar?
Or perhaps it would be better under tests/engine/trainer/ or tests/trainer/, since this is testing trainer callbacks with a streaming dataset rather than model logic (with the exception of the tensor iterator).

Collaborator Author

This needs to be in tests/models because it uses the model_configs machinery. I tend to prefer concise names, so test_streaming seems appropriate.

    worker_resources: WorkerResources,
    report_subtest,
):
    report_subtest(path := run_test_script_base_path / config.name, config.total_gpus)
Contributor

Here we need to check whether we have enough GPUs for a subtest; otherwise, it will incorrectly report the test as failed with “did not run.”

@pytest.mark.depends_on(on=["test_data_streaming"])
@pytest.mark.parametrize(("name", "num_gpus", "distributed_config_dict"), _DISTRIBUTED_TESTING_CONFIGS)
def test_data_streaming_distributed(result_path, name, num_gpus, distributed_config_dict, report_subtest):
    report_subtest(path := result_path / f"data_streaming/{name}", num_gpus)
Contributor

Here we need to check whether we have enough GPUs for a subtest; otherwise, it will incorrectly report the test as failed with “did not run.”
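
A minimal sketch of such a guard for the second test (the first one would need the same treatment); whether it belongs in the test body or inside report_subtest is an open question, and torch.cuda.device_count() is just one way to count the available GPUs:

import pytest
import torch

@pytest.mark.depends_on(on=["test_data_streaming"])
@pytest.mark.parametrize(("name", "num_gpus", "distributed_config_dict"), _DISTRIBUTED_TESTING_CONFIGS)
def test_data_streaming_distributed(result_path, name, num_gpus, distributed_config_dict, report_subtest):
    if torch.cuda.device_count() < num_gpus:
        # Skip explicitly instead of letting the subtest be reported as "did not run".
        pytest.skip(f"Needs {num_gpus} GPUs, only {torch.cuda.device_count()} available.")
    report_subtest(path := result_path / f"data_streaming/{name}", num_gpus)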


_abstract = False

acknowledge_interval: int = Field(
Contributor

This is also not needed if we only use consumer groups; see the comment above for the implementation.

Base automatically changed from jlp_subtest to main January 13, 2026 01:08