
Commit 3ab7a4f

docs(streaming): update streaming configuration documentation (#1542)
* docs(streaming): update streaming configuration documentation

  feature: #1538

  Revise streaming docs to clarify usage of `stream_async()`, remove the outdated global streaming config, and add CLI usage instructions. Explain the streaming requirements for output rails and the deprecation of `StreamingHandler`. Improve the examples and guidance for token usage tracking.

* not sure about these 2
1 parent 5942a94 commit 3ab7a4f

File tree

3 files changed: +38, -92 lines changed

docs/configure-rails/yaml-schema/streaming/global-streaming.md

Lines changed: 31 additions & 50 deletions
````diff
@@ -1,50 +1,33 @@
 ---
-title: Global Streaming
-description: Enable streaming mode for LLM token generation in config.yml.
+title: Streaming
+description: Using streaming mode for LLM token generation in NeMo Guardrails.
 ---
 
-# Global Streaming
+# Streaming
 
-Enable streaming mode for the main LLM generation at the top level of `config.yml`.
+NeMo Guardrails supports streaming LLM responses via the `stream_async()` method. No configuration is required to enable streaming; simply use `stream_async()` instead of `generate_async()`.
 
-## Configuration
+## Basic Usage
 
-```yaml
-streaming: True
-```
-
-## What It Does
-
-When enabled, global streaming:
+```python
+from nemoguardrails import LLMRails, RailsConfig
 
-- Sets `streaming = True` on the underlying LLM model
-- Enables `stream_usage = True` for token usage tracking
-- Allows using the `stream_async()` method on `LLMRails`
-- Makes the LLM produce tokens incrementally instead of all at once
+config = RailsConfig.from_path("./config")
+rails = LLMRails(config)
 
-## Default
+messages = [{"role": "user", "content": "Hello!"}]
 
-`False`
+async for chunk in rails.stream_async(messages=messages):
+    print(chunk, end="", flush=True)
+```
 
 ---
 
-## When to Use
-
-### Streaming Without Output Rails
-
-If you do not have output rails configured, only global streaming is needed:
-
-```yaml
-streaming: True
-```
-
-### Streaming With Output Rails
+## Streaming With Output Rails
 
-When using output rails with streaming, you must also configure [output rail streaming](output-rail-streaming.md):
+When using output rails with streaming, you must configure [output rail streaming](output-rail-streaming.md):
 
 ```yaml
-streaming: True
-
 rails:
   output:
     flows:
@@ -53,27 +36,15 @@ rails:
       enabled: True
 ```
 
----
+If output rails are configured but `rails.output.streaming.enabled` is not set to `True`, calling `stream_async()` will raise a `StreamingNotSupportedError`.
 
-## Python API Usage
+---
 
-### Simple Streaming
+## Streaming With Handler (Deprecated)
 
-```python
-from nemoguardrails import LLMRails, RailsConfig
-
-config = RailsConfig.from_path("./config")
-rails = LLMRails(config)
+> **Warning:** Using `StreamingHandler` directly is deprecated and will be removed in a future release. Use `stream_async()` instead.
 
-messages = [{"role": "user", "content": "Hello!"}]
-
-async for chunk in rails.stream_async(messages=messages):
-    print(chunk, end="", flush=True)
-```
-
-### Streaming With Handler
-
-For more control, use a `StreamingHandler`:
+For advanced use cases that require more control over token processing, you can use a `StreamingHandler` with `generate_async()`:
 
 ```python
 from nemoguardrails import LLMRails, RailsConfig
@@ -113,9 +84,19 @@ Enable streaming in the request body by setting `stream` to `true`:
 
 ---
 
+## CLI Usage
+
+Use the `--streaming` flag with the chat command:
+
+```bash
+nemoguardrails chat path/to/config --streaming
+```
+
+---
+
 ## Token Usage Tracking
 
-When streaming is enabled, NeMo Guardrails automatically enables token usage tracking by setting `stream_usage = True` for the underlying LLM model.
+When using `stream_async()`, NeMo Guardrails automatically enables token usage tracking by setting `stream_usage = True` on the underlying LLM model.
 
 Access token usage through the `log` generation option:
````
docs/configure-rails/yaml-schema/streaming/index.md

Lines changed: 5 additions & 26 deletions
````diff
@@ -1,37 +1,23 @@
 ---
 title: Streaming Configuration
-description: Configure streaming for LLM token generation and output rail processing in config.yml.
+description: Configure streaming for output rail processing in config.yml.
 ---
 
 # Streaming Configuration
 
-NeMo Guardrails supports two levels of streaming configuration:
+NeMo Guardrails supports streaming out of the box when using the `stream_async()` method. No configuration is required to enable basic streaming.
 
-1. **Global streaming** - Controls LLM token generation
-2. **Output rail streaming** - Controls how output rails process streamed tokens
-
-## Configuration Comparison
-
-| Aspect | Global `streaming` | Output Rail `streaming.enabled` |
-|--------|-------------------|--------------------------------|
-| **Scope** | LLM token generation | Output rail processing |
-| **Required for** | Any streaming | Streaming with output rails |
-| **Affects** | How LLM produces tokens | How rails process token chunks |
-| **Default** | `False` | `False` |
+When you have **output rails** configured, you need to explicitly enable streaming for them to process tokens in chunked mode.
 
 ## Quick Example
 
-When using streaming with output rails, both configurations are required:
+When using streaming with output rails:
 
 ```yaml
-# Global: Enable LLM streaming
-streaming: True
-
 rails:
   output:
     flows:
       - self check output
-    # Output rail streaming: Enable chunked processing
     streaming:
       enabled: True
       chunk_size: 200
@@ -40,18 +26,11 @@ rails:
 
 ## Streaming Configuration Details
 
-The following guides provide detailed documentation for each streaming configuration area.
+The following guides provide detailed documentation for streaming configuration.
 
 ::::{grid} 1 1 2 2
 :gutter: 3
 
-:::{grid-item-card} Global Streaming
-:link: global-streaming
-:link-type: doc
-
-Enable streaming mode for LLM token generation in config.yml.
-:::
-
 :::{grid-item-card} Output Rail Streaming
 :link: output-rail-streaming
 :link-type: doc
````
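The revised docs state a single rule: output rails must opt in to chunked streaming via `rails.output.streaming.enabled`, and `stream_async()` raises `StreamingNotSupportedError` otherwise. A hypothetical re-implementation of that check (not the library's actual code, and the function and dict layout here are illustrative assumptions) makes the rule concrete:

```python
class StreamingNotSupportedError(Exception):
    """Illustrative stand-in for the error the docs describe."""

def check_streaming_supported(config: dict) -> None:
    # Hypothetical sketch of the documented rule: when output rails are
    # configured, streaming requires rails.output.streaming.enabled: True.
    output = config.get("rails", {}).get("output", {})
    has_output_rails = bool(output.get("flows"))
    streaming_enabled = output.get("streaming", {}).get("enabled", False)
    if has_output_rails and not streaming_enabled:
        raise StreamingNotSupportedError(
            "Output rails are configured but rails.output.streaming.enabled "
            "is not True; stream_async() cannot be used."
        )

# A config matching the Quick Example above passes the check.
ok_config = {
    "rails": {
        "output": {
            "flows": ["self check output"],
            "streaming": {"enabled": True, "chunk_size": 200},
        }
    }
}
check_streaming_supported(ok_config)
```

The same check raises for a config that lists output flows but omits the `streaming` block.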

docs/run-rails/streaming.md

Lines changed: 2 additions & 16 deletions
````diff
@@ -1,20 +1,12 @@
 # Streaming
 
-If the application LLM supports streaming, you can configure NeMo Guardrails to stream tokens as well.
+If the application LLM supports streaming, NeMo Guardrails can stream tokens as well. Streaming is automatically enabled when you use the `stream_async()` method; no configuration is required.
 
 For information about configuring streaming with output guardrails, refer to the following:
 
 - For configuration, refer to [streaming output configuration](../user-guides/configuration-guide.md#streaming-output-configuration).
 - For sample Python client code, refer to [streaming output](../getting-started/5-output-rails/README.md#streaming-output).
 
-## Configuration
-
-To activate streaming on a guardrails configuration, add the following to your `config.yml`:
-
-```yaml
-streaming: True
-```
-
 ## Usage
 
 ### Chat CLI
@@ -215,13 +207,7 @@ POST /v1/chat/completions
 We also support streaming for LLMs deployed using `HuggingFacePipeline`.
 One example is provided in the [HF Pipeline Dolly](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/llm/hf_pipeline_dolly/README.md) configuration.
 
-To use streaming for HF Pipeline LLMs, you first need to set the streaming flag in your `config.yml`.
-
-```yaml
-streaming: True
-```
-
-Then you need to create an `nemoguardrails.llm.providers.huggingface.AsyncTextIteratorStreamer` streamer object,
+To use streaming for HF Pipeline LLMs, you need to create a `nemoguardrails.llm.providers.huggingface.AsyncTextIteratorStreamer` streamer object,
 add it to the `kwargs` of the pipeline and to the `model_kwargs` of the `HuggingFacePipelineCompatible` object.
 
 ```python
````