This tutorial provides methods to verify that FlashRL is correctly installed and functioning in your environment.
git clone --depth 1 --branch v0.4.8 https://github.com/EleutherAI/lm-evaluation-harness && pip install -e lm-evaluation-harness
This section contains two tests to verify that both core FlashRL patches are working correctly.
Purpose: This test verifies that FlashRL can successfully override vLLM's initialization process.
How it works: FlashRL forces vLLM to load LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-VERL with FP8 quantization instead of Qwen/Qwen2.5-0.5B-Instruct in BF16. Because the VERL model scores slightly higher than the base model on gsm8k, a successful patch shows up as the higher score.
Expected Results:
- Target Performance: ~0.36 (flexible-extract), meaning the patch succeeds
- Alternative Performance: ~0.34 (flexible-extract), meaning the patch fails
Commands:
flashrl setup --fn fp8 --model LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-VERL -o $HOME/.flashrl_config.yaml
# Run lm-eval with FlashRL overriding, using FP8 quantization
FLASHRL_CONFIG=$HOME/.flashrl_config.yaml \
FLASHRL_LOGGING_LEVEL=DEBUG \
VLLM_LOGGING_LEVEL=DEBUG \
CUDA_VISIBLE_DEVICES=0 \
torchrun --standalone --nnodes=1 --nproc-per-node=1 --no-python lm_eval --model vllm \
--model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8,data_parallel_size=1 \
--tasks gsm8k \
--batch_size auto
Expected Output:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.3632 | ± | 0.0132 |
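To compare the result against the target without reading the table by eye, the flexible-extract value can be pulled out of the results row. The following is a minimal sketch, assuming the lm-eval output was saved to a file and keeps the pipe-delimited layout shown above. The function name `extract_flexible_score` and the file name `lm_eval_output.txt` are illustrative and not part of FlashRL or lm-eval.

```shell
# Hypothetical helper: print the flexible-extract exact_match value
# from a saved, pipe-delimited lm-eval results table.
extract_flexible_score() {
    awk -F'|' '
        $4 ~ /flexible-extract/ {
            # Fields after the leading "|": Tasks, Version, Filter,
            # n-shot, Metric, direction arrow, Value, "±", Stderr.
            value = $8
            gsub(/[[:space:]]/, "", value)  # strip padding around the number
            print value
            exit
        }
    ' "$1"
}

# Example (assumes the run output was saved to lm_eval_output.txt):
# extract_flexible_score lm_eval_output.txt
```

If the table layout differs in your lm-eval version, adjust the field numbers accordingly.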
Purpose: This test verifies that FlashRL can successfully handle multiple weight loading operations.
How it works: FlashRL instructs vLLM to call load_weights twice at the end of its initialization:
- First load: Qwen/Qwen2.5-0.5B-Instruct (baseline model)
- Second load: LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-VERL (improved model)
Both models are loaded in FP8 quantization. Since the VERL model performs better, seeing the improved performance indicates successful patching.
Expected Results:
- Target Performance: ~0.36 (flexible-extract), meaning the patch succeeds
- Alternative Performance: ~0.34 (flexible-extract), meaning the patch fails
Commands:
# Run lm-eval with FlashRL overriding, testing dual weight loading with FP8 quantization
FLASHRL_TEST_RELOAD="Qwen/Qwen2.5-0.5B-Instruct,LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-VERL" \
FLASHRL_CONFIG="fp8" \
FLASHRL_LOGGING_LEVEL=DEBUG \
VLLM_LOGGING_LEVEL=DEBUG \
CUDA_VISIBLE_DEVICES=0 \
torchrun --standalone --nnodes=1 --nproc-per-node=1 --no-python lm_eval --model vllm \
--model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8,data_parallel_size=1 \
--tasks gsm8k \
--batch_size auto
Expected Output:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.3632 | ± | 0.0132 |
If your score is close to neither 0.36 nor 0.34, run the following commands to obtain reference scores for your environment.
To obtain the result for LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-VERL in FP8 (referred to as score-a):
lm_eval --model vllm \
--model_args pretrained=LiyuanLucasLiu/Qwen2.5-0.5B-Instruct-VERL,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8,data_parallel_size=1,quantization=fp8 \
--tasks gsm8k \
--batch_size auto
To obtain the result for Qwen/Qwen2.5-0.5B-Instruct in BF16 (referred to as score-b):
lm_eval --model vllm \
--model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8,data_parallel_size=1 \
--tasks gsm8k \
--batch_size auto
Then, the results from previous tests can be interpreted as:
- Performance ~0.36 (or score-a): FlashRL patches are working correctly
- Performance ~0.34 (or score-b): FlashRL patches may not be functioning; the system is likely using the baseline model
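The interpretation rule above can be sketched as a small comparison: whichever reference score the measured value is closer to decides the outcome. This is an illustrative helper, not part of FlashRL; the name `classify_score` and its labels are assumptions.

```shell
# Hypothetical helper: report which reference score a measured score is closer to.
# Usage: classify_score MEASURED SCORE_A SCORE_B
classify_score() {
    awk -v m="$1" -v a="$2" -v b="$3" 'BEGIN {
        da = (m > a) ? m - a : a - m   # distance to score-a (VERL model, FP8)
        db = (m > b) ? m - b : b - m   # distance to score-b (base model, BF16)
        print ((da <= db) ? "patched" : "baseline")
    }'
}

# With the expected output from the tests and the default reference scores:
classify_score 0.3632 0.36 0.34
```

The two reference scores differ by about 0.02 while the reported standard error is about 0.013, so a value that falls between them is best treated as inconclusive and checked against the DEBUG logs.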
If you don't see the expected performance improvements:
- Check Installation: Verify FlashRL is properly installed
- Review Logs: Examine DEBUG logs for error messages
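As a first installation check, it can help to confirm that the command-line tools used in the tests are on your PATH. The sketch below assumes only the commands that appear in this tutorial (`flashrl`, `lm_eval`, `torchrun`); the helper name `check_commands` is illustrative.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if any command is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

check_commands flashrl lm_eval torchrun || echo "Install the missing tools before rerunning the tests."
```

A command being present does not prove the patches are applied; it only rules out the simplest installation problems before you dig into the DEBUG logs.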