
Process hangs after installing flash-llm-rl #22

@zzfoutofspace

Description


Hello, I successfully installed flash-rl following your instructions and ran the test to verify correctness.

Then I ran the script you provided:

bash recipe/flash_rl/gsm8k_qwen0_5b_int8.sh flash-int8-TIS-2 2

The process hangs after the Ray instance starts:

2025-08-22 10:54:24,273 - numexpr.utils - INFO - Note: detected 384 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2025-08-22 10:54:24,273 - numexpr.utils - INFO - Note: NumExpr detected 384 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
2025-08-22 10:54:24,273 - numexpr.utils - INFO - NumExpr defaulting to 16 threads.
DEBUG 08-22 10:54:24 [__init__.py:28] No plugins for group vllm.platform_plugins found.
DEBUG 08-22 10:54:24 [__init__.py:34] Checking if TPU platform is available.
DEBUG 08-22 10:54:24 [__init__.py:44] TPU platform is not available because: No module named 'libtpu'
DEBUG 08-22 10:54:24 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 08-22 10:54:24 [__init__.py:72] Confirmed CUDA platform is available.
DEBUG 08-22 10:54:24 [__init__.py:100] Checking if ROCm platform is available.
DEBUG 08-22 10:54:24 [__init__.py:114] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 08-22 10:54:24 [__init__.py:122] Checking if HPU platform is available.
DEBUG 08-22 10:54:24 [__init__.py:129] HPU platform is not available because habana_frameworks is not found.
DEBUG 08-22 10:54:24 [__init__.py:140] Checking if XPU platform is available.
DEBUG 08-22 10:54:24 [__init__.py:150] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 08-22 10:54:24 [__init__.py:158] Checking if CPU platform is available.
DEBUG 08-22 10:54:24 [__init__.py:180] Checking if Neuron platform is available.
DEBUG 08-22 10:54:24 [__init__.py:187] Neuron platform is not available because: No module named 'transformers_neuronx'
DEBUG 08-22 10:54:24 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 08-22 10:54:24 [__init__.py:72] Confirmed CUDA platform is available.
INFO 08-22 10:54:24 [__init__.py:239] Automatically detected platform cuda.
2025-08-22 10:54:26,762 - flash_rl.vllm_patch - DEBUG - Successfully patched the process_weights_after_loading function of vllm
2025-08-22 10:54:26,762 - flash_rl.vllm_patch - DEBUG - Successfully patched kv_cache process_weights_after_loading
2025-08-22 10:54:26,762 - flash_rl - DEBUG - Patching vllm process_weights_after_loading... status: True
2025-08-22 10:54:26,762 - flash_rl.vllm_patch - DEBUG - Successfully patched vllm
2025-08-22 10:54:26,762 - flash_rl - DEBUG - Patching the vllm LLM to enable flash_rl quantization... status: True
2025-08-22 10:54:27,353 - hydra.core.utils - DEBUG - Setting JobRuntime:name=UNKNOWN_NAME
2025-08-22 10:54:27,353 - hydra.core.utils - DEBUG - Setting JobRuntime:name=main_ppo
2025-08-22 10:54:34,306 INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265

The experiment runs on a single node with 8 GPUs, using vllm==0.8.4.

Do you have any idea why the process keeps hanging?
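In case it helps reproduce or narrow this down, here is a minimal sketch of the environment variables I can try before re-running the recipe, to silence the NumExpr warning from the log and to get more verbose output from NCCL and vLLM while the 8 GPUs initialize (variable names are the standard NumExpr/NCCL/vLLM ones; whether they expose the actual stall point here is an assumption):

```shell
# Cap NumExpr threads explicitly, as suggested by the warning in the log above
export NUMEXPR_MAX_THREADS=64
# Surface NCCL initialization progress/stalls across the 8 GPUs (assumption:
# the hang happens during distributed init after Ray starts)
export NCCL_DEBUG=INFO
# Keep vLLM's debug-level logging enabled
export VLLM_LOGGING_LEVEL=DEBUG
```

After exporting these, I would re-run the same recipe script and check whether any NCCL or vLLM output appears after the "Started a local Ray instance" line.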
