Conversation

@tscholak (Collaborator)

Summary

  • Adds a vLLM-optimized Apriel2 model implementation to fast_llm_external_models
  • Uses vLLM's ModelRegistry.register_model() for runtime registration (no vLLM patching required; see the sketch after this list)
  • Supports hybrid architectures: attention, mamba, GDN, and KDA mixers
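
A minimal sketch of what register() presumably does. The architecture name Apriel2ForCausalLM and the module path are assumptions inferred from the file layout this PR describes, not confirmed API; ModelRegistry.register_model itself is vLLM's public out-of-tree registration hook:

from vllm import ModelRegistry

def register():
    # The lazy "module:ClassName" string form defers importing the model
    # code until vLLM actually needs it (names below are assumptions).
    ModelRegistry.register_model(
        "Apriel2ForCausalLM",
        "fast_llm_external_models.apriel2.modeling_apriel2:Apriel2ForCausalLM",
    )

Registering out-of-tree models this way is exactly what lets us avoid maintaining a patched vLLM fork.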

Attribution

The model implementation is based on work by @nandahkrishna from the apriel2-vllm branch. This PR adapts that implementation for plugin-based registration as an alternative to patching vLLM directly.

Goal

Evaluate whether vLLM's plugin/registration mechanism can work for us as a short-term solution, avoiding the need to maintain a patched vLLM fork.

Usage

from fast_llm_external_models.apriel2.vllm import register
from vllm import LLM

register()  # registers the Apriel2 architecture with vLLM's ModelRegistry
llm = LLM(model="path/to/apriel2/checkpoint")  # then load it like any other model
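
For a quick smoke test, the example can be extended with a generation call; the prompt and sampling settings below are illustrative only:

from vllm import SamplingParams

params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy decoding
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)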

Test plan

  • Test the registration mechanism with vLLM
  • Verify that the model loads correctly
  • Compare inference results with the patched vLLM approach (sketched below)
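
A hedged sketch of the last item: compare greedy generations from the plugin-registered model against outputs saved from the patched vLLM build. The prompts file, reference file, and checkpoint path are hypothetical:

import json
from vllm import LLM, SamplingParams
from fast_llm_external_models.apriel2.vllm import register

register()
llm = LLM(model="path/to/apriel2/checkpoint")
prompts = json.load(open("prompts.json"))  # hypothetical prompt set

# Greedy decoding keeps outputs deterministic, so exact string comparison
# against the patched build is meaningful.
params = SamplingParams(temperature=0.0, max_tokens=64)
texts = [o.outputs[0].text for o in llm.generate(prompts, params)]

reference = json.load(open("patched_vllm_outputs.json"))  # saved from the patched build
assert texts == reference, "plugin-based and patched vLLM outputs diverge"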

🤖 Generated with Claude Code

tscholak and others added 2 commits January 10, 2026 12:38
- Add README.md documenting the algebraic structure of the conversion system
  (surgery monoid, action law, plan composition, total vs partial operations)
- Add prune_supernet_step1.yaml and prune_supernet_step2.yaml examples
  demonstrating the two-step workflow for pruning a homogeneous supernet
  to a heterogeneous network with different mixer types per layer

Co-Authored-By: Claude Opus 4.5 <[email protected]>
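
The algebraic structure named in that commit can be illustrated with a toy sketch; Surgery, compose, and apply are hypothetical names for illustration, not the actual conversion-system API:

from dataclasses import dataclass

@dataclass(frozen=True)
class Surgery:
    # Toy representation: map layer index -> new mixer type; absent keys
    # mean "leave that layer unchanged".
    changes: dict

    def compose(self, other: "Surgery") -> "Surgery":
        # other acts first, self acts second; later changes win.
        return Surgery({**other.changes, **self.changes})

IDENTITY = Surgery({})  # identity element of the monoid

def apply(s: Surgery, mixers: list) -> list:
    # The action of a surgery on a per-layer mixer assignment.
    return [s.changes.get(i, m) for i, m in enumerate(mixers)]

# Two-step pruning of a homogeneous supernet, mirroring the YAML examples:
supernet = ["attention"] * 4
step1 = Surgery({1: "mamba", 3: "gdn"})
step2 = Surgery({2: "kda"})

# Action law: applying a composite equals applying the steps in sequence.
assert apply(step2.compose(step1), supernet) == apply(step2, apply(step1, supernet))
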
- Add modeling_apriel2.py with full vLLM-optimized implementation
  supporting attention, mamba, GDN, and KDA mixer types
- Add register() function for runtime model registration via
  vLLM's ModelRegistry (no patching required)
- Based on Nanda's vllm_diff.patch, adapted for external package use

Co-Authored-By: Claude Opus 4.5 <[email protected]>
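
Since the model in that commit supports four mixer types, here is a hedged sketch of the per-layer dispatch pattern a hybrid stack like this typically uses; the class names and constructor signatures are illustrative, not the actual modeling_apriel2.py interfaces:

import torch.nn as nn

# Stand-in mixers so the sketch runs; the real implementations live in
# modeling_apriel2.py under (presumably) different names.
class AttentionMixer(nn.Identity): pass
class MambaMixer(nn.Identity): pass
class GDNMixer(nn.Identity): pass
class KDAMixer(nn.Identity): pass

MIXER_CLASSES = {
    "attention": AttentionMixer,
    "mamba": MambaMixer,
    "gdn": GDNMixer,
    "kda": KDAMixer,
}

class HybridDecoderLayer(nn.Module):
    def __init__(self, mixer_type: str):
        super().__init__()
        if mixer_type not in MIXER_CLASSES:
            raise ValueError(f"unknown mixer type: {mixer_type}")
        self.mixer = MIXER_CLASSES[mixer_type]()

    def forward(self, hidden_states):
        return self.mixer(hidden_states)

# One mixer type per layer is what makes the architecture "hybrid":
layers = [HybridDecoderLayer(t) for t in ["attention", "mamba", "gdn", "kda"]]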