30 commits
daf20d9
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 6, 2026
6f1e63c
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 6, 2026
4deb7a7
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 9, 2026
676daf6
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 9, 2026
9bcfdca
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 10, 2026
2bfa878
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 10, 2026
262c470
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 11, 2026
171b4d3
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 17, 2026
def0bd2
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 19, 2026
4fad5dc
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 20, 2026
3d739a6
Port ngram_match and hybrid_mtp_ngram kernels to CUDA
cloudforge1 Mar 20, 2026
477f749
Add correctness + latency test for GPU ngram kernels
cloudforge1 Mar 20, 2026
c349b12
Fix test data: step_idx semantics and ngram-matchable patterns
cloudforge1 Mar 20, 2026
217e587
fix: add CPU fallback path for ngram_match and hybrid_mtp_ngram ops
cloudforge1 Mar 21, 2026
08fe00a
fix(test): wrap imported ops with staticmethod to prevent self-binding
cloudforge1 Mar 21, 2026
305868d
fix(test): ensure max_model_len >= input_len to prevent broadcast err…
cloudforge1 Mar 21, 2026
1dfaed5
fix: keep input_ids_len on CPU in __init__, move to GPU in _run_impl
cloudforge1 Mar 22, 2026
b7f1f38
Extract shared ngram search into __device__ helper (ngram_match_commo…
cloudforge1 Mar 25, 2026
3f71877
refactor: parallel CUDA kernels for ngram_match (<<<bsz,256>>> search)
cloudforge1 Mar 30, 2026
838d6dc
fix: move __global__ kernel defs from .cuh to .cu files (fix linker m…
cloudforge1 Mar 30, 2026
f45e39b
fix: align mixed kernel signatures with host function tensors
cloudforge1 Mar 30, 2026
a7f149a
fix: address review — GPU mirror for input_ids_len, device mismatch i…
cloudforge1 Apr 2, 2026
65f609b
bench: add 5-group benchmark matching NKNaN methodology
cloudforge1 Apr 2, 2026
c6e698f
fix: rename benchmark for CI discovery, bump to 10k iterations
cloudforge1 Apr 2, 2026
3ddae7f
fix: correct stale filename in benchmark docstring
cloudforge1 Apr 2, 2026
2e34b72
bench: remove env-gate from benchmark groups, cut NUM_ITERS to 1000
cloudforge1 Apr 3, 2026
261903f
fix: explicit GPU placement for input_ids_len_gpu, CPUPlace in benchmark
cloudforge1 Apr 3, 2026
00885dd
perf: O(bsz) running sum in Phase 2 gather kernels + defensive guard
cloudforge1 Apr 3, 2026
cf6a664
fix: remove unused draft_token_num param, clarify CAS + encoder accum…
cloudforge1 Apr 3, 2026
cac9ce3
perf: BlockScan Phase 2 + A1 (1024 threads) + A2 (early-exit)
cloudforge1 Apr 3, 2026
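The final commits replace an O(bsz) running sum in "Phase 2" with a block-wide scan (`BlockScan`, presumably CUB's `cub::BlockScan`). The commit messages do not say what Phase 2 accumulates. The sketch below is a hypothetical illustration only: it assumes Phase 2 caps the total number of draft tokens granted across the batch. `counts`, `budget`, `grant_sequential`, and `grant_with_scan` are illustrative names, not the kernel's parameters. It shows why a scan can replace the sequential loop. Once each request knows the exclusive prefix sum of the counts before it, its grant depends on nothing else, so one thread per request can compute it. Python's `itertools.accumulate` stands in for the block-wide scan here.

```python
from itertools import accumulate
from typing import List


def grant_sequential(counts: List[int], budget: int) -> List[int]:
    """Hypothetical O(bsz) running sum: grant each request's draft tokens
    in batch order until the shared budget is used up.
    Assumes counts and budget are non-negative."""
    remaining = budget
    granted = []
    for c in counts:
        take = min(c, remaining)
        granted.append(take)
        remaining -= take
    return granted


def grant_with_scan(counts: List[int], budget: int) -> List[int]:
    """Same result from an exclusive prefix sum. Every entry is computed
    independently, which is what makes a one-thread-per-request scan possible."""
    # Exclusive scan: prefix[i] = counts[0] + ... + counts[i - 1].
    prefix = list(accumulate(counts, initial=0))[:-1]
    # Request i receives the budget left after all earlier requests,
    # capped at its own count and never negative.
    return [min(c, max(budget - p, 0)) for c, p in zip(counts, prefix)]
```

For example, with `counts = [3, 0, 4, 2]` and `budget = 5`, both functions return `[3, 0, 2, 0]`. The two agree because, for non-negative counts, the budget remaining before request `i` in the sequential loop is exactly `max(budget - prefix[i], 0)`.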
custom_ops/gpu_ops/speculate_decoding/draft_model/ngram_match_mixed.cu: 389 changes (330 additions, 59 deletions)

Large diffs are not rendered by default.

custom_ops/gpu_ops/speculate_decoding/ngram_match.cc: 227 changes (0 additions, 227 deletions)

This file was deleted.
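Both files implement n-gram matching for speculative decoding. The sketch below is a rough, hypothetical reference for that technique, not the code in this PR. It assumes the following behavior:

- The lookup key is the trailing n-gram of the generated tokens.
- The key is searched first in the prompt, then in the earlier generated tokens.
- Longer n-grams are tried before shorter ones.
- The tokens that follow the earliest match are copied as the draft.

`ngram_draft`, `_match_end`, `max_ngram_size`, and `max_draft_tokens` are illustrative names.

```python
from typing import List, Optional, Sequence


def _match_end(haystack: Sequence[int], ngram: Sequence[int], limit: int) -> Optional[int]:
    """Return the index just past the earliest occurrence of `ngram` in
    `haystack` that ends at or before `limit`, or None if there is none."""
    n = len(ngram)
    for start in range(0, limit - n + 1):
        if list(haystack[start:start + n]) == list(ngram):
            return start + n
    return None


def ngram_draft(prompt: Sequence[int], generated: Sequence[int],
                max_ngram_size: int, max_draft_tokens: int) -> List[int]:
    """Hypothetical reference: propose up to `max_draft_tokens` draft tokens
    by looking up the trailing n-gram of `generated`, longest n-gram first."""
    for n in range(min(max_ngram_size, len(generated)), 0, -1):
        tail = generated[len(generated) - n:]
        # Search the prompt first; requiring the match to end by
        # len(prompt) - 1 guarantees at least one following token.
        end = _match_end(prompt, tail, len(prompt) - 1)
        if end is not None:
            return list(prompt[end:end + max_draft_tokens])
        # Then search the earlier generated tokens; ending by
        # len(generated) - 1 stops the tail from matching itself.
        end = _match_end(generated, tail, len(generated) - 1)
        if end is not None:
            return list(generated[end:end + max_draft_tokens])
    return []
```

For example, `ngram_draft([3, 7, 2, 3, 9], [2, 3], 2, 1)` returns `[9]`. The 2-gram `[2, 3]` is found in the prompt before the 1-gram `[3]` (which would have proposed `[7]`) is tried. Each candidate start position is checked independently, so a GPU version can spread the inner search across threads and take the earliest match as a min-reduction over the positions that matched.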