Merged
python/sglang/srt/layers/rotary_embedding.py (2 additions, 0 deletions)
@@ -1481,6 +1481,8 @@ def _forward_triton(
num_tokens = positions.shape[-1]
cos_sin = self.cos_sin_cache[positions]
cos, sin = cos_sin.chunk(2, dim=-1)
cos = cos.contiguous()
sin = sin.contiguous()
Comment on lines +1484 to +1485
Contributor


Priority: medium

Moving the .contiguous() calls into the torch.compile region is a good optimization. To make it more complete, you could apply the same change to the query and key tensors.
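Why the .contiguous() calls are needed at all: chunk() along the last dimension returns views that keep the parent's row stride, so cos and sin are not dense row-major tensors. The sketch below is a minimal illustration under assumed shapes; the cache size, rotary_dim, and the "cos first, sin second" layout are assumptions, not taken from sglang.

```python
import torch

# Assumed stand-in for self.cos_sin_cache: [max_positions, rotary_dim],
# with cos in the first half of the last dim and sin in the second half.
max_positions, rotary_dim = 16, 8
cos_sin_cache = torch.randn(max_positions, rotary_dim)

positions = torch.tensor([0, 3, 5, 7])
cos_sin = cos_sin_cache[positions]       # advanced indexing: new dense [4, 8] tensor
cos_view, sin_view = cos_sin.chunk(2, dim=-1)  # views of shape [4, 4], row stride still 8

print(cos_view.stride(), cos_view.is_contiguous())  # (8, 1) False

cos = cos_view.contiguous()              # copies into a dense [4, 4] buffer
print(cos.stride(), cos.is_contiguous())            # (4, 1) True
```

Whether this copy happens eagerly or inside a compiled graph is exactly what the change in this hunk controls.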

Currently, triton_mrope also calls .contiguous() on q and k, outside the compiled region. Moving these calls for query and key into _forward_triton would bring them inside the compiled region as well, which may give a further speedup. I recommend adding query = query.contiguous() and key = key.contiguous() inside the if positions.ndim == 2: block, just before the call to triton_mrope_wrapper.
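A rough sketch of the suggested structure, assuming a heavily simplified _forward_triton. The names forward_triton_sketch and mrope_kernel_stub are hypothetical: the stub only stands in for triton_mrope_wrapper and checks that its inputs are contiguous instead of computing anything. query and key are produced by splitting a fused qkv tensor, which is one common way they end up non-contiguous (an assumption about the caller, not something this PR states).

```python
import torch


def mrope_kernel_stub(query, key, cos, sin):
    # Hypothetical stand-in for triton_mrope_wrapper: it assumes dense
    # row-major inputs, so it only verifies that here.
    for name, t in (("query", query), ("key", key), ("cos", cos), ("sin", sin)):
        assert t.is_contiguous(), f"{name} must be contiguous"
    return query, key


def forward_triton_sketch(positions, query, key, cos_sin_cache):
    # Hypothetical simplification of _forward_triton; in the PR, this body is
    # what runs inside the torch.compile region.
    cos_sin = cos_sin_cache[positions]
    cos, sin = cos_sin.chunk(2, dim=-1)
    cos = cos.contiguous()
    sin = sin.contiguous()
    if positions.ndim == 2:
        # Reviewer's suggestion: make query/key contiguous here too,
        # just before the kernel call.
        query = query.contiguous()
        key = key.contiguous()
        return mrope_kernel_stub(query, key, cos, sin)
    raise NotImplementedError("only the 2-D (mrope) path is sketched")


# Usage: 3 position rows (as in mrope), 5 tokens, query/key split from fused qkv.
num_tokens = 5
positions = torch.randint(0, 16, (3, num_tokens))
qkv = torch.randn(num_tokens, 24)
query, key, _ = qkv.split(8, dim=-1)     # non-contiguous views
q_out, k_out = forward_triton_sketch(positions, query, key, torch.randn(16, 8))
```

With this layout, the stub sees only contiguous tensors, so any later kernel-side .contiguous() calls become no-ops.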

Once query, key, cos, and sin are all guaranteed to be contiguous within _forward_triton, the corresponding .contiguous() calls on lines 1277-1280 in the triton_mrope function become redundant and can be removed to improve code clarity.
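For context on the cleanup: PyTorch documents that Tensor.contiguous() returns the tensor itself when it is already in the requested memory format, so the leftover calls in triton_mrope would be near-free rather than a second copy. Removing them is therefore mainly about clarity, and it assumes every caller of triton_mrope passes contiguous tensors (an assumption to verify against the other call sites). A minimal check:

```python
import torch

x = torch.randn(4, 8)
same = x.contiguous()          # already dense: returns x itself, no copy

view = x[:, :4]                # column slice: non-contiguous view of x's storage
copy = view.contiguous()       # allocates a new dense buffer

print(same is x, copy is view)  # True False
```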

query_shape = query.shape
key_shape = key.shape
if positions.ndim == 2: