Conversation
@jchen10 jchen10 commented Nov 18, 2025

Implement lazy weight layout transformation for WebGPU Conv kernel to avoid redundant GPU transposes on every inference.

Key changes:

  • Add WeightLayoutTransformCache to cache transformed weights by name and format
  • Implement TransformWeightLayout() helper using existing TransposeKernel for OIHW->HWIO transformation
  • Cache stored in WebGpuExecutionProvider, shared across all kernels

jchen10 commented Nov 18, 2025

Follow-up for #26554

jchen10 commented Nov 18, 2025

@fs-eire PTAL

jchen10 commented Nov 19, 2025

I am still looking into the PrePack approach, which seems more appealing since it releases the original tensors.

fs-eire commented Nov 19, 2025

> I am still looking into the PrePack approach, which seems more appealing since it releases the original tensors.

Please take a look at #26602. However, I haven't finished all the validation yet.

jchen10 commented Nov 19, 2025

> I am still looking into the PrePack approach, which seems more appealing since it releases the original tensors.
>
> Please take a look at #26602. However, I haven't finished all the validation yet.

Great, that's exactly what I wanted. One downside of PrePack is that we can't know the runtime input/output shapes, which may affect how we choose the optimal blocked format for the weight. Let's see whether this issue comes up in the future. So far so good.
