From page 4 of the [Kimi K2 technical report](https://arxiv.org/pdf/2507.20534), the change that enables Muon to scale to larger transformers and reduces loss spikes:

<img width="808" height="410" alt="Image" src="https://github.com/user-attachments/assets/57dad50d-1460-40e1-94e4-a97a9e63bcc8" />