Skip to content

Conversation

@bubblepipe
Copy link

This PR optimizes CKB-VM's x86-64 assembly implementation of division and remainder instructions by replacing stack operations (PUSH/POP) with register moves.

This optimization is not visible in the built-in cargo bench due to measurement noise, but static analysis with OSACA indicates 10x reduction in loop-carried dependency, and profiling on an AMD 5950x using uProf reports 95% less memory subsystem workload.

@mohanson mohanson requested a review from Copilot November 3, 2025 05:41
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes division and remainder operations in the x64 assembly JIT by replacing stack-based register preservation with register-to-register moves. The changes eliminate push/pop operations in favor of using TEMP3 (%r11) as a scratch register to temporarily store the RD register (%rax).

  • Replaced push RD / pop RD with movq RD, TEMP3 / movq TEMP3, RD across all division and remainder operations
  • Removed redundant PUSH_RD_IF_RAX / PUSH_RD_IF_RDX / POP_RD_IF_RAX / POP_RD_IF_RDX macro calls in division operations
  • Applied changes consistently to DIV, DIVU, DIVUW, DIVW, REM, REMU, REMUW, REMW, WIDE_DIV, and WIDE_DIVU operations
Comments suppressed due to low confidence (6)

src/machine/asm/execute_x64.S:900

  • The MULH operation still uses PUSH_RD_IF_RAX/POP_RD_IF_RAX macros for register preservation while division/remainder operations have been optimized to use TEMP3. For consistency and performance, consider replacing these with movq RD, TEMP3 before line 896 and movq TEMP3, RD after line 898, similar to the division operations.
  PUSH_RD_IF_RAX
  PUSH_RD_IF_RDX
  movq REGISTER_ADDRESS(RS1), %rax
  imulq REGISTER_ADDRESS(RS2r)
  MOV_RDX_TO_RS2r
  POP_RD_IF_RDX
  POP_RD_IF_RAX

src/machine/asm/execute_x64.S:907

  • The MULHSU operation still uses PUSH_RD_IF_RAX/POP_RD_IF_RAX macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with movq RD, TEMP3 and the corresponding movq TEMP3, RD restoration.
  PUSH_RD_IF_RAX
  PUSH_RD_IF_RDX

src/machine/asm/execute_x64.S:942

  • The MULHU operation still uses PUSH_RD_IF_RAX/POP_RD_IF_RAX macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with movq RD, TEMP3 and the corresponding movq TEMP3, RD restoration.
  PUSH_RD_IF_RAX
  PUSH_RD_IF_RDX

src/machine/asm/execute_x64.S:2170

  • The WIDE_MUL operation still uses PUSH_RD_IF_RAX/POP_RD_IF_RAX macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with movq RD, TEMP3 and the corresponding movq TEMP3, RD restoration.
  PUSH_RD_IF_RAX
  PUSH_RD_IF_RDX

src/machine/asm/execute_x64.S:2184

  • The WIDE_MULU operation still uses PUSH_RD_IF_RAX/POP_RD_IF_RAX macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with movq RD, TEMP3 and the corresponding movq TEMP3, RD restoration.
  PUSH_RD_IF_RAX
  PUSH_RD_IF_RDX

src/machine/asm/execute_x64.S:2198

  • The WIDE_MULSU operation still uses PUSH_RD_IF_RAX/POP_RD_IF_RAX macros. For consistency with the division/remainder optimizations in this PR, consider replacing these with movq RD, TEMP3 and the corresponding movq TEMP3, RD restoration.
  PUSH_RD_IF_RAX
  PUSH_RD_IF_RDX

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant