junxzm1990 commented Oct 29, 2025

Description

This PR implements the "common subexpression elimination" (CSE) transformation. This is the second PR on the stack to introduce CSE optimizations to Move.

Motivating Example:

1. fun test(data: S, a: u64, b: u64): u64 {
2.     if (data.x != 0) {
3.         a / data.x
4.     } else {
5.         data.x + 1
6.     }
7. }

At the stackless bytecode level, data.x is translated into a sequence of BorrowLoc + BorrowField + ReadRef instructions.
Without CSE, every occurrence of data.x (line 2, line 3, line 5) is translated into the sequence above, even though the occurrences at line 3 and
line 5 yield the same result as the one at line 2, so recomputing them is unnecessary.

CSE aims to eliminate such redundant computations by reusing the result of previous computations.
Specifically, in the example above, assuming the result of the BorrowLoc + BorrowField + ReadRef sequence at line 2 is assigned to temp $t5,
then the occurrences at line 3 and line 5 can both be replaced by $t5, eliminating the redundant computations.
The optimized bytecode would look like:

 0: $t6 := borrow_local($t0)
 1: $t7 := borrow_field<0x8675::M::S>.x($t6)
 2: $t5 := read_ref($t7) // `data.x` at line 2 assigned to $t5
 3: $t8 := 0
 4: $t4 := !=($t5, $t8)
 5: if ($t4) goto 6 else goto 11
 6: label L0
 7: $t9 := move($t1)
 8: $t3 := /($t9, $t5) // line 3 reuses $t5
 9: label L2
10: return $t3
11: label L1
12: $t16 := 1
13: $t3 := +($t5, $t16) // line 5 reuses $t5
14: goto 9

============================ Implementation Details ============================

Step 1: Build the Control Flow Graph (CFG) and the dominator tree of the target function.
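
As a rough illustration of Step 1, the sketch below computes dominator sets with the classic iterative data-flow formulation and then derives immediate dominators from them. The representation (blocks as indices, `preds` as a predecessor list, block 0 as the entry) and the function names are assumptions made for this example; they are not taken from the PR, which presumably relies on the compiler's existing CFG and dominator utilities.

```rust
use std::collections::BTreeSet;

/// Computes the dominator set of every block with the iterative data-flow algorithm.
/// Assumption: `preds[b]` lists the predecessors of block `b`, and block 0 is the entry.
fn dominators(preds: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = preds.len();
    if n == 0 {
        return Vec::new();
    }
    let all: BTreeSet<usize> = (0..n).collect();
    // The entry is dominated only by itself; every other block starts at "all blocks".
    let mut dom: Vec<BTreeSet<usize>> = vec![all; n];
    dom[0] = BTreeSet::from([0]);
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            // Intersect the dominator sets of all predecessors, then add `b` itself.
            let mut acc: Option<BTreeSet<usize>> = None;
            for &p in &preds[b] {
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(s) => s.intersection(&dom[p]).cloned().collect(),
                });
            }
            let mut new_set = acc.unwrap_or_default();
            new_set.insert(b);
            if new_set != dom[b] {
                dom[b] = new_set;
                changed = true;
            }
        }
    }
    dom
}

/// Immediate dominator of each block: the strict dominator closest to the block,
/// which is the strict dominator with the largest dominator set.
fn immediate_dominators(dom: &[BTreeSet<usize>]) -> Vec<Option<usize>> {
    (0..dom.len())
        .map(|b| {
            dom[b]
                .iter()
                .copied()
                .filter(|&d| d != b)
                .max_by_key(|&d| dom[d].len())
        })
        .collect()
}
```

For the motivating example, with the entry block as 0, the `L0` block as 1, the `L2` block as 2, and the `L1` block as 3, the predecessor lists would be `[[], [0], [1, 3], [0]]`, and the entry block comes out as the immediate dominator of the other three.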

Step 2: Traverse the dominator tree in preorder and, for each basic block, for each instruction:

  • If the instruction is PURE, canonicalize the expression represented by the instruction into an ExprKey structure
    • ExprKey contains the operation and its arguments, each represented as an ExpArg.
    • ExpArg can be either a constant, a variable (temp), or another ExprKey to nest expressions recursively
      • Motivation to nest expression: consider the expression ReadRef(BorrowField(BorrowLoc(x))), we want to
        represent it as a single expression rather than three separate ones, so that we can eliminate
        the entire sequence at once.
      • Conditions to nest t1 = Op1(t0); t2 = Op2(t1); as Op2(Op1(t0)):
        • The definition at Op1 is the only definition of t1 that can reach the instruction of Op2.
        • t1 is used exactly once, and that use is by Op2.
      • For commutative operations, the arguments are sorted to get a canonical order
  • Why preorder traversal: it ensures that all dominating blocks are processed before the blocks they dominate,
    so that no opportunities for replacement are missed
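
The canonicalization described in Step 2 can be sketched as follows. This is a minimal model, not the PR's actual definitions: the `Op` variants, the field layout of `ExprKey` and `ExpArg`, and the `is_commutative` helper are assumptions chosen to show nesting and argument sorting.

```rust
/// Hypothetical operation tags; the real pass works on stackless-bytecode operations.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Op {
    Add,
    Sub,
    BorrowLoc,
    BorrowField(usize), // field offset (assumed representation)
    ReadRef,
}

/// An argument is a constant, a temp, or a nested expression.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum ExpArg {
    Const(u64),
    Temp(usize),
    Expr(Box<ExprKey>),
}

/// Canonical key for an expression: operation plus arguments.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct ExprKey {
    op: Op,
    args: Vec<ExpArg>,
}

impl Op {
    /// Only addition is marked commutative in this sketch.
    fn is_commutative(&self) -> bool {
        matches!(self, Op::Add)
    }
}

impl ExprKey {
    /// Builds a key, sorting the arguments of commutative operations so that
    /// `a + b` and `b + a` map to the same key.
    fn new(op: Op, mut args: Vec<ExpArg>) -> Self {
        if op.is_commutative() {
            args.sort();
        }
        ExprKey { op, args }
    }

    /// Wraps this key as a nested argument, e.g. to build ReadRef(BorrowField(BorrowLoc(x))).
    fn nested(self) -> ExpArg {
        ExpArg::Expr(Box::new(self))
    }
}
```

Because the key derives `Eq` and `Hash`, it can serve directly as a hash-map key, and a nested key such as `ReadRef(BorrowField(BorrowLoc(t0)))` compares equal to another occurrence of the same chain regardless of which intermediate temps were used.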

Step 3: Check if the ExprKey from Step 2 has been seen before in a dominating block.

Given a seen-before ExprKey (annotated as src_expr) for the current expression (annotated as dest_expr),
and assuming the two expressions have the following formats:

  • src_expr: (src_temp1, src_temp2, ...) = src_op(src_ope1, src_ope2, ...) defined at src_inst, where src_ope1 and src_ope2 can be nested expressions.
  • dest_expr: (dest_temp1, dest_temp2, ...) = dest_op(dest_ope1, dest_ope2, ...) defined at dest_inst, where dest_ope1 and dest_ope2 can be nested expressions.

we apply a set of conservative conditions to check that the replacement is safe:

  • Condition 1: src_expr dominates dest_expr

    • This ensures that src_expr is always executed before dest_expr
  • Condition 2: type safety

    • src_temps and dest_temps share the same types
      • Otherwise, we may encounter type conflict when copying src_temp to dest_temp
    • src_temp is not mutably borrowed
      • Otherwise, we may create a conflicting use while src_temp is mutably borrowed
  • Condition 3: src_temps are copyable

    • This ensures that copying src_temps to dest_temps does not violate ability constraints
  • Condition 4: src_temps at src_expr are the only definitions of src_temps that can reach dest_expr:

    • This ensures that we are not copying wrong values to dest_temps
  • Condition 5: Resources used in src_expr are not changed at dest_expr:

    • This ensures that BorrowGlobal and Exists operations are safe to reuse at dest_expr
    • This only applies when BorrowGlobal and Exists are involved in src_expr and dest_expr
  • Condition 6: Operands used in src_expr are safe to reuse at dest_expr:

    • Operands used in src_expr are identical to those used in dest_expr
    • None of the operands used in src_expr are possibly re-defined in a path between src_expr and dest_expr (without going through src_expr again)
      • This ensures that the values of the operands used in src_inst remain unchanged when reaching dest_inst
    • None of the operands used in src_expr are mutably borrowed elsewhere
      • This ensures that we are not creating conflicting uses while the operands are mutably borrowed
  • Condition 7: The replacement must bring a performance gain. See the comments above gain_perf for details
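
To make the dominance requirement (Condition 1) concrete, here is a simplified sketch of how a table of seen expressions could be scoped to the dominator tree during the preorder walk. It models only Condition 1; Conditions 2 to 7 are deliberately left out. The `Inst` struct, the string keys standing in for ExprKey, and `find_candidates` are hypothetical names for illustration.

```rust
use std::collections::HashMap;

/// A pure instruction reduced to `dest := key`; `key` stands in for a canonical ExprKey.
struct Inst {
    dest: usize,
    key: &'static str,
}

/// Walks the dominator tree in preorder and collects `(dest_temp, src_temp)` pairs where
/// the same key was already computed at a dominating position.
/// Assumption: `dom_children[b]` lists the dominator-tree children of block `b`.
fn find_candidates(
    blocks: &[Vec<Inst>],
    dom_children: &[Vec<usize>],
    block: usize,
    seen: &mut HashMap<&'static str, usize>,
    out: &mut Vec<(usize, usize)>,
) {
    let mut added = Vec::new();
    for inst in &blocks[block] {
        let previous = seen.get(inst.key).copied();
        match previous {
            // Seen in this block or in a dominating block: candidate for replacement.
            Some(src) => out.push((inst.dest, src)),
            None => {
                seen.insert(inst.key, inst.dest);
                added.push(inst.key);
            }
        }
    }
    for &child in &dom_children[block] {
        find_candidates(blocks, dom_children, child, seen, out);
    }
    // Leaving this subtree: keys first seen here do not dominate the remaining blocks.
    for key in added {
        seen.remove(key);
    }
}
```

With the motivating example, the key for `data.x` recorded in the entry block is visible in both branches, whereas a key first seen in the `then` branch is removed before the `else` branch is visited, since neither branch dominates the other.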

Step 4: For each src_expr that passes the conditions in Step 3 to replace dest_expr, we gather the information necessary to perform the replacement, as in the example below:

Example:

1. src_temp = pure_computation_1(t0)      // src_inst
2. ...
3. use(src_temp)
4. dest_temp = pure_computation_1(t0)      // dest_inst
5. ...
6. use(dest_temp)

==>

1. src_temp = pure_computation_1(t0)      // src_inst
2. ...
3. use(src_temp)
4. dest_temp = copy(src_temp)      // inserted copy
5. ...
6. use(dest_temp)

Step 5: After processing all basic blocks, we perform the recorded replacements and eliminate the marked code.
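
A minimal sketch of the deferred rewrite in Step 5, under assumed data structures: `replacements` maps the index of a dest_inst to the src_temp it should copy from, and `dead` holds the indices of instructions marked for elimination (for example, the inner BorrowLoc and BorrowField of a nested expression). The `Instr` enum and `apply_replacements` are hypothetical and much simpler than real stackless bytecode.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy instruction set used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Compute { dest: usize, expr: String },
    Copy { dest: usize, src: usize },
    Use(usize),
}

/// Applies the recorded rewrites in one pass over the code:
/// instructions in `dead` are dropped, and a `Compute` whose index appears in
/// `replacements` becomes a copy from the recorded source temp.
fn apply_replacements(
    code: &[Instr],
    replacements: &BTreeMap<usize, usize>,
    dead: &BTreeSet<usize>,
) -> Vec<Instr> {
    code.iter()
        .enumerate()
        .filter(|(i, _)| !dead.contains(i))
        .map(|(i, instr)| match (instr, replacements.get(&i)) {
            (Instr::Compute { dest, .. }, Some(&src)) => Instr::Copy { dest: *dest, src },
            _ => instr.clone(),
        })
        .collect()
}
```

Recording the rewrites first and applying them afterwards keeps instruction indices stable while the analysis is still running, which is one plausible reason for separating Step 4 from Step 5.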

============================ Extensions ============================

In principle, the algorithm above is designed to handle PURE instructions, defined as follows:

  • its results depend only on its operands
  • it has no side effects on memory (including writes via references), control flow (including aborts), or external state (global storage)
  • recomputing it multiple times has no semantic effect.

Yet, we found that some non-pure instructions can be safely handled under certain conditions.

Group 1: operations that are pure if no arithmetic errors like overflows happen (+, -, *, /, %, etc):

  • such operations are treated as pure in aggressive mode
  • their side effects are safe because, if they occur, they are guaranteed to have occurred earlier at src_inst

Group 2: operations that are pure if no type errors happen (UnpackVariant):

  • such operations are treated as pure in aggressive mode
  • their side effects are safe because, if they occur, they are guaranteed to have occurred earlier at src_inst

Group 3: BorrowLoc, BorrowField, BorrowVariantField

  • In principle, borrow operations are not pure as they depend on memory states.
  • However, if we guarantee that their operands are not references or constants and are not changed between src_inst and dest_inst, we can treat them as pure.
    • Note: by operands, we mean the most deeply nested operands, e.g., in BorrowField(BorrowLoc(x)), x is the operand for BorrowField.

Group 4: Assign

  • It can be treated as pure when the assign kind is Copy or Inferred

Group 5: ReadRef

  • In principle, ReadRef is not pure as it depends on memory state.
  • However, if we guarantee that its operands are not references and are not changed between src_inst and dest_inst, we can treat it as pure.
    • Note: by operands, we mean the most deeply nested operands, e.g., in ReadRef(BorrowField(BorrowLoc(x))), x is the operand for ReadRef.

Group 6: Function calls

  • A function call can be treated as pure if the callee
    • Does not modify any memory via mutable references
    • Does not access global resources

Group 7: BorrowGlobal and Exists

  • They can be treated as pure if we guarantee that the resources involved are not modified between src_inst and dest_inst
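
The groups above amount to a per-operation decision that depends on the mode and on facts established by other analyses. The sketch below encodes Groups 1, 2, 3, 5, and 7 only; Assign and function calls are omitted. The `Op` enum, the `Facts` struct, and `can_treat_as_pure` are assumptions for illustration, and the boolean inputs stand in for the operand and resource checks described above.

```rust
/// Illustrative subset of operations; the enum itself is hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Add,
    Div,
    UnpackVariant,
    BorrowLoc,
    BorrowField,
    ReadRef,
    BorrowGlobal,
    Exists,
    WriteRef,
}

/// Facts that the surrounding analyses would have to establish (assumed inputs here).
#[derive(Clone, Copy, Debug)]
struct Facts {
    /// Aggressive mode is enabled.
    aggressive: bool,
    /// Innermost operands are not references and are unchanged between src_inst and dest_inst.
    operands_stable: bool,
    /// Resources involved are not modified between src_inst and dest_inst.
    resources_unmodified: bool,
}

/// Decides whether an operation may be handled like a PURE instruction.
fn can_treat_as_pure(op: Op, facts: Facts) -> bool {
    match op {
        // Groups 1 and 2: may abort, so only accepted in aggressive mode.
        Op::Add | Op::Div | Op::UnpackVariant => facts.aggressive,
        // Groups 3 and 5: depend on memory, so the operands must be stable.
        Op::BorrowLoc | Op::BorrowField | Op::ReadRef => facts.operands_stable,
        // Group 7: depends on global storage.
        Op::BorrowGlobal | Op::Exists => facts.resources_unmodified,
        // Writes through references have side effects on memory and are never pure.
        Op::WriteRef => false,
    }
}
```

In this model, `Div` is rejected outside aggressive mode even when every other fact holds, while `ReadRef` is accepted in either mode as long as its innermost operands are stable.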

All TODO items are marked with TODO(#18203).

How Has This Been Tested?

  • Existing compiler tests and transactional tests
  • New test cases will be added in the next PR

Expected Result Changes

  • Expensive recomputation is eliminated by reusing the result of the first computation.
  • Intermediate analysis results introduced by CSE

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (Move linter)

junxzm1990 commented Oct 29, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

junxzm1990 marked this pull request as ready for review October 29, 2025 15:46
junxzm1990 marked this pull request as draft October 29, 2025 15:47
junxzm1990 force-pushed the jun/cse-opt branch 10 times, most recently from b6f8d5d to 97a2b06, November 12, 2025 20:28
junxzm1990 changed the base branch from main to graphite-base/17989 November 24, 2025 16:13
junxzm1990 changed the base branch from graphite-base/17989 to jun/reach-def November 24, 2025 16:13
junxzm1990 force-pushed the jun/cse-opt branch 2 times, most recently from 8b88a87 to 3066301, November 24, 2025 17:04
junxzm1990 force-pushed the jun/reach-def branch 2 times, most recently from a6827da to 69c4e71, November 24, 2025 20:56
junxzm1990 force-pushed the jun/cse-opt branch 3 times, most recently from 894fa74 to 4e29fd9, November 25, 2025 03:40
junxzm1990 force-pushed the jun/cse-opt branch 3 times, most recently from 4035dbb to 545e1a4, November 25, 2025 16:23
junxzm1990 changed the title from "[move compiler] common subexpression elimination" to "[move compiler] [CSE Step 2] common subexpression elimination" Nov 25, 2025
junxzm1990 marked this pull request as ready for review November 25, 2025 16:33
junxzm1990 force-pushed the jun/cse-opt branch 6 times, most recently from 6dd4fc1 to 285eb08, November 25, 2025 20:10
junxzm1990 force-pushed the jun/cse-opt branch 4 times, most recently from 8bbbf38 to 4f00c2e, November 26, 2025 01:01
junxzm1990 force-pushed the jun/cse-opt branch 5 times, most recently from 5cd8843 to 9e565bb, November 27, 2025 00:03