@mafeguimaraes

This patch introduces the HWVectorization pass, which identifies per-bit logic in hardware modules that can be expressed as vector operations.
The pass simplifies the IR by grouping related scalar bit operations (such as comb.extract and comb.concat) into higher-level constructs such as comb.reverse and comb.replicate, or into multi-bit comb.and, comb.or, and comb.xor.

The pass scans each hw.module for groups of bit-level operations that can be merged. This version supports the patterns listed below, which it detects through bit-level dataflow analysis and structural analysis.

This patch was co-authored by @RosaUlisses.

Supported transformations include:

1. Linear concatenations (identity):

  • Pattern: Bits are extracted and concatenated back in their original order (identity permutation), so every bit keeps its position.

  • Transformation: The entire comb.concat chain is replaced with the original input vector.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%concat = comb.concat %3, %2, %1, %0 : i1, i1, i1, i1
hw.output %concat : i4

// After
hw.output %in : i4

2. Bit reversal:

  • Pattern: Bits are extracted and concatenated in reverse order, so input bit i lands at output bit N-1-i for an N-bit value.

  • Transformation: The chain is replaced with a single comb.reverse.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%rev = comb.concat %0, %1, %2, %3 : i1, i1, i1, i1
hw.output %rev : i4

// After
%0 = comb.reverse %in : i4
hw.output %0 : i4
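
Patterns 1 and 2 both come down to classifying the permutation that a concat of single-bit extracts applies to one input. The Python sketch below models that classification only; it is not the pass's implementation, and the function name `classify_concat` and its list-based input are assumptions made for illustration.

```python
from typing import List, Optional


def classify_concat(source_bits: List[int]) -> Optional[str]:
    """Classify a concat of single-bit extracts taken from one input.

    source_bits[i] is the input bit index that ends up at output bit i,
    where index 0 is the least significant bit.
    """
    width = len(source_bits)
    if source_bits == list(range(width)):
        return "identity"  # the concat can be replaced by the input itself
    if source_bits == list(range(width - 1, -1, -1)):
        return "reverse"   # the concat can be replaced by comb.reverse
    return None            # neither pattern applies


# comb.concat lists operands most-significant first, so
# `comb.concat %3, %2, %1, %0` places input bit i at output bit i.
identity = classify_concat([0, 1, 2, 3])
# `comb.concat %0, %1, %2, %3` places input bit 0 at output bit 3.
reverse = classify_concat([3, 2, 1, 0])
# Any other permutation matches neither pattern.
mixed = classify_concat([1, 0, 2, 3])
```

Working with the output-bit-to-source-bit map keeps the check independent of the order in which the extract operations appear in the IR.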

3. Structural patterns (e.g., vectorized mux):

  • Pattern: Isomorphic, bit-parallel logic cones are detected. For example, a scalarized mux structure that uses a replicated i1 control signal for each bit.

  • Transformation: The replicated scalar operations are collapsed into equivalent vector-level operations (e.g., comb.replicate, comb.and, comb.xor, comb.or).

// Before (scalarized mux)
%sel_inv = comb.xor %sel, %true : i1
%and_a = comb.and %a, %sel : i1
%and_b = comb.and %b, %sel_inv : i1
%mux = comb.or %and_a, %and_b : i1
// ... (repeated for each bit)

// After (vectorized mux)
%true = hw.constant true
%sel_vec = comb.replicate %sel : (i1) -> i4
%a_masked = comb.and %a, %sel_vec : i4
%true_vec = comb.replicate %true : (i1) -> i4
%sel_inv_vec = comb.xor %sel_vec, %true_vec : i4
%b_masked = comb.and %b, %sel_inv_vec : i4
%mux = comb.or %a_masked, %b_masked : i4
hw.output %mux : i4
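
To see why collapsing the per-bit cones is safe, the following Python sketch models both forms of the mux on plain integers and compares them exhaustively for 4-bit inputs. It is an illustrative model only; the helper names `scalar_mux`, `vector_mux`, and `equivalent` are hypothetical and do not come from the patch.

```python
def scalar_mux(a: int, b: int, sel: int, width: int) -> int:
    """Per-bit mux: one xor/and/and/or cone per bit, as in the scalarized form."""
    sel_inv = sel ^ 1
    out = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        out |= ((a_i & sel) | (b_i & sel_inv)) << i
    return out


def vector_mux(a: int, b: int, sel: int, width: int) -> int:
    """Vector mux: replicate the select bit, then use full-width and/xor/or."""
    mask = (1 << width) - 1
    sel_vec = mask if sel else 0   # models comb.replicate of the 1-bit select
    sel_inv_vec = sel_vec ^ mask   # models comb.xor with an all-ones vector
    return (a & sel_vec) | (b & sel_inv_vec)


def equivalent(width: int = 4) -> bool:
    """Exhaustively compare both forms for every input of the given width."""
    values = range(1 << width)
    return all(
        scalar_mux(a, b, s, width) == vector_mux(a, b, s, width)
        for a in values
        for b in values
        for s in (0, 1)
    )


both_agree = equivalent(4)
```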

4. Partial Vectorization (Chunking):

  • Pattern: The pass identifies contiguous sub-ranges (chunks) that can be vectorized independently, even if the entire bus cannot be.

  • Transformation: The pass vectorizes the identifiable chunks (e.g., a linear chunk) and leaves the remaining scalar or structural logic as another chunk, then concatenates the chunks back together.

// Before (Mixed linear and structural patterns)
// out[3:1] = in[3:1] (linear)
// out[0]   = in[1] ^ in[0] (structural)
%in_3 = comb.extract %in from 3 : (i4) -> i1
%in_2 = comb.extract %in from 2 : (i4) -> i1
%in_1 = comb.extract %in from 1 : (i4) -> i1
// Logic for bit 0
%in_1_for_0 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%bit_0 = comb.xor %in_1_for_0, %in_0 : i1
// Final concatenation
%concat = comb.concat %in_3, %in_2, %in_1, %bit_0 : i1, i1, i1, i1
hw.output %concat : i4

// After (Partially vectorized)
// Chunk 1: [3:1] (vectorized)
%chunk_1 = comb.extract %in from 1 : (i4) -> i3
// Chunk 0: [0] (scalar logic)
%in_1 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%chunk_0 = comb.xor %in_1, %in_0 : i1
// Re-concat the vectorized chunks
%final = comb.concat %chunk_1, %chunk_0 : i3, i1
hw.output %final : i4
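
The chunking step can be modeled as finding maximal runs of output bits that forward consecutive input bits. The Python sketch below is a simplified model under that assumption, not the pass's algorithm; `linear_chunks` and its tuple layout are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple

# (output low bit, chunk width, source low bit or None for non-linear logic)
Chunk = Tuple[int, int, Optional[int]]


def linear_chunks(source_bits: List[Optional[int]]) -> List[Chunk]:
    """Split the output bits into chunks, from the least significant bit up.

    source_bits[i] is the input bit forwarded unchanged to output bit i,
    or None when output bit i is computed by other logic.
    """
    chunks: List[Chunk] = []
    n = len(source_bits)
    i = 0
    while i < n:
        start = i
        src = source_bits[i]
        if src is None:
            # Non-linear bit: keep it as its own scalar chunk.
            chunks.append((start, 1, None))
            i += 1
            continue
        # Extend the run while the next output bit forwards the next input bit.
        while (
            i + 1 < n
            and source_bits[i + 1] is not None
            and source_bits[i + 1] == source_bits[i] + 1
        ):
            i += 1
        i += 1
        chunks.append((start, i - start, src))
    return chunks


# Mixed example: out[0] is an xor (non-linear), out[3:1] forwards in[3:1].
mixed = linear_chunks([None, 1, 2, 3])
```

Here `mixed` holds one scalar chunk for bit 0 and one linear chunk of width 3 whose source starts at input bit 1, which corresponds to a single 3-bit extract.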

Patterns not transformed
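The pass does not modify modules with cross-bit dependencies or non-linear control flow.
For example, in the module below each bit of the result %6 depends on the other bit of %6, so the per-bit logic is not bit-parallel: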

// cross-dependency example (should remain unchanged)
hw.module @cross_dependency(in %in : i2, out out : i2) {
  %0 = comb.extract %in from 0 : (i2) -> i1
  %1 = comb.extract %6 from 1 : (i2) -> i1
  %2 = comb.xor %0, %1 : i1
  %3 = comb.extract %in from 1 : (i2) -> i1
  %4 = comb.extract %6 from 0 : (i2) -> i1
  %5 = comb.xor %3, %4 : i1
  %6 = comb.concat %5, %2 : i1, i1
  hw.output %6 : i2
}
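
One way to read "cross-bit dependency" is that some output bit depends on a bit at a different index. The Python sketch below checks that property on a hand-written dependency map; it is an assumption about the criterion rather than the pass's actual check, and `is_bit_parallel` and the map layout are hypothetical.

```python
from typing import Dict, Set, Tuple

# A dependency is (signal name, bit index within that signal).
BitRef = Tuple[str, int]


def is_bit_parallel(deps: Dict[int, Set[BitRef]]) -> bool:
    """Return True when every output bit depends only on same-index bits."""
    return all(
        bit == out_bit
        for out_bit, refs in deps.items()
        for _, bit in refs
    )


# The cross_dependency module: bit 0 reads bit 1 of the result, and
# bit 1 reads bit 0 of the result.
cross = {
    0: {("in", 0), ("out", 1)},
    1: {("in", 1), ("out", 0)},
}

# A bit-parallel mux: output bit i reads only bit i of %a and %b.
# Scalar signals that are broadcast to every bit (such as %sel) are omitted.
mux = {i: {("a", i), ("b", i)} for i in range(4)}

cross_ok = is_bit_parallel(cross)
mux_ok = is_bit_parallel(mux)
```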

@mafeguimaraes force-pushed the feature/hw-vectorization-pass branch 2 times, most recently from acc6a29 to 59a27df (November 10, 2025 19:15)
@mafeguimaraes force-pushed the feature/hw-vectorization-pass branch from 59a27df to 7dfbad0 (November 10, 2025 19:23)