Skip to content

Replace core parsing/resolution engine with Rubydex#447

Open
paracycle wants to merge 10 commits intomainfrom
ufuk/rubydex-rewrite
Open

Replace core parsing/resolution engine with Rubydex#447
paracycle wants to merge 10 commits intomainfrom
ufuk/rubydex-rewrite

Conversation

@paracycle
Copy link
Copy Markdown
Member

Summary

  • Replace the entire AST walking, constant extraction, and resolution pipeline with Rubydex, a high-performance Ruby indexer written in Rust
  • Rewrite association detection to use Prism's native AST directly, dropping the parser gem as a direct dependency
  • Net result: +403 / -3,391 lines across 47 files

Architecture change

Before: For each file → parse with Prism → walk AST nodes → extract constant references via ConstNodeInspector → resolve via constant_resolver gem → check package violations

After: Index all workspace files in one Rubydex::Graph batch call → resolve all constants → iterate ResolvedConstantReference objects with direct links to target declarations → check package violations. Association detection (has_many, belongs_to, etc.) runs as a supplementary Prism-based pass since Rubydex doesn't understand ActiveRecord semantics.

Dependencies

Removed Added
constant_resolver rubydex
parallel
ast (direct)
parser (direct)

Kept: prism (for association detection), better_html (for ERB), activesupport, sorbet-runtime, zeitwerk, bundler.

Deleted files (16 source + 11 test + 3 RBI)

The entire per-file parsing pipeline: file_processor, node_processor, node_processor_factory, node_visitor, node_helpers, const_node_inspector, constant_name_inspector, constant_discovery, parsed_constant_definitions, reference_extractor, unresolved_reference, association_inspector, cache, parsers/ruby, parsers/factory, parsers/parser_interface.

Key design decisions

  1. Full workspace indexing: index_and_resolve indexes ALL Ruby files in the workspace (not just the scoped check set) so Rubydex can resolve cross-package references. Only the scoped files are checked for violations.
  2. Association pass: Kept as a separate Prism-native AST walk since Rubydex treats has_many :orders as a method call, not a constant reference. Uses graph.resolve_constant to resolve the inferred constant name.
  3. ERB support: Parsers::Erb simplified to extract_ruby_source which feeds into graph.index_source.
  4. No more caching: Rubydex's Rust engine is fast enough that the MD5-based file cache is unnecessary.
  5. No more parallelism gem: Rubydex handles parallelism internally in Rust. The parallel flag is accepted but ignored.

Test results

All 136 tests pass (excluding spring_command_test and autoload_test which have pre-existing Ruby 4.0 incompatibilities unrelated to this change).

@paracycle paracycle requested a review from a team as a code owner April 13, 2026 23:25
Replace the entire AST walking, constant extraction, and resolution
pipeline with Rubydex, a high-performance Ruby indexer written in Rust.

The key architectural change is that Packwerk no longer parses files
individually and walks AST nodes to find constant references. Instead,
Rubydex indexes all workspace files in a single batch call, resolves
all constants, and provides resolved references with direct links to
their target declarations.

Core changes:
- RunContext: rewritten to create a Rubydex::Graph, index the full
  workspace for resolution, and iterate resolved constant references
  to detect cross-package violations
- ParseRun: simplified to two phases (index_and_resolve + find_offenses)
  instead of per-file parallel processing
- Association detection: rewritten using Prism native AST (no longer
  needs the parser gem translation layer), runs as a supplementary
  pass since Rubydex doesn't understand ActiveRecord associations
- ERB support: simplified to extract_ruby_source which feeds into
  Rubydex's index_source API
- ApplicationValidator: uses Rubydex::Graph instead of ConstantResolver

Removed dependencies: constant_resolver, parallel, ast, parser (direct)
Added dependency: rubydex
Kept: prism (for association detection), better_html (for ERB)

Deleted 16 source files, 11 test files, 3 RBI files (~3,400 lines removed).
Net change: +403 / -3,391 lines.
@paracycle paracycle force-pushed the ufuk/rubydex-rewrite branch 2 times, most recently from ad87dca to 7fa6be2 Compare April 13, 2026 23:57
Update tapioca require file to remove constant_resolver, parallel,
spring, and minitest/autorun; add prism and rubydex requires.
Regenerate all gem RBIs, cleaning up stale files from older gem versions.
Add ostruct gem for Ruby 4.0 compatibility with yard/tapioca.
@paracycle paracycle force-pushed the ufuk/rubydex-rewrite branch from 7fa6be2 to 46eea11 Compare April 14, 2026 00:00
@exterm
Copy link
Copy Markdown
Contributor

exterm commented Apr 14, 2026

Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore?

@paracycle
Copy link
Copy Markdown
Member Author

Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore?

That's correct. Rubydex can do proper Ruby constant resolution, so we don't need to use Zeitwerk heuristics to figure out what constant references resolve to based on their filename. At least, we shouldn't, and, if there are any problems with the resolution, then we should fix them in Rubydex.

ERB files fed to graph.index_source need a file:// URI, not a bare
path. Also add location_to_relative_path helper that catches
NotFileUriError for any edge cases where a location doesn't have a
file:// URI.
The post-graph work (especially the association detection pass that
re-parses all files with Prism) is a significant portion of total
runtime. On the Shopify monolith, the association pass takes ~39s
single-threaded.

Parallelize the Prism parsing phase of association detection using
the parallel gem. The resolution and violation checking phases remain
sequential since they use shared state (graph + package_set).

The parallel flag flows from Configuration -> ParseRun -> RunContext
as before.
Split collect_constant_reference_offenses into two phases:

1. Extract: iterate Rubydex's resolved references and pull all needed
   data into plain Ruby hashes grouped by source file. This must be
   sequential since it crosses the Rust FFI boundary.

2. Check: process each file's references for dependency violations in
   parallel using forked workers. Only plain Ruby objects (strings,
   integers, hashes) cross the fork boundary -- no Rust FFI objects.

On the Shopify monolith, the post-graph reference iteration + violation
checking was the biggest bottleneck at ~134s single-threaded. The
extraction phase remains sequential but the violation checking across
~57k cross-package references is now parallelized.
- Update gemspec to require rubydex >= 0.1.0.beta12
- Point Gemfile to beta12 from RubyGems
- Restore accidentally commented-out offense formatter calls in
  CheckCommand#run
- Move parallel flag to RunContext constructor instead of passing
  through find_offenses kwargs (avoids Mocha/Sorbet interaction issues)
Shared namespaces like GraphApi and Checkouts are defined in dozens of
packages across a monolith. Rubydex resolves a constant reference to
a Declaration, but the first definition might be in a different package
even though the namespace is also defined locally.

Before: used declaration.definitions.first to get the target path,
causing 219k false violations on Shopify Core.

After: check ALL definitions of the target constant. If any definition
lives in the same package as the source file, the reference is local
and is skipped. Only references where no definition exists in the
source package are reported as cross-package violations.
Iterating all definitions of a constant for every one of 9M references
was O(refs * defs_per_constant), which caused a 15+ minute hang on
the Shopify monolith where shared namespaces have hundreds of defs.

Pre-compute a hash of constant_name → {packages, target_path} by
iterating graph.declarations once upfront. The per-reference check
is now a single Set#include? call -- O(1).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants