Replace core parsing/resolution engine with Rubydex by paracycle · Pull Request #447 · Shopify/packwerk

paracycle · 2026-04-13T23:25:48Z

Summary

- Replace the entire AST walking, constant extraction, and resolution pipeline with Rubydex, a high-performance Ruby indexer written in Rust
- Rewrite association detection to use Prism's native AST directly, dropping the parser gem as a direct dependency
- Net result: +403 / -3,391 lines across 47 files

Architecture change

Before: For each file → parse with Prism → walk AST nodes → extract constant references via ConstNodeInspector → resolve via constant_resolver gem → check package violations

After: Index all workspace files in one Rubydex::Graph batch call → resolve all constants → iterate ResolvedConstantReference objects with direct links to target declarations → check package violations. Association detection (has_many, belongs_to, etc.) runs as a supplementary Prism-based pass since Rubydex doesn't understand ActiveRecord semantics.
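
The last step of the new flow, turning resolved references into package violations, can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Packwerk's implementation: `Package`, `Reference`, `PACKAGES`, `package_for`, and `dependency_violations` are hypothetical names, and the references are hand-built rather than produced by Rubydex.

```ruby
# Hypothetical stand-ins; these are not Rubydex or Packwerk classes.
Package = Struct.new(:name, :root, :dependencies)
Reference = Struct.new(:constant, :source_path, :target_path)

PACKAGES = [
  Package.new("orders", "components/orders/", ["billing"]),
  Package.new("billing", "components/billing/", []),
  Package.new("shipping", "components/shipping/", []),
].freeze

# Find the package that owns a path by its longest matching root prefix.
def package_for(path)
  PACKAGES.select { |pkg| path.start_with?(pkg.root) }.max_by { |pkg| pkg.root.length }
end

# Keep only references that cross into a package the source package
# does not declare as a dependency.
def dependency_violations(references)
  references.select do |ref|
    source = package_for(ref.source_path)
    target = package_for(ref.target_path)
    next false if source.nil? || target.nil? || source == target

    !source.dependencies.include?(target.name)
  end
end
```

Because each reference already carries the path of its target declaration, the check itself needs no AST access; it only maps two paths to packages and compares them.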

Dependencies

| Removed | Added |
| --- | --- |
| `constant_resolver` | `rubydex` |
| `parallel` | |
| `ast` (direct) | |
| `parser` (direct) | |

Kept: prism (for association detection), better_html (for ERB), activesupport, sorbet-runtime, zeitwerk, bundler.

Deleted files (16 source + 11 test + 3 RBI)

The entire per-file parsing pipeline: file_processor, node_processor, node_processor_factory, node_visitor, node_helpers, const_node_inspector, constant_name_inspector, constant_discovery, parsed_constant_definitions, reference_extractor, unresolved_reference, association_inspector, cache, parsers/ruby, parsers/factory, parsers/parser_interface.

Key design decisions

- Full workspace indexing: index_and_resolve indexes ALL Ruby files in the workspace (not just the scoped check set) so Rubydex can resolve cross-package references. Only the scoped files are checked for violations.
- Association pass: Kept as a separate Prism-native AST walk since Rubydex treats has_many :orders as a method call, not a constant reference. Uses graph.resolve_constant to resolve the inferred constant name.
- ERB support: Parsers::Erb simplified to extract_ruby_source which feeds into graph.index_source.
- No more caching: Rubydex's Rust engine is fast enough that the MD5-based file cache is unnecessary.
- No more parallelism gem: Rubydex handles parallelism internally in Rust. The parallel flag is accepted but ignored.
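
To illustrate the association pass, here is a minimal sketch of a Prism-native walk that finds association macros and infers a constant name from the first symbol argument. It is not the code from this PR: `AssociationVisitor`, `infer_constant_name`, and the two constant lists are hypothetical, the inflection is a crude stand-in for ActiveSupport's inflector, and options such as `class_name:` are ignored. Resolving the inferred name through graph.resolve_constant is left out because its signature is not shown here.

```ruby
require "prism"

# Association macros to look for (a subset, for illustration only).
ASSOCIATION_METHODS = [:has_many, :has_one, :belongs_to, :has_and_belongs_to_many].freeze
PLURAL_ASSOCIATIONS = [:has_many, :has_and_belongs_to_many].freeze

# Crude stand-in for ActiveSupport's inflector: "line_items" -> "LineItem".
def infer_constant_name(method_name, association_name)
  base = PLURAL_ASSOCIATIONS.include?(method_name) ? association_name.sub(/s\z/, "") : association_name
  base.split("_").map(&:capitalize).join
end

# Collects receiver-less association calls whose first argument is a symbol.
class AssociationVisitor < Prism::Visitor
  attr_reader :associations

  def initialize
    super
    @associations = []
  end

  def visit_call_node(node)
    first_arg = node.arguments&.arguments&.first
    if node.receiver.nil? && ASSOCIATION_METHODS.include?(node.name) && first_arg.is_a?(Prism::SymbolNode)
      @associations << {
        macro: node.name,
        constant: infer_constant_name(node.name, first_arg.unescaped),
        line: node.location.start_line,
      }
    end
    super # keep walking child nodes
  end
end

# Parse a source string and return the associations found in it.
def associations_in(source)
  visitor = AssociationVisitor.new
  Prism.parse(source).value.accept(visitor)
  visitor.associations
end
```

Each result carries the line number, so an offense can be reported at the macro call once the inferred name has been resolved to a declaration.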

Test results

All 136 tests pass (excluding spring_command_test and autoload_test which have pre-existing Ruby 4.0 incompatibilities unrelated to this change).

Replace the entire AST walking, constant extraction, and resolution pipeline with Rubydex, a high-performance Ruby indexer written in Rust.

The key architectural change is that Packwerk no longer parses files individually and walks AST nodes to find constant references. Instead, Rubydex indexes all workspace files in a single batch call, resolves all constants, and provides resolved references with direct links to their target declarations.

Core changes:
- RunContext: rewritten to create a Rubydex::Graph, index the full workspace for resolution, and iterate resolved constant references to detect cross-package violations
- ParseRun: simplified to two phases (index_and_resolve + find_offenses) instead of per-file parallel processing
- Association detection: rewritten using Prism native AST (no longer needs the parser gem translation layer), runs as a supplementary pass since Rubydex doesn't understand ActiveRecord associations
- ERB support: simplified to extract_ruby_source which feeds into Rubydex's index_source API
- ApplicationValidator: uses Rubydex::Graph instead of ConstantResolver

Removed dependencies: constant_resolver, parallel, ast, parser (direct)
Added dependency: rubydex
Kept: prism (for association detection), better_html (for ERB)

Deleted 16 source files, 11 test files, 3 RBI files (~3,400 lines removed). Net change: +403 / -3,391 lines.

Update tapioca require file to remove constant_resolver, parallel, spring, and minitest/autorun; add prism and rubydex requires. Regenerate all gem RBIs, cleaning up stale files from older gem versions. Add ostruct gem for Ruby 4.0 compatibility with yard/tapioca.

exterm · 2026-04-14T13:40:19Z

Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore?

paracycle · 2026-04-14T14:21:36Z

> Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore?

That's correct. Rubydex can do proper Ruby constant resolution, so we don't need to use Zeitwerk heuristics to figure out what constant references resolve to based on their filename. At least, we shouldn't, and, if there are any problems with the resolution, then we should fix them in Rubydex.

ERB files fed to graph.index_source need a file:// URI, not a bare path. Also add location_to_relative_path helper that catches NotFileUriError for any edge cases where a location doesn't have a file:// URI.
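
A rough sketch of the URI handling described in this commit, using only Ruby's standard library. The helper names `file_uri_for` and `path_from_file_uri` and the `NotFileUri` error class are hypothetical stand-ins (the real NotFileUriError comes from Rubydex, which is not loaded here); only `location_to_relative_path` is a name taken from the commit message, and its signature is assumed.

```ruby
require "uri"
require "pathname"

# Hypothetical error class standing in for the NotFileUriError named above.
class NotFileUri < StandardError; end

# Build a file:// URI from a workspace-relative path.
def file_uri_for(path, root)
  URI::File.build(path: File.expand_path(path, root)).to_s
end

# Return the filesystem path of a file:// URI, raising for anything else.
def path_from_file_uri(uri_string)
  uri = URI.parse(uri_string)
  raise NotFileUri, uri_string unless uri.scheme == "file"

  uri.path
end

# Convert a location URI back to a workspace-relative path,
# returning nil when the location is not a file:// URI.
def location_to_relative_path(uri_string, root)
  Pathname.new(path_from_file_uri(uri_string)).relative_path_from(Pathname.new(root)).to_s
rescue NotFileUri
  nil
end
```

Returning nil for the edge case lets the caller skip a location without aborting the whole run; whether the actual helper returns nil or something else is not stated in the commit message.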

The post-graph work (especially the association detection pass that re-parses all files with Prism) is a significant portion of total runtime. On the Shopify monolith, the association pass takes ~39s single-threaded. Parallelize the Prism parsing phase of association detection using the parallel gem. The resolution and violation checking phases remain sequential since they use shared state (graph + package_set). The parallel flag flows from Configuration -> ParseRun -> RunContext as before.

Split collect_constant_reference_offenses into two phases:

1. Extract: iterate Rubydex's resolved references and pull all needed data into plain Ruby hashes grouped by source file. This must be sequential since it crosses the Rust FFI boundary.
2. Check: process each file's references for dependency violations in parallel using forked workers. Only plain Ruby objects (strings, integers, hashes) cross the fork boundary -- no Rust FFI objects.

On the Shopify monolith, the post-graph reference iteration + violation checking was the biggest bottleneck at ~134s single-threaded. The extraction phase remains sequential but the violation checking across ~57k cross-package references is now parallelized.
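
The extract-then-check split can be sketched with plain `fork` and pipes. This is an illustration under assumptions, not the PR's code: `NativeRef` stands in for a Rubydex reference, `extract`, `check_file`, and `check_in_parallel` are hypothetical names, the check is simplified to "target package differs from source package", and it requires a platform where `fork` is available.

```ruby
# Hypothetical stand-in for a resolved reference backed by native (FFI) state.
NativeRef = Struct.new(:constant, :source_path, :line, :target_package)

# Phase 1 (sequential): copy the needed data into plain hashes, grouped by source file.
def extract(native_refs)
  native_refs
    .map { |ref| { constant: ref.constant, file: ref.source_path, line: ref.line, target: ref.target_package } }
    .group_by { |row| row[:file] }
end

# Check one file's rows. `package_of` maps a file path to its package name.
def check_file(file, rows, package_of)
  rows.reject { |row| row[:target] == package_of.fetch(file) }
      .map { |row| "#{file}:#{row[:line]} #{row[:constant]}" }
end

# Phase 2 (parallel): one forked worker per slice of files.
# Results come back as Marshal-encoded arrays of strings over a pipe.
def check_in_parallel(grouped, package_of, workers: 2)
  slice_size = [(grouped.size / workers.to_f).ceil, 1].max
  jobs = grouped.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    [reader, writer].each(&:binmode)
    pid = fork do
      reader.close
      offenses = slice.flat_map { |file, rows| check_file(file, rows, package_of) }
      writer.write(Marshal.dump(offenses))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close
    [pid, reader]
  end

  jobs.flat_map do |pid, reader|
    offenses = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    offenses
  end
end
```

Only hashes, strings, and integers are captured by the forked block, which mirrors the constraint above that no native objects cross the fork boundary.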

- Update gemspec to require rubydex >= 0.1.0.beta12
- Point Gemfile to beta12 from RubyGems
- Restore accidentally commented-out offense formatter calls in CheckCommand#run
- Move parallel flag to RunContext constructor instead of passing through find_offenses kwargs (avoids Mocha/Sorbet interaction issues)

Shared namespaces like GraphApi and Checkouts are defined in dozens of packages across a monolith. Rubydex resolves a constant reference to a Declaration, but the first definition might be in a different package even though the namespace is also defined locally.

Before: used declaration.definitions.first to get the target path, causing 219k false violations on Shopify Core.

After: check ALL definitions of the target constant. If any definition lives in the same package as the source file, the reference is local and is skipped. Only references where no definition exists in the source package are reported as cross-package violations.
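
The difference between the two checks can be shown with a small sketch. `Definition`, `Declaration`, `package_of_path`, and both predicate methods are hypothetical stand-ins, and the package of a path is assumed to be its first two path segments purely for illustration.

```ruby
# Hypothetical stand-ins for a declaration and its definitions.
Definition = Struct.new(:path)
Declaration = Struct.new(:name, :definitions)

# Assumption for this sketch: a package is the first two path segments.
def package_of_path(path)
  path.split("/").first(2).join("/")
end

# Before: only the first definition decided the target package.
def cross_package_first_only?(source_path, declaration)
  package_of_path(declaration.definitions.first.path) != package_of_path(source_path)
end

# After: the reference is local if any definition shares the source file's package.
def cross_package?(source_path, declaration)
  source_package = package_of_path(source_path)
  declaration.definitions.none? { |defn| package_of_path(defn.path) == source_package }
end
```

For a namespace defined in both `components/billing` and `components/orders`, a reference from a file in `components/orders` is flagged by the first method whenever the billing definition happens to come first, and is treated as local by the second.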

Iterating all definitions of a constant for every one of 9M references was O(refs * defs_per_constant), which caused a 15+ minute hang on the Shopify monolith where shared namespaces have hundreds of defs. Pre-compute a hash of constant_name → {packages, target_path} by iterating graph.declarations once upfront. The per-reference check is now a single Set#include? call -- O(1).
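
A minimal sketch of the precomputed lookup, under stated assumptions: `Definition`, `Declaration`, `package_of_path`, `build_definition_index`, and `cross_package?` are hypothetical names, the declarations are hand-built instead of coming from graph.declarations, and `target_path` is assumed to be the first definition's path since the commit message does not say which one is stored.

```ruby
require "set"

# Hypothetical stand-ins for a declaration and its definitions.
Definition = Struct.new(:path)
Declaration = Struct.new(:name, :definitions)

# Assumption for this sketch: a package is the first two path segments.
def package_of_path(path)
  path.split("/").first(2).join("/")
end

# Built once up front: constant name => { packages: Set, target_path: String }.
def build_definition_index(declarations)
  declarations.each_with_object({}) do |decl, index|
    index[decl.name] = {
      packages: decl.definitions.map { |defn| package_of_path(defn.path) }.to_set,
      target_path: decl.definitions.first.path, # assumed choice of target path
    }
  end
end

# Per-reference check: one hash lookup plus one Set#include? call.
def cross_package?(index, constant_name, source_path)
  entry = index[constant_name]
  return false if entry.nil?

  !entry[:packages].include?(package_of_path(source_path))
end
```

The cost of walking every definition is paid once per constant while building the index, so the per-reference work no longer grows with the number of definitions a shared namespace has.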

paracycle requested a review from a team as a code owner April 13, 2026 23:25

paracycle force-pushed the ufuk/rubydex-rewrite branch 2 times, most recently from ad87dca to 7fa6be2 · April 13, 2026 23:57

paracycle added 3 commits April 14, 2026 03:00

Fix rubocop offenses

0648748

Remove 3.1 and add 3.4 to the test matrix

46eea11

paracycle force-pushed the ufuk/rubydex-rewrite branch from 7fa6be2 to 46eea11 · April 14, 2026 00:00

paracycle added 6 commits April 14, 2026 22:31

Fix NotFileUriError for ERB files indexed via index_source

f90735c


paracycle wants to merge 10 commits into main from ufuk/rubydex-rewrite
