Replace core parsing/resolution engine with Rubydex#447
Open
Replace core parsing/resolution engine with Rubydex#447
Conversation
Replace the entire AST walking, constant extraction, and resolution pipeline with Rubydex, a high-performance Ruby indexer written in Rust. The key architectural change is that Packwerk no longer parses files individually and walks AST nodes to find constant references. Instead, Rubydex indexes all workspace files in a single batch call, resolves all constants, and provides resolved references with direct links to their target declarations. Core changes: - RunContext: rewritten to create a Rubydex::Graph, index the full workspace for resolution, and iterate resolved constant references to detect cross-package violations - ParseRun: simplified to two phases (index_and_resolve + find_offenses) instead of per-file parallel processing - Association detection: rewritten using Prism native AST (no longer needs the parser gem translation layer), runs as a supplementary pass since Rubydex doesn't understand ActiveRecord associations - ERB support: simplified to extract_ruby_source which feeds into Rubydex's index_source API - ApplicationValidator: uses Rubydex::Graph instead of ConstantResolver Removed dependencies: constant_resolver, parallel, ast, parser (direct) Added dependency: rubydex Kept: prism (for association detection), better_html (for ERB) Deleted 16 source files, 11 test files, 3 RBI files (~3,400 lines removed). Net change: +403 / -3,391 lines.
ad87dca to
7fa6be2
Compare
Update tapioca require file to remove constant_resolver, parallel, spring, and minitest/autorun; add prism and rubydex requires. Regenerate all gem RBIs, cleaning up stale files from older gem versions. Add ostruct gem for Ruby 4.0 compatibility with yard/tapioca.
7fa6be2 to
46eea11
Compare
Contributor
|
Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore? |
Member
Author
That's correct. Rubydex can do proper Ruby constant resolution, so we don't need to use Zeitwerk heuristics to figure out what constant references resolve to based on their filename. At least, we shouldn't, and, if there are any problems with the resolution, then we should fix them in Rubydex. |
ERB files fed to graph.index_source need a file:// URI, not a bare path. Also add location_to_relative_path helper that catches NotFileUriError for any edge cases where a location doesn't have a file:// URI.
The post-graph work (especially the association detection pass that re-parses all files with Prism) is a significant portion of total runtime. On the Shopify monolith, the association pass takes ~39s single-threaded. Parallelize the Prism parsing phase of association detection using the parallel gem. The resolution and violation checking phases remain sequential since they use shared state (graph + package_set). The parallel flag flows from Configuration -> ParseRun -> RunContext as before.
Split collect_constant_reference_offenses into two phases: 1. Extract: iterate Rubydex's resolved references and pull all needed data into plain Ruby hashes grouped by source file. This must be sequential since it crosses the Rust FFI boundary. 2. Check: process each file's references for dependency violations in parallel using forked workers. Only plain Ruby objects (strings, integers, hashes) cross the fork boundary -- no Rust FFI objects. On the Shopify monolith, the post-graph reference iteration + violation checking was the biggest bottleneck at ~134s single-threaded. The extraction phase remains sequential but the violation checking across ~57k cross-package references is now parallelized.
- Update gemspec to require rubydex >= 0.1.0.beta12 - Point Gemfile to beta12 from RubyGems - Restore accidentally commented-out offense formatter calls in CheckCommand#run - Move parallel flag to RunContext constructor instead of passing through find_offenses kwargs (avoids Mocha/Sorbet interaction issues)
Shared namespaces like GraphApi and Checkouts are defined in dozens of packages across a monolith. Rubydex resolves a constant reference to a Declaration, but the first definition might be in a different package even though the namespace is also defined locally. Before: used declaration.definitions.first to get the target path, causing 219k false violations on Shopify Core. After: check ALL definitions of the target constant. If any definition lives in the same package as the source file, the reference is local and is skipped. Only references where no definition exists in the source package are reported as cross-package violations.
Iterating all definitions of a constant for every one of 9M references
was O(refs * defs_per_constant), which caused a 15+ minute hang on
the Shopify monolith where shared namespaces have hundreds of defs.
Pre-compute a hash of constant_name → {packages, target_path} by
iterating graph.declarations once upfront. The per-reference check
is now a single Set#include? call -- O(1).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
parsergem as a direct dependencyArchitecture change
Before: For each file → parse with Prism → walk AST nodes → extract constant references via
ConstNodeInspector→ resolve viaconstant_resolvergem → check package violationsAfter: Index all workspace files in one
Rubydex::Graphbatch call → resolve all constants → iterateResolvedConstantReferenceobjects with direct links to target declarations → check package violations. Association detection (has_many,belongs_to, etc.) runs as a supplementary Prism-based pass since Rubydex doesn't understand ActiveRecord semantics.Dependencies
constant_resolverrubydexparallelast(direct)parser(direct)Kept:
prism(for association detection),better_html(for ERB),activesupport,sorbet-runtime,zeitwerk,bundler.Deleted files (16 source + 11 test + 3 RBI)
The entire per-file parsing pipeline:
file_processor,node_processor,node_processor_factory,node_visitor,node_helpers,const_node_inspector,constant_name_inspector,constant_discovery,parsed_constant_definitions,reference_extractor,unresolved_reference,association_inspector,cache,parsers/ruby,parsers/factory,parsers/parser_interface.Key design decisions
index_and_resolveindexes ALL Ruby files in the workspace (not just the scoped check set) so Rubydex can resolve cross-package references. Only the scoped files are checked for violations.has_many :ordersas a method call, not a constant reference. Usesgraph.resolve_constantto resolve the inferred constant name.Parsers::Erbsimplified toextract_ruby_sourcewhich feeds intograph.index_source.parallelflag is accepted but ignored.Test results
All 136 tests pass (excluding
spring_command_testandautoload_testwhich have pre-existing Ruby 4.0 incompatibilities unrelated to this change).