Redesign dialect check #256
Open
Description
This PR eliminates indirection in the parser that previously relied on Parser_state.Dialect_feature to remember dialect-specific tokens for deferred processing. Instead, the frontend AST is now enriched to carry all necessary information directly.
Details
Previously, the parser used a mutable state module, Parser_state.Dialect_feature, to track dialect-specific features during parsing:
- When encountering features like STRAIGHT_JOIN, REPLACE INTO, ON DUPLICATE KEY, collations, unsigned types, etc., the parser would call functions like set_straight_join, set_collation, set_unsigned_types to store feature markers in a global ref.
- These markers were later processed separately from the AST.
This created an indirect flow where AST construction and dialect feature collection were decoupled, making the code harder to reason about and maintain.
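The difference between the two flows can be sketched as follows. This is a simplified illustration only: the `feature`, `join_kind`, and `source` types, the `Dialect_feature_old` module, and `features_of_source` are hypothetical names made up for this example and are not the project's actual definitions.

```ocaml
(* Hypothetical, simplified types; not the project's real AST. *)
type feature = Straight_join | Replace_into | On_duplicate_key

(* Old style: the parser records features in a global ref as a side effect,
   so the collected features live apart from the AST it builds. *)
module Dialect_feature_old = struct
  let seen : feature list ref = ref []
  let set_straight_join () = seen := Straight_join :: !seen
  let collect () = List.rev !seen
end

(* New style: the AST node itself records which join keyword was written. *)
type join_kind = Inner_join | Straight_join_kw

type source =
  | Table of string
  | Join of join_kind * source * source

(* Dialect features are derived by a pure traversal of the AST. *)
let rec features_of_source = function
  | Table _ -> []
  | Join (kind, left, right) ->
    let here =
      match kind with
      | Straight_join_kw -> [ Straight_join ]
      | Inner_join -> []
    in
    here @ features_of_source left @ features_of_source right
```

With the second shape, the dialect check depends only on its AST argument, so it can be run or tested without first resetting any parser state.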
Some type information is only relevant for dialect checking and should not propagate into the unification/resolution phase (res_expr) or code generation stage. For example, UInt32 needs to be tracked for dialect validation, but at runtime it's always represented as int64 in OCaml. We don't want this distinction to leak into type inference or generated code.
To address this, we introduce Source_type.t:
The name Source_type (open to discussion) emphasizes that these types exist only at the "source" level, in the frontend AST, and are converted to regular Type.t before entering unification. This keeps dialect-specific details contained in the parsing/validation layer without polluting the core type system.
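As a rough illustration of the idea, and not the PR's actual definition, a source-level type with a lossy conversion to the core type might look like the sketch below. The constructor sets of both `Type.t` and `Source_type.t`, and the helper names `to_type` and `is_unsigned`, are assumptions chosen for the example.

```ocaml
(* Hypothetical stand-in for the core type used by unification and codegen. *)
module Type = struct
  type t = Int | Text | Bool
end

(* Hypothetical source-level type: carries distinctions that only the
   dialect check cares about. *)
module Source_type = struct
  type t =
    | Int
    | UInt32 (* relevant for dialect validation only *)
    | Text
    | Bool

  (* Erase source-only distinctions before entering unification. *)
  let to_type : t -> Type.t = function
    | Int | UInt32 -> Type.Int
    | Text -> Type.Text
    | Bool -> Type.Bool

  (* The dialect check inspects the source type before it is erased. *)
  let is_unsigned = function
    | UInt32 -> true
    | Int | Text | Bool -> false
end
```

In this sketch `UInt32` and `Int` both map to `Type.Int`, so nothing downstream of `to_type` can observe the unsigned distinction.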