
feat: AST-based parsing improvements and examples-update command #38

Merged: maxim-uvarov merged 36 commits into main from ast on Jan 10, 2026


@maxim-uvarov
Member

Summary

  • Add ast-complete command: Fills gaps in ast --flatten output with synthetic tokens (semicolons, whitespace, assignments, etc.) for complete byte coverage
  • Add split-statements command: Splits source code into individual statements using AST analysis, correctly handling nested blocks
  • Add examples-update command: Executes @example blocks and updates their --result values (similar to embeds-update but for examples)
  • Refactor list-module-commands: Uses new AST infrastructure for more accurate scope detection and attribute parsing

Changes

New commands

  • ast-complete - Complete AST output by filling gaps with synthetic tokens
  • split-statements - Split source into statements using AST analysis
  • find-examples - Find @example blocks with their code and result sections
  • execute-example - Execute example code and return result as nuon
  • examples-update - Update @example result values by executing them

Improvements

  • list-module-commands now uses ast-complete and split-statements for better accuracy
  • Added descriptions to @example attributes throughout codebase
  • Fixed @example result formats to use proper nuon syntax

Documentation

  • Added AST behavior test cases in tests/ast-cases/ documenting ast --flatten and ast --json behavior
  • Updated CLAUDE.md with project conventions
  • Added development notes in todo/

Tests

  • Unit tests for find-examples, execute-example, examples-update
  • Updated integration test fixtures

Test plan

  • nu toolkit.nu test passes
  • Manual verification of examples-update on real files
  • Review AST edge case documentation

🤖 Generated with Claude Code

claude and others added 30 commits January 2, 2026 16:55
Fill in example descriptions for dependencies, filter-commands-with-no-tests,
and set-x commands. These descriptions are used by nutest's generate-example-tests
to create documented test files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a command that executes @example blocks and updates their --result
values with actual execution output. Similar to embeds-update but for
@example attributes.

- Parses @example blocks with single-line results
- Executes code and updates results in nuon format
- Skips multiline results (starting with single quote)
- Updated existing example results to canonical nuon format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents that `ast --flatten` omits:
- Statement-ending semicolons
- Variable assignment operators (=)

Uses dotnu embed format for captured outputs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents shape_block vs shape_closure distinction:
- shape_closure: def bodies, standalone closures
- shape_block: if/else, @example args

Also documents:
- Whitespace in brace tokens
- @example produces shape_garbage
- @ prefix not included in token

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents @example, @test, @deprecated attribute parsing:
- @ prefix not included in token content
- Detection via byte check at (span.start - 1)
- @test → shape_garbage, @example → shape_internalcall
- @ inside strings not tokenized separately
- Comments produce empty AST

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents def/export def tokenization:
- "export def" is a single token (not two)
- Command name is shape_string (quotes preserved)
- Signature is single shape_signature token
- Flags (--env, --wrapped) appear as shape_flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds `ast-complete` command that fills gaps in `ast --flatten` output
with synthetic tokens, providing complete byte coverage.

Synthetic shapes added:
- shape_semicolon: statement-ending `;`
- shape_assignment: variable assignment `=`
- shape_whitespace: spaces between tokens
- shape_pipe: pipe operator `|`
- shape_comma: comma separator `,`
- shape_gap: unclassified content (like `@` prefix)

This enables reliable span-based text replacement without string matching.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
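The gap-filling idea behind ast-complete can be modeled outside Nushell. The Python sketch below is not the project's code: it assumes each flattened token carries its content, shape, and a 0-based byte span with an exclusive end, and the `Token` class, `classify_gap`, and `ast_complete` here are hypothetical stand-ins. It classifies each uncovered range as a whole, so ` = ` becomes one shape_assignment token including its surrounding spaces; the real command may split whitespace out separately.

```python
from dataclasses import dataclass


@dataclass
class Token:
    content: str
    shape: str
    start: int  # byte offset, inclusive
    end: int    # byte offset, exclusive


# Hypothetical lookup from trimmed gap text to the synthetic shapes named above.
SYNTHETIC_SHAPES = {
    "": "shape_whitespace",
    ";": "shape_semicolon",
    "=": "shape_assignment",
    "|": "shape_pipe",
    ",": "shape_comma",
}


def classify_gap(text: str) -> str:
    """Name an uncovered byte range; anything unrecognized (like '@') is shape_gap."""
    return SYNTHETIC_SHAPES.get(text.strip(), "shape_gap")


def ast_complete(source: bytes, tokens: list[Token]) -> list[Token]:
    """Interleave synthetic tokens so the spans cover every byte of source."""
    out: list[Token] = []
    cursor = 0
    for tok in sorted(tokens, key=lambda t: t.start):
        if tok.start > cursor:  # leading or inter-token gap
            gap = source[cursor:tok.start].decode("utf-8")
            out.append(Token(gap, classify_gap(gap), cursor, tok.start))
        out.append(tok)
        cursor = max(cursor, tok.end)
    if cursor < len(source):  # trailing gap
        gap = source[cursor:].decode("utf-8")
        out.append(Token(gap, classify_gap(gap), cursor, len(source)))
    return out
```

Because the completed stream covers every byte, concatenating its spans reproduces the source exactly, which is what makes span-based replacement safe without string matching.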
- All commands in commands.nu are exported by default
- mod.nu controls the public API via selective re-exports
- Internal commands are accessible but not in public API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace regex-based parsing with AST-based approach for more reliable
@example detection:

- Use `ast --flatten` to tokenize source and get byte positions
- Detect @example by checking byte at (start-1) is "@"
- Extract code from shape_block token boundaries
- Handle --result flag detection via shape_flag tokens

This fixes:
- False positives from @example inside strings
- Potential crash from `| last` on empty input
- Fragile line-based parsing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
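A rough Python model of the AST-based detection described in this commit, under simplifying assumptions: tokens carry content, shape, and byte spans; the code block is delimited by shape_block tokens whose braces are counted to find the matching close; and the result is taken to be the single token after a `--result` shape_flag. The `Token` class, `find_examples`, and the returned keys are hypothetical, and multi-token results (lists, records) are out of scope for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Token:
    content: str
    shape: str
    start: int  # byte offset, inclusive
    end: int    # byte offset, exclusive


def find_examples(source: bytes, tokens: list[Token]) -> list[dict]:
    """Locate @example attributes and pull out their code and --result text."""
    found = []
    for i, tok in enumerate(tokens):
        # '@' is not part of the token, so check the byte just before it.
        # "@example" inside a string literal is one string token, so it never matches.
        if tok.content != "example" or tok.start == 0 or source[tok.start - 1] != ord("@"):
            continue
        # The first shape_block token after the attribute opens the code block.
        j = next((k for k in range(i + 1, len(tokens)) if tokens[k].shape == "shape_block"), None)
        if j is None:
            continue  # malformed: no code block
        # Walk forward until the braces in shape_block tokens balance out.
        depth, k = 0, j
        while k < len(tokens):
            if tokens[k].shape == "shape_block":
                depth += tokens[k].content.count("{") - tokens[k].content.count("}")
                if depth == 0:
                    break
            k += 1
        if k == len(tokens):
            continue  # malformed: unbalanced braces
        code = source[tokens[j].end:tokens[k].start].decode("utf-8").strip()
        # Keep only examples whose block is followed by a --result shape_flag.
        flag = tokens[k + 1] if k + 1 < len(tokens) else None
        if flag is None or flag.shape != "shape_flag" or flag.content != "--result" or k + 2 >= len(tokens):
            continue  # no --result: skipped
        found.append({"code": code, "result": tokens[k + 2].content})
    return found
```

Working from token positions rather than lines is what removes the string false positives: the lexer has already decided which bytes belong to a string literal.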
Fix duplicate result bug:
- Use full original text for matching instead of just result line
- This ensures unique matches even when multiple examples have same result

Improve error handling:
- Use `do -i` with `complete` to capture subprocess errors properly
- Skip failed examples instead of corrupting file with error messages
- Print warning to stderr with code and error details

Module name stripping reviewed and verified working correctly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
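The duplicate-result fix can be illustrated with a small Python sketch. The record fields (`original`, `old_result`, `new_result`) and the `apply_updates` name are hypothetical stand-ins for whatever the Nushell implementation carries. The point is that the whole original example text is the match key, so two examples that share a result value cannot be confused. It also skips failed runs and single-quoted (multiline) results, as the examples-update commit describes.

```python
def apply_updates(text: str, examples: list[dict]) -> str:
    """Write new --result values back into the source text.

    Each (hypothetical) example record carries:
      original   - the full original @example text, used as the match key
      old_result - the result text currently in the file
      new_result - the freshly computed result, or None if execution failed
    """
    for ex in examples:
        if ex["new_result"] is None:
            continue  # failed run: leave the file untouched rather than writing an error
        if ex["old_result"].startswith("'"):
            continue  # multiline (single-quoted) results are skipped
        updated = ex["original"].replace(
            f"--result {ex['old_result']}", f"--result {ex['new_result']}", 1
        )
        # Matching on the whole example keeps examples with equal results apart.
        text = text.replace(ex["original"], updated, 1)
    return text
```

With two examples that both end in `--result 2`, replacing only the result text for the second one would hit the first occurrence and rewrite the wrong example; keying on the full example text avoids that.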
Add 13 new tests covering:

find-examples (7 tests):
- Basic @example detection
- Multiple @examples in file
- @example inside string (ignored)
- @example without --result (skipped)
- Empty input handling
- Malformed @example handling
- Multiline code extraction

execute-example (3 tests):
- Simple expression execution
- Error handling (returns error record)
- Multiline result handling

examples-update (3 tests):
- Updates result values correctly
- Handles multiple examples
- Preserves file when no examples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add three new AST behavior documentation files:

string-literals.nu:
- Single/double quoted strings (shape_string)
- Interpolated strings (shape_string_interpolation with nested tokens)
- Raw strings (shape_raw_string)
- Backtick strings (shape_external)
- Multiline strings, empty strings

operators.nu:
- Arithmetic operators (+, -, *, /, **)
- Comparison operators (==, !=, <, >)
- Logical operators (and, or, not)
- Range operators (.., ..<)
- Pipeline operator (shape_pipe)

variables.nu:
- Variable declaration (let/mut with shape_vardecl)
- Variable references (shape_variable vs shape_garbage)
- Environment variables ($env.X split into shape_variable + shape_string)
- Special variables ($in, $nu)
- Type annotations, variable shadowing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark all four examples-update improvement tasks as completed:
- 001: AST-based find-examples (commit 6808e50)
- 002: Fix reliability bugs (commit faac906)
- 003: Add unit tests (commit 2662ff5)
- 004: Add AST test cases (commit 7f4491b)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These internal commands need to be exported per project convention
(all commands in commands.nu are exported for testing).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use sentinel tokens [{end: 0}] and [{start: len}] to handle
  leading, inter-token, and trailing gaps in a single pass
- Remove redundant dead code in classify-gap (unreachable branch)
- Reduce ast-complete from ~60 to ~25 lines
- Reduce classify-gap from ~25 to ~10 lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
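The sentinel trick from this refactor can be sketched in Python, modeling tokens as hypothetical `(start, end)` byte tuples instead of Nushell records. Prepending a token that ends at 0 and appending one that starts at the source length turns leading, inter-token, and trailing gaps into the same pairwise comparison. `gap_spans` is an illustrative name, not a dotnu command.

```python
def gap_spans(tokens: list[tuple[int, int]], length: int) -> list[tuple[int, int]]:
    """Return the (start, end) byte ranges not covered by any token span."""
    spans = sorted(tokens)
    before = [(0, 0)] + spans           # each token's predecessor; sentinel ends at 0
    after = spans + [(length, length)]  # each token's successor; sentinel starts at length
    # One comparison per adjacent pair finds every gap, including the edges.
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(before, after)
        if next_start > prev_end
    ]
```

No special case is needed for an empty token list: the two sentinels alone yield the whole source as a single gap.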
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Emphasize using `nu toolkit.nu test` (not separate commands)
- Add `--update` flag documentation
- Remove outdated test count

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reduce code from 21 to 14 lines by inlining variables and using
idiomatic where/each pattern instead of each/if/compact.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ection

Replace manual byte-checking for @ prefix with ast-complete which exposes
@ as shape_gap tokens. This simplifies @example detection logic:
- Check for shape_gap ending with "@" followed by "example" token
- Handle gaps that include preceding newlines (e.g., "\n\n@")

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
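With ast-complete in place, attribute detection reduces to a pattern over adjacent tokens. This Python sketch assumes the completed stream is a list of hypothetical `(content, shape)` pairs. `attribute_positions` is an illustrative name, and its `attribute` parameter generalizes the same check to @test and other attributes, in the spirit of the later list-module-commands commit.

```python
def attribute_positions(completed: list[tuple[str, str]], attribute: str = "example") -> list[int]:
    """Indices of tokens named `attribute` that are preceded by an '@' gap."""
    # The '@' arrives as a shape_gap token, possibly with preceding newlines
    # folded into it (e.g. "\n\n@"), so test the end of the gap text.
    return [
        i + 1
        for i, ((content, shape), (next_content, _)) in enumerate(zip(completed, completed[1:]))
        if shape == "shape_gap" and content.endswith("@") and next_content == attribute
    ]
```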
New command that splits source code into individual statements using
AST analysis. Uses ast-complete to identify statement boundaries
(semicolons and newlines at top level). Correctly handles nested blocks:
newlines inside blocks don't create new statements.

Returns table with statement text and byte positions for precise extraction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…pe detection

Replace line-based def detection with split-statements which provides:
- Accurate statement boundaries via AST analysis
- Proper scope ranges (start, end) for each def
- Better handling of multi-line def signatures

Also fix split-statements to:
- Handle self-contained blocks like {} with no net depth change
- Recognize shape_gap starting with newline as statement boundary
  (comments are bundled into gaps by AST)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
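The statement-splitting logic from the split-statements commits can be sketched in Python. Assumptions: the input is an ast-complete-style stream covering every byte; nesting is tracked by counting braces in shape_block and shape_closure tokens (a net count, so a self-contained `{}` leaves the depth unchanged); and a top-level shape_semicolon, or a shape_whitespace/shape_gap token that starts with a newline, ends a statement. `Tok`, `split_statements`, and the output keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    content: str
    shape: str
    start: int  # byte offset, inclusive
    end: int    # byte offset, exclusive


# Assumption: braces that open/close nested code appear in these shapes.
BRACE_SHAPES = {"shape_block", "shape_closure"}


def is_boundary(tok: Tok) -> bool:
    """A ';' or a whitespace/gap token starting with a newline ends a statement."""
    if tok.shape == "shape_semicolon":
        return True
    return tok.shape in ("shape_whitespace", "shape_gap") and tok.content.startswith("\n")


def split_statements(source: bytes, tokens: list[Tok]) -> list[dict]:
    statements: list[dict] = []
    depth = 0
    start = end = None

    def flush() -> None:
        nonlocal start, end
        if start is not None:
            statements.append(
                {"statement": source[start:end].decode("utf-8"), "start": start, "end": end}
            )
        start = end = None

    for tok in tokens:
        if depth == 0 and is_boundary(tok):
            flush()  # boundaries only count at the top level
            continue
        if tok.shape in BRACE_SHAPES:
            # Net change, so a self-contained "{}" token leaves depth unchanged.
            depth += tok.content.count("{") - tok.content.count("}")
        if tok.shape == "shape_whitespace":
            continue  # whitespace never starts or ends a statement
        if start is None:
            start = tok.start
        end = tok.end
    flush()
    return statements
```

Returning byte positions alongside the text lets callers slice the original source directly, which is the property list-module-commands relies on for scope ranges.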
…mands

Replace manual byte-checking for @ prefix with ast-complete pattern matching.
Now uses the same approach as find-examples: detect shape_gap ending with @
followed by attribute token.

Also removes unused code_bytes variable since all AST operations now use
ast-complete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document the ast-complete and split-statements work:
- Problem: ast --flatten omits semicolons, pipes, @, whitespace
- Solution: ast-complete fills gaps with synthetic tokens
- Built split-statements on top for statement boundary detection
- Refactored find-examples and list-module-commands to use these

Future work outlined:
- Document ast --json output with test cases
- General-purpose ast --json parser
- Pipeline analysis tool
- History command parser for nushell-history-based-completions

Adds a new section outlining the first step for future work: creating
test cases to document ast --json behavior before building parsers on
top of it. This follows the same literate programming approach used
for ast --flatten documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test case files documenting Nushell's `ast --json` behavior using
literate programming annotations. Covers basic output structure, command
calls with arguments/flags, blocks/closures/control flow, and span mapping.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…bstring)

Add detailed documentation comparing two approaches for extracting source
text from AST spans, recommending `bytes at` for its semantic match with
AST's exclusive end convention.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
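To see why byte-based extraction matters, here is a Python analogy rather than Nushell code. AST spans are byte offsets with an exclusive end, so they must be applied to the encoded bytes; slicing the decoded string by the same numbers goes wrong as soon as a multi-byte character (here `é`) precedes the span. `span_text` is a hypothetical helper.

```python
def span_text(source: str, start: int, end: int) -> str:
    """Extract the text for an AST span given as byte offsets with an exclusive end."""
    return source.encode("utf-8")[start:end].decode("utf-8")


src = 'let s = "héllo"; $s'
# The variable reference occupies bytes 18..20; 'é' takes two bytes in UTF-8,
# so character index 18 is already one position past where '$' sits.
print(span_text(src, 18, 20), src[18:20])
```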
claude and others added 6 commits January 5, 2026 21:46
Updated todo with findings showing ast --json is ideal for parsing history commands (vs ast --flatten). Added comparison table of features, example outputs for common patterns (flags, parameters, positional args), and mapping to database schema for history-based-completions project.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace hardcoded '/tmp/' paths with $nu.temp-path for Windows compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sort actual and expected values before comparison to handle
platform differences in glob ordering.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add sort-by to dependencies integration tests to ensure deterministic
output across platforms (macOS vs Windows glob ordering).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add CRLF to LF conversion when reading files on Windows to ensure
byte positions from AST parsing are correct.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
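A minimal Python illustration of the CRLF issue. Whatever the exact source of the mismatch on Windows, the effect is the same: if spans are computed on one line-ending convention and applied to text in the other, every `\r\n` before a position shifts its byte offset by one, so the error grows line by line. Normalizing once on read keeps parser spans and sliced text in agreement. `normalize_newlines` is a hypothetical name.

```python
def normalize_newlines(raw: bytes) -> bytes:
    """Convert Windows CRLF line endings to LF before parsing or slicing."""
    return raw.replace(b"\r\n", b"\n")


raw = b"let a = 1\r\nlet b = 2\r\n"
norm = normalize_newlines(raw)
# "let b" starts one byte later in raw than in norm: one byte of drift per CRLF.
print(raw.index(b"let b"), norm.index(b"let b"))
```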
maxim-uvarov merged commit d244a97 into main on Jan 10, 2026 (2 checks passed)
maxim-uvarov deleted the ast branch on January 10, 2026 01:52