Merged
220 changes: 186 additions & 34 deletions README.md
@@ -17,20 +17,22 @@ The implementation is similar to the concepts in [metalanguage](https://github.c

## Implementations

Meta-notation is available in multiple languages with identical behavior:

- **[JavaScript/TypeScript](./js)** - Full-featured implementation with PEG.js grammar
- **[Rust](./rust)** - High-performance implementation with serde support

Both implementations produce the same parsed object structure and are verified against the same test scenarios.

## Features

- **Universal Delimiter Parsing**: Parses `()`, `{}`, `[]`, `''`, `""`, `` ` ` ``
- **Language Agnostic**: Works with 25+ programming languages and with natural-language text that uses these delimiters
- **Nested Structures**: Supports arbitrary nesting of delimiters
- **Round-trip Serialization**: Parse and serialize back to original text
- **Multiple Language Implementations**: JavaScript/TypeScript and Rust with identical output
- **Simple Grammar**: Clean, efficient parsing
- **Comprehensive Tests**: 170+ test cases across both implementations covering programming and natural languages

## Installation

@@ -73,11 +75,16 @@ assert_eq!(serialized, code);

## API

### `parse(input) -> Sequence`

Parses text into a sequence of blocks. Each block has a `type` and `content`.

- **Bracket delimiters** (`paren`, `curly`, `square`): `content` is a nested array of blocks
- **Quote delimiters** (`singleQuote`, `doubleQuote`, `backtick`): `content` is a plain string (no nested parsing inside quotes)
- **Plain text** (`text`): `content` is a string

```typescript
// JavaScript/TypeScript
const result = parse('hello (world) {test}');
// Returns:
// [
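//   { type: 'text', content: 'hello ' },
//   { type: 'paren', content: [{ type: 'text', content: 'world' }] },
//   { type: 'text', content: ' ' },
//   { type: 'curly', content: [{ type: 'text', content: 'test' }] }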
// ]
```

```rust
// Rust
let result = parse("hello (world) {test}");
// Returns:
// [
//     Block::Text("hello "),
//     Block::Paren([Block::Text("world")]),
//     Block::Text(" "),
//     Block::Curly([Block::Text("test")])
// ]
```

#### Nested Structures

Bracket delimiters can be nested arbitrarily:

```typescript
const result = parse('{a [b (c) d] e}');
// Returns:
// [
//   { type: 'curly', content: [
//     { type: 'text', content: 'a ' },
//     { type: 'square', content: [
//       { type: 'text', content: 'b ' },
//       { type: 'paren', content: [{ type: 'text', content: 'c' }] },
//       { type: 'text', content: ' d' }
//     ]},
//     { type: 'text', content: ' e' }
//   ]}
// ]
```

#### Quotes

Quotes capture their content as a plain string without further parsing:

```typescript
const result = parse('"hello {world}"');
// Returns:
// [
//   { type: 'doubleQuote', content: 'hello {world}' }
// ]
// Note: {world} is NOT parsed as a curly block inside quotes
```

#### Real-World Examples

**JavaScript code:**
```typescript
const result = parse('const greet = (name) => { return `Hello, ${name}!`; };');
// Returns:
// [
//   { type: 'text', content: 'const greet = ' },
//   { type: 'paren', content: [{ type: 'text', content: 'name' }] },
//   { type: 'text', content: ' => ' },
//   { type: 'curly', content: [
//     { type: 'text', content: ' return ' },
//     { type: 'backtick', content: 'Hello, ${name}!' },
//     { type: 'text', content: '; ' }
//   ]},
//   { type: 'text', content: ';' }
// ]
```

**JSON:**
```typescript
const result = parse('{"name": "John", "tags": ["dev", "admin"]}');
// Returns:
// [
//   { type: 'curly', content: [
//     { type: 'doubleQuote', content: 'name' },
//     { type: 'text', content: ': ' },
//     { type: 'doubleQuote', content: 'John' },
//     { type: 'text', content: ', ' },
//     { type: 'doubleQuote', content: 'tags' },
//     { type: 'text', content: ': ' },
//     { type: 'square', content: [
//       { type: 'doubleQuote', content: 'dev' },
//       { type: 'text', content: ', ' },
//       { type: 'doubleQuote', content: 'admin' }
//     ]}
//   ]}
// ]
```

**Natural language:**
```typescript
const result = parse('She said, "Hello, world!" and smiled.');
// Returns:
// [
//   { type: 'text', content: 'She said, ' },
//   { type: 'doubleQuote', content: 'Hello, world!' },
//   { type: 'text', content: ' and smiled.' }
// ]
```
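
The content rules above (brackets nest, quotes are opaque, everything else is text) can be illustrated with a small hand-written parser. This is a sketch under stated assumptions, not the project's PEG.js grammar or the Rust crate's parser: the `Block` enum is a local copy of the Rust type listed under Types, and the handling of malformed input (stray closing brackets, unclosed openers and unterminated quotes such as the apostrophe in `don't` are all kept as plain text) is an assumed policy that the real implementations may not share.

```rust
// Local copy of the README's Rust `Block` enum (assumption: not imported from the crate).
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Paren(Vec<Block>),
    Curly(Vec<Block>),
    Square(Vec<Block>),
    SingleQuote(String),
    DoubleQuote(String),
    Backtick(String),
    Text(String),
}

// Append text, merging it into a preceding Text block so runs stay contiguous.
fn push_text(out: &mut Vec<Block>, s: &str) {
    if let Some(Block::Text(prev)) = out.last_mut() {
        prev.push_str(s);
    } else {
        out.push(Block::Text(s.to_string()));
    }
}

// Append any block, routing Text through push_text so adjacent text merges.
fn push_block(out: &mut Vec<Block>, block: Block) {
    match block {
        Block::Text(s) => push_text(out, &s),
        other => out.push(other),
    }
}

/// Sketch parser: one left-to-right pass with a stack of open bracket frames.
pub fn parse(input: &str) -> Vec<Block> {
    let chars: Vec<char> = input.chars().collect();
    // Each frame holds (opening char, children so far); '\0' marks the top level.
    let mut stack: Vec<(char, Vec<Block>)> = vec![('\0', Vec::new())];
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        match c {
            '(' | '{' | '[' => stack.push((c, Vec::new())),
            ')' | '}' | ']' => {
                let opener = match c {
                    ')' => '(',
                    '}' => '{',
                    _ => '[',
                };
                if stack.len() > 1 && stack.last().unwrap().0 == opener {
                    // Matching closer: wrap the frame's children in a bracket block.
                    let (_, children) = stack.pop().unwrap();
                    let block = match opener {
                        '(' => Block::Paren(children),
                        '{' => Block::Curly(children),
                        _ => Block::Square(children),
                    };
                    stack.last_mut().unwrap().1.push(block);
                } else {
                    // Closer that does not match the innermost opener: plain text.
                    push_text(&mut stack.last_mut().unwrap().1, &c.to_string());
                }
            }
            '\'' | '"' | '`' => {
                // Quotes are opaque: capture everything up to the next identical quote.
                match chars[i + 1..].iter().position(|&q| q == c) {
                    Some(len) => {
                        let body: String = chars[i + 1..i + 1 + len].iter().collect();
                        let block = match c {
                            '\'' => Block::SingleQuote(body),
                            '"' => Block::DoubleQuote(body),
                            _ => Block::Backtick(body),
                        };
                        stack.last_mut().unwrap().1.push(block);
                        i += len + 1; // skip the body and the closing quote
                    }
                    // No closing quote (e.g. an apostrophe): keep it as text.
                    None => push_text(&mut stack.last_mut().unwrap().1, &c.to_string()),
                }
            }
            _ => push_text(&mut stack.last_mut().unwrap().1, &c.to_string()),
        }
        i += 1;
    }
    // Openers never closed by end of input are flattened back into text.
    while stack.len() > 1 {
        let (opener, children) = stack.pop().unwrap();
        let parent = &mut stack.last_mut().unwrap().1;
        push_text(parent, &opener.to_string());
        for child in children {
            push_block(parent, child);
        }
    }
    stack.pop().unwrap().1
}
```

A single pass with an explicit stack avoids backtracking. The trade-off is that on unbalanced input such as `{ ( }` this sketch returns plain text, whereas a backtracking PEG parser could still recover the outer `curly` block.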

### `serialize(sequence) -> string`

Converts a sequence of blocks back to text.

```typescript
// JavaScript/TypeScript
const blocks: Block[] = [
  { type: 'text', content: 'hello ' },
  { type: 'paren', content: [{ type: 'text', content: 'world' }] }
];
const text = serialize(blocks);
// Returns: "hello (world)"
```

```rust
// Rust
let blocks = vec![
    Block::Text("hello ".to_string()),
    Block::Paren(vec![Block::Text("world".to_string())]),
];
let text = serialize(&blocks);
// Returns: "hello (world)"
```
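
Serialization is the inverse tree walk: each bracket block re-emits its delimiters around its serialized children, while quote and text blocks emit their strings verbatim. The sketch below assumes a local copy of the `Block` enum from the Types section and is illustrative, not the crate's actual `serialize`.

```rust
// Local copy of the README's Rust `Block` enum (assumption: not imported from the crate).
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Paren(Vec<Block>),
    Curly(Vec<Block>),
    Square(Vec<Block>),
    SingleQuote(String),
    DoubleQuote(String),
    Backtick(String),
    Text(String),
}

/// Sketch serializer: wrap each block's content in its delimiters.
pub fn serialize(blocks: &[Block]) -> String {
    let mut out = String::new();
    for block in blocks {
        match block {
            // Bracket blocks: recurse into the nested sequence.
            Block::Paren(inner) => out.push_str(&format!("({})", serialize(inner))),
            Block::Curly(inner) => out.push_str(&format!("{{{}}}", serialize(inner))),
            Block::Square(inner) => out.push_str(&format!("[{}]", serialize(inner))),
            // Quote blocks: the content is a raw string, copied unchanged.
            Block::SingleQuote(s) => out.push_str(&format!("'{}'", s)),
            Block::DoubleQuote(s) => out.push_str(&format!("\"{}\"", s)),
            Block::Backtick(s) => out.push_str(&format!("`{}`", s)),
            // Plain text needs no delimiters.
            Block::Text(s) => out.push_str(s),
        }
    }
    out
}
```

Because quote content is copied verbatim, a hand-built `Block::DoubleQuote` whose string itself contains `"` would serialize to text that re-parses differently under a parser that ends a quote at the next identical character. Blocks produced by such a parser never contain their own closing quote, so round-tripping parsed input is unaffected.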

## Types

### JavaScript/TypeScript

```typescript
type DelimiterType = 'paren' | 'curly' | 'square' | 'singleQuote' | 'doubleQuote' | 'backtick' | 'text';

interface Block {
  type: DelimiterType;
  content: Block[] | string; // Block[] for brackets, string for quotes and text
}

type Sequence = Block[];
```

### Rust

```rust
pub enum Block {
    Paren(Vec<Block>),    // () - content is nested blocks
    Curly(Vec<Block>),    // {} - content is nested blocks
    Square(Vec<Block>),   // [] - content is nested blocks
    SingleQuote(String),  // '' - content is a plain string
    DoubleQuote(String),  // "" - content is a plain string
    Backtick(String),     // `` - content is a plain string
    Text(String),         // plain text
}
```

The Rust `Block` enum uses serde's `#[serde(tag = "type", content = "content", rename_all = "camelCase")]` attribute, so it serializes to the same JSON structure as the JavaScript implementation:

```json
[
  { "type": "text", "content": "hello " },
  { "type": "paren", "content": [{ "type": "text", "content": "world" }] }
]
```
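
The adjacently tagged layout that this attribute produces can also be spelled out without serde. The std-only sketch below hand-builds the same `{"type": ..., "content": ...}` shape for a local copy of the `Block` enum; it is for illustration only (the crate itself relies on serde's derive), and its escaping covers just the characters JSON requires.

```rust
// Local copy of the README's Rust `Block` enum (assumption: not imported from the crate).
#[derive(Debug)]
pub enum Block {
    Paren(Vec<Block>),
    Curly(Vec<Block>),
    Square(Vec<Block>),
    SingleQuote(String),
    DoubleQuote(String),
    Backtick(String),
    Text(String),
}

// Encode a Rust string as a JSON string literal, escaping quotes, backslashes and controls.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

// One block becomes {"type": <camelCase tag>, "content": <nested array or string>}.
fn block_json(block: &Block) -> String {
    let (tag, content) = match block {
        Block::Paren(inner) => ("paren", to_json(inner)),
        Block::Curly(inner) => ("curly", to_json(inner)),
        Block::Square(inner) => ("square", to_json(inner)),
        Block::SingleQuote(s) => ("singleQuote", json_string(s)),
        Block::DoubleQuote(s) => ("doubleQuote", json_string(s)),
        Block::Backtick(s) => ("backtick", json_string(s)),
        Block::Text(s) => ("text", json_string(s)),
    };
    format!("{{\"type\":\"{}\",\"content\":{}}}", tag, content)
}

// A sequence becomes a compact JSON array of block objects.
pub fn to_json(blocks: &[Block]) -> String {
    let items: Vec<String> = blocks.iter().map(block_json).collect();
    format!("[{}]", items.join(","))
}
```

Pretty-printed and compact output differ only in whitespace, so when checking the two implementations against each other it is safer to compare parsed JSON values than raw strings.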

## Language Support

Meta-notation works seamlessly with both programming languages and natural languages.

### Programming Languages (Tested)

| Language | Delimiters Used | Example |
|----------|----------------|---------|
| JavaScript/TypeScript | `() {}` `` ` ` `` | ``const greet = (name) => { return `Hello`; };`` |
| Python | `() {} [] ""` | `def calc(x): return {"sum": x, "list": [x]}` |
| Go | `() {} ""` | `func main() { fmt.Println("Hello") }` |
| Rust | `() {} [] ""` | `fn main() { let x = vec![1]; println!("{}", x); }` |
| C++ | `() {} ""` | `int main() { std::cout << "Hello"; }` |
| Java | `() {} [] ""` | `class Main { void main(String[] args) {} }` |
| C# | `() {} ""` | `void Test() { Console.WriteLine("Done"); }` |
| Ruby | `() ""` | `def greet(name); puts "Hello"; end` |
| PHP | `() {} [] ""` | `function test($x) { return ["key" => "val"]; }` |
| Swift | `() {} ""` | `func greet(name: String) { return "Hello" }` |
| Kotlin | `() {} ""` | `fun main() { println("Hello") }` |
| Scala | `() {}` | `def add(x: Int, y: Int): Int = { x + y }` |
| Perl | `() {} ""` | `sub greet { print "Hello\n"; }` |
| Haskell | `""` | `main = putStrLn "Hello, World!"` |
| Lisp/Scheme | `()` | `(define (factorial n) (if (= n 0) 1 (* n 1)))` |
| Clojure | `() [] ""` | `(defn greet [name] (str "Hello"))` |
| Lua | `() ""` | `function greet(name) return "Hello" end` |
| Elixir | `() ""` | `def greet(name), do: "Hello"` |
| R | `() {} ""` | `greet <- function(name) { paste("Hello") }` |
| MATLAB | `()` | `function y = square(x); y = x .^ 2; end` |
| SQL | `""` | `SELECT name FROM users WHERE status = "active"` |
| JSON | `{} [] ""` | `{"name": "John", "tags": ["dev"]}` |
| YAML | `[] ""` | `dependencies: ["react", "typescript"]` |
| Bash/Shell | `""` | `echo "Hello, ${USER}!" \| grep "Hello"` |
| Markdown | `` ` ` `` | ``Here is code: `const x = 1;` in backticks.`` |

### Natural Languages (Tested)

@@ -156,13 +297,14 @@ Meta-notation parses natural language text including:
- **Academic writing** with nested structures
- **Legal text** with section references
- **Technical documentation** mixing code and prose
- **Mathematical expressions**: `f(x) = [a + b] * {c - d}`
- **Multiple languages**: English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Chinese, and more

Works with any language that uses these common delimiters for structure.

## Examples

See the [examples](./js/src/examples) directory for more detailed usage examples.

## Building

@@ -190,13 +332,23 @@ cd js
npm test
```

81 test cases covering parser, serializer, programming languages, and natural languages.

### Rust

```bash
cd rust
cargo test
```

92 test cases covering the same scenarios plus dedicated parser and serializer unit tests.

Both implementations verify:
- Exact parsed object structure matches expected output
- Round-trip serialization preserves original text
- All delimiter types are correctly identified
- Nested structures are handled correctly

## Comparison with Links-Notation

| Feature | meta-notation | links-notation |
29 changes: 29 additions & 0 deletions experiments/verify_json_equivalence.rs
@@ -0,0 +1,29 @@
// Experiment: Verify that Rust serde JSON output matches JS parsed object structure
// Run with: cd /tmp/gh-issue-solver-1774090289115 && cargo test --test verify_json_equivalence -- --nocapture

use meta_notation::parse;

fn main() {
    let test_cases = vec![
        "hello world",
        "(hello)",
        "{world}",
        "[test]",
        "'hello'",
        "\"world\"",
        "`code`",
        "hello (world) {test}",
        "(a (b) c)",
        "{a [b (c) d] e}",
        "(){}[]",
        "\"hello {world}\"",
    ];

    for input in test_cases {
        let result = parse(input);
        let json = serde_json::to_string_pretty(&result).unwrap();
        println!("Input: {:?}", input);
        println!("JSON: {}", json);
        println!("---");
    }
}
27 changes: 27 additions & 0 deletions experiments/verify_json_equivalence_test.rs
@@ -0,0 +1,27 @@
use meta_notation::parse;

#[test]
fn verify_json_equivalence() {
    let test_cases = vec![
        "hello world",
        "(hello)",
        "{world}",
        "[test]",
        "'hello'",
        "\"world\"",
        "`code`",
        "hello (world) {test}",
        "(a (b) c)",
        "{a [b (c) d] e}",
        "(){}[]",
        "\"hello {world}\"",
    ];

    for input in test_cases {
        let result = parse(input);
        let json = serde_json::to_string_pretty(&result).unwrap();
        println!("Input: {:?}", input);
        println!("JSON: {}", json);
        println!("---");
    }
}