


@maddeye commented Dec 11, 2025

A single-header lexer that builds on top of nob.h for tokenizing source code.
Requires C99 or later.

Features:

  • Identifiers (including UTF-8/Unicode)
  • Integers: decimal, hex (0x), binary (0b), octal (0o)
  • Floats: decimal (3.14, 1e10) and C99 hex floats (0x1.Fp+10)
  • String and character literals with escape sequences
  • Line (//) and block (/* */) comments
  • Location tracking (file:line:column)
  • Optional prefix stripping (NOB_LEXER_STRIP_PREFIX)
  • Optional comment skipping (NOB_LEXER_SKIP_COMMENTS)
  • Includes a test suite in tests/lexer.c

Comparison with stb_c_lexer.h

| Aspect | nob_lexer.h | stb_c_lexer.h |
|---|---|---|
| Dependencies | Requires nob.h | Standalone |
| Memory | Zero-copy (tokens are views into the source) | Requires a separate string storage buffer |
| Location tracking | Built-in file:line:column on every token | Separate function call (`stb_c_lexer_get_location`) that rescans the input |
| Binary literals | Yes (0b1010) | No |
| C99 hex floats | Yes (0x1.Fp+10) | Optional |
| Unicode identifiers | Yes (UTF-8) | Yes (bytes >= 128 accepted in identifiers) |
| Multi-char operators (++, --, ==, etc.) | No | Yes (full C operator set) |
| Configuration | Simple defines | 20+ compile-time Y/N flags |
| API style | Modern (dynamic arrays, string views) | Classic C (manual buffer management) |
| Comment handling | Returned as tokens (optionally skipped) | Always discarded |
| Suffixes (uLL, etc.) | No | Optional |

With this, you no longer have to complain about stb_c_lexer. It should be more than sufficient for your purposes and keeps to the simplicity of nob.

Hope you find this helpful 😄.

Disclaimer

I used an LLM for code review and small formatting changes; it also helped with the hex float implementation. The rest of the code I wrote myself!

@marc-dantas

stb_c_lexer replacement lol. Genius
