[Feature] Add Unicode 15.0 and Universal Acceptance (UA) Compliance #107

@anivar

Description

Is your feature request related to a problem?

The current implementation doesn't properly handle:

  • Non-Latin scripts (Arabic, Devanagari, CJK, etc.)
  • Internationalized Domain Names (IDNs) like भारत.भारत
  • Email Address Internationalization (EAI) like user@عربي.السعودية
  • Unicode normalization (NFC/NFD/NFKC/NFKD)
  • Bidirectional text (RTL/LTR mixing)
  • Zero-width characters (ZWNJ/ZWJ) needed for Indic scripts
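Several of these failures come from comparing text code point by code point. For the normalization case, here is a minimal sketch using Python's standard `unicodedata` module. The helper `canonical_equal` is a hypothetical name, not an any-guardrail API. It shows that strings which look identical may only compare equal after both are normalized to the same form.

```python
import unicodedata

# The same visible "é" encoded two ways.
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT


def canonical_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(precomposed == decomposed)                 # False: different code points
print(canonical_equal(precomposed, decomposed))  # True: identical after NFC

# Indic example: U+0958 DEVANAGARI LETTER QA is a composition exclusion,
# so NFC rewrites it as KA + NUKTA (two code points).
print(unicodedata.normalize("NFC", "\u0958") == "\u0915\u093c")  # True

# Compatibility forms (NFKC/NFKD) also fold variants such as fullwidth letters.
print(unicodedata.normalize("NFKC", "\uff21\uff22\uff23"))  # ABC
```

The choice of form matters. NFC and NFD preserve meaning exactly. NFKC and NFKD also fold compatibility variants, which helps catch disguised input but is lossy.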

Describe the solution you'd like

Full Unicode and Universal Acceptance compliance:

```python
# Should work correctly
guardrail.validate("प्रयोक्ता@भारत.भारत")  # EAI email
guardrail.validate("https://भारत.भारत")    # IDN domain
guardrail.validate("مرحبا שלום")             # Mixed RTL scripts
guardrail.validate("हिन्‌दी")                # ZWNJ in Hindi
```
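To handle the IDN and EAI cases above, a validator first has to separate the domain from the local part and then convert the domain labels to their ASCII form. The following is a minimal sketch that uses only the Python standard library, and the helper names are hypothetical. Python's built-in `idna` codec implements the older IDNA 2003 rules. The sketch therefore illustrates the steps involved; it is not an RFC 5891 implementation.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA 2003 (RFC 3490),
# not IDNA2008 (RFC 5891); a compliant validator needs an IDNA2008 library.


def domain_to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII ("xn--") form.

    Raises UnicodeError if the codec rejects a label (e.g. an empty label).
    """
    return domain.encode("idna").decode("ascii")


def split_eai_address(address: str) -> tuple[str, str]:
    """Split an internationalized email address into (local part, ASCII domain).

    Splits on the LAST "@", since a quoted local part may itself contain "@".
    The local part stays in Unicode: SMTPUTF8 (RFC 6531) carries it as UTF-8
    and there is no standard ASCII-compatible encoding for local parts.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain_to_ascii(domain)


print(domain_to_ascii("भारत.भारत"))
print(split_eai_address("प्रयोक्ता@भारत.भारत"))
```

Only the domain gets an ASCII form. This asymmetry is why EAI support cannot be added by running IDN conversion over the whole address.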

Technical Requirements:

  • Unicode 15.0 support
  • Grapheme cluster counting (not byte or code point counting)
  • Confusable character detection (homograph attacks)
  • Zero-width character handling for Indic scripts
  • IDN/EAI validation per RFC 5891 (IDNA2008) and RFC 6531 (SMTPUTF8)
  • Correct handling of all four normalization forms (NFC, NFD, NFKC, NFKD)
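The zero-width and RFC 5891 requirements overlap. IDNA2008 neither strips ZWNJ/ZWJ nor allows them everywhere; it permits them only in the contexts defined by the CONTEXTJ rules in RFC 5892. Below is a minimal sketch of the virama part of those rules using only `unicodedata`. Here `joiners_allowed` is a hypothetical helper name, and the Arabic joining-type rule for ZWNJ is deliberately left out.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # canonical combining class shared by virama (halant) signs


def joiners_allowed(label: str) -> bool:
    """Accept ZWNJ/ZWJ only directly after a virama (RFC 5892, A.1 and A.2).

    Simplification: RFC 5892 also allows ZWNJ between certain Arabic-script
    joining letters (a Joining_Type rule). That needs data not shipped in
    `unicodedata`, so such labels are rejected here.
    """
    for i, ch in enumerate(label):
        if ch in (ZWNJ, ZWJ):
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True


# "हिन्दी" with a ZWNJ after the virama of न्
hindi = "\u0939\u093f\u0928\u094d\u200c\u0926\u0940"

print(joiners_allowed(hindi))            # True: ZWNJ follows U+094D VIRAMA
print(joiners_allowed("pay\u200dpal"))   # False: ZWJ after a plain Latin letter
print(hindi.replace(ZWNJ, "") == hindi)  # False: blanket stripping alters the word
```

The last line shows why a blanket "remove zero-width characters" filter is wrong for Indic text. Removing the ZWNJ turns the explicit halant form into a conjunct, so the word renders differently.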

Additional context

  • Required for compliance with Universal Acceptance and the related internet standards (IDNA2008, EAI)
  • Critical for preventing Unicode-based security attacks (homographs)
  • Necessary for Indian language support (22 scheduled languages)
  • Enables proper handling of emoji and complex scripts
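On the homograph point: full confusable detection is specified in Unicode Technical Standard #39, which covers confusables data and restriction levels. A much simpler first line of defense is to flag labels that mix scripts, such as a Latin domain containing one Cyrillic letter. The following minimal sketch uses hypothetical helper names. It approximates each letter's script from its Unicode character name, because `unicodedata` does not expose the Script property.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts used by a label from Unicode character names.

    Heuristic: the first word of a letter's name ("LATIN", "CYRILLIC",
    "DEVANAGARI", ...) stands in for its Script property. Only letters
    (general category L*) are counted; digits, marks and punctuation are skipped.
    """
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(scripts_in("paypal"))               # {'LATIN'}
print(looks_mixed_script("p\u0430ypal"))  # True: U+0430 is CYRILLIC SMALL LETTER A
print(looks_mixed_script("भारत"))         # False
```

This heuristic misses whole-script confusables, such as a label written entirely in Cyrillic look-alikes. It would also flag legitimate mixes such as Japanese Han plus Kana. UTS #39 addresses both cases with its confusables data and its list of allowed script combinations.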

Implementation Impact

This would make any-guardrail compliant with international standards and usable globally.
