Is your feature request related to a problem?
The current implementation doesn't properly handle:
- Non-Latin scripts (Arabic, Devanagari, CJK, etc.)
- Internationalized Domain Names (IDNs) like भारत.भारत
- Email Address Internationalization (EAI) like user@عربي.السعودية
- Unicode normalization (NFC/NFD/NFKC/NFKD)
- Bidirectional text (RTL/LTR mixing)
- Zero-width characters (ZWNJ/ZWJ) needed for Indic scripts
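A few of the gaps listed above can be illustrated with the Python standard library alone. This is a minimal sketch, not code from any-guardrail: the `describe` helper and the `ZERO_WIDTH` table are hypothetical names introduced only for illustration.

```python
import unicodedata

# Zero-width characters that carry meaning in Indic (and other) scripts.
ZERO_WIDTH = {"\u200c": "ZWNJ", "\u200d": "ZWJ"}

def describe(text: str) -> dict:
    """Summarize properties that a naive validator tends to conflate.

    Hypothetical helper for illustration; not part of any-guardrail.
    """
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "is_nfc": unicodedata.is_normalized("NFC", text),
        "zero_width": [ZERO_WIDTH[ch] for ch in text if ch in ZERO_WIDTH],
    }

composed = "\u00e9"      # precomposed e-acute (NFC form)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (NFD form)

# The two strings render identically but compare unequal until normalized.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Code point count and byte count differ, and neither equals the number
# of user-perceived characters (grapheme clusters).
print(describe(decomposed))

# DEVANAGARI KA + VIRAMA + ZWJ: the joiner must be preserved, not stripped.
print(describe("\u0915\u094d\u200d"))
```

Stripping or rejecting every zero-width character would break the last example, which is why these characters need script-aware handling instead of a blanket filter.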
Describe the solution you'd like
Full Unicode and Universal Acceptance compliance:
# Should work correctly
guardrail.validate("प्रयोक्ता@भारत.भारत") # EAI email
guardrail.validate("https://भारत.भारत") # IDN domain
guardrail.validate("مرحبا שלום") # Mixed RTL scripts
guardrail.validate("हिन्दी") # ZWNJ in Hindi
Technical Requirements:
- Unicode 15.0 support
- Grapheme cluster counting (not byte counting)
- Confusable character detection (homograph attacks)
- Zero-width character handling for Indic scripts
- IDN/EAI validation per RFC 5891/6531
- Proper normalization across all forms
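Two of these requirements, IDN conversion and homograph detection, can be roughed out with the standard library. This is a sketch under stated assumptions: `to_ascii_domain`, `scripts_used`, and `looks_mixed_script` are hypothetical names, the script check is a crude heuristic based on Unicode character names and not the confusables data from the Unicode security specification, and Python's built-in `idna` codec implements the older IDNA 2003 rules, not the IDNA 2008 rules of RFC 5891.

```python
import unicodedata

def to_ascii_domain(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form.

    Assumption: uses Python's built-in "idna" codec, which follows
    IDNA 2003. Full RFC 5891 compliance would need a dedicated library.
    """
    return domain.encode("idna").decode("ascii")

def scripts_used(text: str) -> set:
    """Approximate the scripts in `text` from Unicode character names.

    Heuristic only: takes the first word of each letter's name
    (e.g. "LATIN", "CYRILLIC", "DEVANAGARI").
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def looks_mixed_script(label: str) -> bool:
    """Flag labels that mix scripts, a common sign of homograph spoofing."""
    return len(scripts_used(label)) > 1

# Each label of the IDN becomes an "xn--" prefixed ASCII label.
print(to_ascii_domain("भारत.भारत"))

# "paypal" with a Cyrillic "а" (U+0430) in place of the Latin "a".
print(looks_mixed_script("p\u0430ypal"))  # True
print(looks_mixed_script("paypal"))       # False
```

A mixed-script flag is not proof of an attack, since legitimate text can combine scripts, so a real implementation would likely treat it as one signal among several.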
Additional context
- Required for global internet standards compliance
- Critical for preventing Unicode-based security attacks (homographs)
- Necessary for Indian language support (22 scheduled languages)
- Enables proper handling of emoji and complex scripts
References:
- Universal Acceptance Steering Group
- Unicode Security Considerations
- RFC 5891 (IDNA 2008)
- RFC 6531 (Internationalized Email)
Implementation Impact
This would bring any-guardrail in line with the IDNA 2008 (RFC 5891) and EAI (RFC 6531) standards referenced above and make it usable with non-Latin scripts worldwide.