DNS Label Normalizer

Normalize a domain label and detect scripts and homoglyph risks.

About This Tool

The DNS Label Normalizer analyzes domain name labels for security risks, Unicode handling, and visual similarity attacks. It normalizes labels to lowercase, converts internationalized domain name (IDN) labels to their Punycode representation, detects the Unicode scripts in use (Latin, Cyrillic, Chinese, etc.), and flags homoglyph risks: visually similar characters from different scripts that can be used in phishing and typosquatting attacks. Whether you're registering domains, investigating suspicious URLs, or implementing domain security policies, the analyzer helps detect spoofing attempts such as using Cyrillic 'а' (Unicode 0430) in place of Latin 'a' (Unicode 0061).

How to Use

  1. Enter a domain label in the input field (e.g., "ExAmPle", "münchen", "раypal")
  2. Labels can be in any case (uppercase, lowercase, mixed)
  3. Supports ASCII labels and internationalized (Unicode) labels
  4. Punycode labels (xn--...) can also be entered for reverse analysis
  5. Click "Analyze" to process the label
  6. View normalized lowercase form of the label
  7. Check detected Unicode scripts (Latin, Cyrillic, Greek, Arabic, Chinese, etc.)
  8. Review homoglyph risk indicator (warns if visually similar characters detected)
  9. See IDN status and any security warnings
  10. Use warnings to identify potential phishing or typosquatting domains

Features

  • DNS label normalization (case conversion to lowercase)
  • Punycode encoding/decoding for IDN labels
  • Unicode script detection (Latin, Cyrillic, Greek, Chinese, Arabic, etc.)
  • Homoglyph risk analysis (visually similar character detection)
  • Mixed script warning (potential spoofing indicator)
  • IDN (Internationalized Domain Name) identification
  • Security warnings for suspicious patterns
  • Support for ASCII and Unicode labels
  • Bidirectional Punycode conversion
  • Clear visual indicators for risks

Common Use Cases

  • Detecting phishing domains using homoglyphs (e.g., "pаypal.com" with Cyrillic а)
  • Validating domain registrations for security risks
  • Investigating suspicious URLs in phishing reports
  • Implementing domain security policies and filters
  • Brand protection and typosquatting detection
  • Educational demonstrations of IDN homograph attacks
  • Analyzing internationalized domain names before registration
  • Security research on visual similarity attacks
  • Browser extension development for phishing detection
  • DNS security policy enforcement

Technical Details

DNS labels must be normalized and encoded properly to prevent security issues. Unicode introduces complexity with visually similar characters from different scripts.

DNS Label Normalization:

  • Case folding: DNS labels are case-insensitive; Example.COM = example.com
  • Lowercase conversion: Standard practice to store labels in lowercase
  • Length limits: 1-63 octets per label (RFC 1035); for IDN labels the limit applies to the Punycode form
  • Character restrictions: Letters, digits, hyphens (a-z, 0-9, -) for ASCII labels; a label may not start or end with a hyphen
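
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation; the function name normalize_ascii_label is hypothetical, and the sketch only covers ASCII labels.

```python
import re

# LDH rule: only a-z, 0-9 and "-", 1 to 63 characters, no hyphen at either end.
_LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def normalize_ascii_label(label: str) -> str:
    """Return the lowercase form of an ASCII label, or raise ValueError."""
    # Reject non-ASCII input first: some non-ASCII characters lowercase to
    # ASCII letters and would otherwise slip through the pattern below.
    if not label.isascii():
        raise ValueError(f"not an ASCII label: {label!r}")
    normalized = label.lower()
    if _LDH_LABEL.fullmatch(normalized) is None:
        raise ValueError(f"not a valid DNS label: {label!r}")
    return normalized


print(normalize_ascii_label("ExAmPle"))  # example
```

Internationalized labels would first be converted to their xn-- form and then checked against the same rules.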

Internationalized Domain Names (IDN):

  • RFC 3490 (IDNA2003): Original IDN specification
  • RFC 5891 (IDNA2008): Updated IDN specification
  • Punycode encoding: ASCII-compatible encoding of Unicode labels (RFC 3492)
  • xn-- prefix: Identifies Punycode-encoded labels
  • Example: "münchen" to "xn--mnchen-3ya" (the ASCII letters come first; the suffix "3ya" encodes the position and code point of ü)
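
As a rough illustration of the conversion, Python's built-in "idna" codec can produce and reverse the xn-- form. Note the assumptions: that codec implements the older IDNA2003 rules (RFC 3490), so results can differ from IDNA2008 for some characters, and the helper names to_a_label and to_u_label are made up for this sketch.

```python
def to_a_label(label: str) -> str:
    """Convert a Unicode label to its ASCII (xn--) form with the IDNA2003 codec."""
    return label.lower().encode("idna").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an xn-- label back to its Unicode form."""
    return a_label.lower().encode("ascii").decode("idna")


print(to_a_label("münchen"))                       # xn--mnchen-3ya
print(to_u_label("xn--mnchen-3ya") == "münchen")   # True
print(to_a_label("ExAmPle"))                       # example
```

Code that needs IDNA2008 behavior would typically use a dedicated library rather than the built-in codec.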

Punycode Encoding Process:

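  1. Copy the ASCII characters of the label, in order, followed by a "-" delimiter if there are any
  2. Encode the code point and position of each non-ASCII character as a variable-length sequence of letters and digits
  3. Add the "xn--" prefix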
  4. Example: "日本" (Japan) to "xn--wgv71a"
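
The steps above can be made visible with Python's built-in "punycode" codec, which performs the raw RFC 3492 encoding without the prefix. This is only a sketch: it skips the normalization and validity checks that full IDNA processing applies, and punycode_steps is a hypothetical helper name.

```python
def punycode_steps(label: str) -> dict:
    """Show the pieces of the Punycode encoding for one label."""
    # Step 1: the ASCII ("basic") characters, in their original order.
    basic = "".join(ch for ch in label if ord(ch) < 128)
    if basic == label:
        # Pure-ASCII labels are left unchanged and get no prefix.
        return {"basic": basic, "encoded": label, "a_label": label}
    # Steps 1-2: the codec emits the basic characters, a "-" delimiter if
    # there were any, then the variable-length digits for the other characters.
    encoded = label.encode("punycode").decode("ascii")
    # Step 3: add the prefix that marks the label as Punycode.
    return {"basic": basic, "encoded": encoded, "a_label": "xn--" + encoded}


print(punycode_steps("日本")["a_label"])     # xn--wgv71a
print(punycode_steps("münchen")["encoded"])  # mnchen-3ya
```

"日本" has no ASCII characters, so its encoded form has no "-" delimiter, while "münchen" keeps "mnchen" in front of the delimiter.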

Unicode Script Detection:

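  • Script property: Every Unicode character has a Script property; ASCII digits and the hyphen belong to the shared "Common" script
  • Common scripts:
    • Latin: a-z, A-Z (Unicode 0041-005A and 0061-007A)
    • Cyrillic: а-я (Unicode 0400-04FF) - note: several letters look like Latin letters but are different characters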
    • Greek: α-ω (Unicode 0370-03FF)
    • Chinese: 中文 (Unicode 4E00-9FFF CJK Unified Ideographs)
    • Arabic: ا-ي (Unicode 0600-06FF)
  • Mixed scripts: Combining characters from multiple scripts (security risk)
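
Python's standard library does not expose the Unicode Script property directly, so the sketch below approximates it from the first word of each character's Unicode name. That shortcut is an assumption made for illustration (it is not how this tool is known to work), and the names detect_scripts and is_mixed_script are hypothetical.

```python
import unicodedata

# Assumed mapping from the first word of a character name to a script label.
_NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "CJK": "Han",
    "ARABIC": "Arabic",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
}


def detect_scripts(label: str) -> set:
    """Return the set of script labels found in a Unicode label."""
    scripts = set()
    for ch in label:
        if ch in "0123456789-":
            continue  # shared by all scripts, so not counted
        if unicodedata.category(ch).startswith("M"):
            continue  # combining marks take the script of their base letter
        first_word = unicodedata.name(ch, "").split(" ")[0]
        scripts.add(_NAME_PREFIX_TO_SCRIPT.get(first_word, "Other"))
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when characters from more than one script appear in the label."""
    return len(detect_scripts(label)) > 1


print(sorted(detect_scripts("p\u0430ypal")))  # ['Cyrillic', 'Latin']
print(is_mixed_script("paypal"))              # False
```

A production check would need exceptions for combinations that are normal in some languages, such as Han mixed with Hiragana and Katakana in Japanese labels.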

Homoglyph Attacks (IDN Homograph Attacks):

Homoglyphs are characters from different scripts that look identical or very similar but have different Unicode code points.

  • Classic example:
    • Latin 'a' (Unicode 0061) vs. Cyrillic 'а' (Unicode 0430) - visually identical
    • "paypal.com" (Latin) vs. "pаypal.com" (Cyrillic а) - looks identical in browsers
  • Common homoglyph pairs:
    • Latin 'o' (Unicode 006F) vs Cyrillic 'о' (Unicode 043E)
    • Latin 'e' (Unicode 0065) vs Cyrillic 'е' (Unicode 0435)
    • Latin 'p' (Unicode 0070) vs Cyrillic 'р' (Unicode 0440)
    • Latin 'c' (Unicode 0063) vs Cyrillic 'с' (Unicode 0441)
    • Latin 'x' (Unicode 0078) vs Cyrillic 'х' (Unicode 0445)
  • Attack scenario:
    1. Attacker registers "аpple.com" using Cyrillic 'а'
    2. Victim sees "аpple.com" and thinks it's "apple.com"
    3. Victim enters credentials on phishing site
    4. Actual domain is "xn--pple-43d.com" (Punycode)
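
One way to catch this kind of substitution is to map known lookalikes back to their Latin counterparts and compare the result with a protected name. The sketch below uses only the pairs listed in this section; the table and function names are hypothetical, and a fuller check would draw on the Unicode confusables data rather than a hand-written table.

```python
# Cyrillic lookalikes from the list above, mapped to their Latin counterparts.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # а
    "\u043e": "o",  # о
    "\u0435": "e",  # е
    "\u0440": "p",  # р
    "\u0441": "c",  # с
    "\u0445": "x",  # х
}


def skeleton(label: str) -> str:
    """Replace known lookalike characters with their Latin counterparts."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label.lower())


def find_homoglyphs(label: str) -> list:
    """List (index, character, Latin lookalike) for each suspicious character."""
    return [
        (i, ch, CYRILLIC_TO_LATIN[ch])
        for i, ch in enumerate(label.lower())
        if ch in CYRILLIC_TO_LATIN
    ]


def imitates(label: str, brand: str) -> bool:
    """True when the label differs from the brand but looks the same."""
    return label.lower() != brand.lower() and skeleton(label) == skeleton(brand)


print(imitates("\u0430pple", "apple"))  # True
print(imitates("apple", "apple"))       # False
```

Because both sides are reduced to the same skeleton, the comparison also catches labels written entirely in lookalike characters, which a mixed-script check alone would miss.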

Mixed Script Detection:

  • Combining characters from multiple scripts in one label is suspicious
  • Example: "microsоft.com" (Latin + Cyrillic 'о')
  • Most legitimate domains use single script
  • Browsers typically fall back to displaying Punycode (xn--...) for mixed-script IDN labels

Browser Protections:

  • Chrome/Firefox: Display Punycode (xn--...) instead of Unicode for suspicious domains
  • Mixed script blocking: Show Punycode if multiple scripts mixed
  • Top-level domain (TLD) restrictions: Some TLDs only allow specific scripts
  • Confusability checks: Registries may block homoglyph domains

Example Homoglyph Attack:

  • Target: paypal.com (all Latin)
  • Attack domain: pаypal.com (Cyrillic 'а' at position 2)
  • Punycode: xn--pypal-4ve.com
  • Visual appearance: Identical in many fonts
  • Detection: This tool identifies Cyrillic script, warns of homoglyph risk
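
The example can be reproduced by decoding the Punycode label and listing every non-ASCII character it contains. This is a sketch using Python's built-in "punycode" codec and unicodedata module; inspect_a_label is a hypothetical name, and positions are reported counting from 1 to match the description above.

```python
import unicodedata


def inspect_a_label(a_label: str):
    """Decode an xn-- label and describe each non-ASCII character in it."""
    if not a_label.lower().startswith("xn--"):
        raise ValueError("expected a label starting with xn--")
    decoded = a_label[4:].encode("ascii").decode("punycode")
    findings = [
        (position, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for position, ch in enumerate(decoded, start=1)
        if ord(ch) > 127
    ]
    return decoded, findings


decoded, findings = inspect_a_label("xn--pypal-4ve")
print(findings)  # [(2, 'U+0430', 'CYRILLIC SMALL LETTER A')]
```

The character name makes the substitution obvious even when the decoded label is visually indistinguishable from the target.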

Real-World Attack (2017):

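In 2017, a security researcher demonstrated a homograph attack by registering "xn--80ak6aa92e.com", which displayed as "apple.com" in affected browsers. Every character in the label was Cyrillic, so checks that only look for mixed scripts did not flag it. The registration was a proof of concept meant to expose a weakness in how browsers handled IDNs.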

Normalization Forms (Unicode):

  • NFC (Canonical Composition): Preferred form for IDN
  • NFD (Canonical Decomposition): Decomposes accented characters
  • Example: "é" can be represented as:
    • NFC: Unicode 00E9 (single character)
    • NFD: Unicode 0065 + 0301 (e + combining acute accent)
  • IDNA2008 requires labels to be in NFC before Punycode encoding (IDNA2003 applied NFKC through Nameprep)
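
The two representations of "é" can be compared with Python's unicodedata.normalize. This is a small sketch; the helper name to_nfc is hypothetical.

```python
import unicodedata


def to_nfc(label: str) -> str:
    """Return the label in NFC (canonical composition) form."""
    return unicodedata.normalize("NFC", label)


composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

print(composed == decomposed)          # False: different code point sequences
print(to_nfc(decomposed) == composed)  # True: NFC composes the pair
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Without this step, two labels that render identically could encode to different Punycode strings.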

Security Best Practices:

  • Registry policies: Implement confusability checks before registration
  • Browser warnings: Display Punycode for suspicious mixed-script domains
  • User education: Train users to check address bar for Punycode (xn--...)
  • Certificate validation: Check certificate CN/SAN against expected domain
  • Brand protection: Proactively register homoglyph variants of your domain

TLD-Specific Restrictions:

  • .com/.net/.org: Allow most scripts but monitor for abuse
  • .de (Germany): Restricts to Latin-script characters, including umlauts (ä, ö, ü) and ß
  • .jp (Japan): Allows Japanese scripts (Hiragana, Katakana, Kanji)
  • .ru (Russia): ASCII labels only; Cyrillic labels are registered under the separate .рф TLD
  • Many TLDs use script-based restrictions to prevent homograph attacks

Detection Strategies:

  • Script analysis: Detect mixed scripts (Latin + Cyrillic)
  • Confusability checking: Compare visual similarity to known brands
  • Punycode inspection: Decode and analyze Unicode characters
  • Allowlists: Permit only expected domains in enterprise environments
  • Reputation systems: Flag newly registered homoglyph domains

Common Typosquatting Patterns:

  • Homoglyphs: Visual substitution (Cyrillic 'а' for Latin 'a')
  • Typos: Omitted, doubled, or adjacent-key characters (gogle.com instead of google.com)
  • Bit-flipping: Single bit change in an ASCII character (google.com to gnogle.com, since 'o' 0x6F and 'n' 0x6E differ by one bit)
  • Hyphenation: Adding/removing hyphens (pay-pal.com)
  • TLD swapping: Different TLD (.co instead of .com)
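
To illustrate the bit-flipping pattern, the sketch below enumerates every label that differs from the input by exactly one flipped bit and is still made of valid label characters. It assumes a lowercase ASCII label; the function name bitflip_variants is hypothetical.

```python
_ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def bitflip_variants(label: str) -> set:
    """Return labels that differ from the input by a single flipped bit."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in _ALLOWED:
                continue  # uppercase, control, or otherwise invalid result
            candidate = label[:i] + flipped + label[i + 1:]
            if candidate.startswith("-") or candidate.endswith("-"):
                continue  # labels cannot start or end with a hyphen
            variants.add(candidate)
    return variants


variants = bitflip_variants("google")
print("gnogle" in variants)   # True: 'o' (0x6F) -> 'n' (0x6E)
print("gnoogle" in variants)  # False: an insertion, not a bit flip
```

Flips that only change letter case are dropped automatically, because uppercase letters are not in the allowed set and DNS treats case as equivalent anyway.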

Tool Output Interpretation:

  • Normalized: Lowercase version of the label
  • Scripts: Unicode scripts detected in label
  • "Homoglyphs: Possible": Label contains characters with visual similarity risks
  • "IDN: Yes": Label contains non-ASCII Unicode characters
  • Warnings: Security issues detected (mixed scripts, confusable characters)

When to Be Suspicious:

  • Mixed scripts in well-known brand names
  • Punycode (xn--...) in unexpected contexts
  • Domains that look identical to known brands but decode differently
  • Newly registered domains with homoglyphs of popular sites

Legitimate IDN Use Cases:

  • Local language domains: москва.рф (Moscow), 中国.cn (China)
  • Internationalized brand names: münchen.de (Munich)
  • Local businesses serving non-English audiences
  • Government sites in local languages