Punycode Converter & Homograph Inspector
Decode xn-- domains, convert Unicode hostnames safely, and inspect suspicious domains for mixed scripts, confusable characters, and homograph phishing risk — locally in your browser.
Analysis is local. No DNS lookup and no network request is made to the inspected hostname.
Conversion and homograph analysis run entirely in your browser. No hostnames, URLs, or log entries are uploaded to a server. The tool does not perform DNS lookups or contact the inspected host in v1, which makes it safe for triaging suspicious domains without touching them. Findings are forensic signals, not verdicts on malicious intent — confirm attribution through a trusted channel.
Overview
Punycode exists because DNS was defined before Unicode. Hostname labels are restricted to a narrow ASCII repertoire (letters, digits, and the hyphen), so internationalised domain names (IDNs) are transported as ASCII-compatible encodings beginning with the prefix xn--. Browsers then decode the label back into Unicode when their display policy considers it safe to show. Punycode is a neutral transport mechanism; on its own it is neither a phishing signal nor a safety guarantee.
The same mechanism that enables legitimate IDNs (for example münchen.de transported as xn--mnchen-3ya.de) also enables homograph attacks, where characters from a non-Latin script are chosen to visually imitate a well-known ASCII brand. Reducing this to "xn-- is suspicious" or "all Unicode is unsafe" gets you false positives on legitimate sites and false negatives on ASCII-only tricks like paypa1.com.
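The encoding is mechanical, and a short example makes that concrete. Below is a minimal JavaScript sketch of the RFC 3492 encoder for a single label, applied to the münchen.de example. It illustrates the technique and is not this tool's code. `punycodeEncode` and `toAsciiHostname` are hypothetical names, and RFC 3492's overflow checks are omitted. The hostname wrapper only NFC-normalises each label; it skips the full UTS #46 mapping and validation that real IDNA processing performs.

```javascript
// RFC 3492 parameters for Punycode (Section 5).
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 0x80;

// Bias adaptation function (RFC 3492, Section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Digit values 0..25 map to "a".."z", 26..35 map to "0".."9".
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? 97 + d : 22 + d);
}

// Encode one label (without the "xn--" prefix). Overflow checks omitted for brevity.
function punycodeEncode(label) {
  const input = Array.from(label, (ch) => ch.codePointAt(0));
  const basic = input.filter((cp) => cp < 0x80);
  let output = String.fromCharCode(...basic);
  const b = basic.length;
  if (b > 0) output += "-";
  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, h = b;
  while (h < input.length) {
    // Next code point to insert: the smallest one not yet handled.
    const m = Math.min(...input.filter((cp) => cp >= n));
    delta += (m - n) * (h + 1);
    n = m;
    for (const cp of input) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, h + 1, h === b);
        delta = 0;
        h++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Naive hostname wrapper: encode every label that contains non-ASCII characters.
// Real IDNA (UTS #46) also maps case and width and validates each label.
function toAsciiHostname(hostname) {
  return hostname
    .split(".")
    .map((label) => (/[^\x00-\x7f]/.test(label) ? "xn--" + punycodeEncode(label.normalize("NFC")) : label))
    .join(".");
}
```

With this sketch, `toAsciiHostname("münchen.de")` yields `xn--mnchen-3ya.de`. A label made of Cyrillic а followed by Latin `pple` encodes to `pple-43d`, the label behind the xn--pple-43d.com spoof. For production use, prefer the platform's IDNA implementation, such as the WHATWG `URL` parser, over a hand-rolled encoder.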
This tool combines bi-directional IDN conversion with multi-signal homograph analysis: per-label script detection, mixed-script detection, a curated confusable-glyph map, a confusable-skeleton check for whole-script spoofs, and ASCII-only lookalike patterns such as digit-letter swaps and character-pair tricks. The result is a tiered risk view intended to help phishing triage, domain review, and legitimate IDN work in a single forensic assistant.
Use cases
When to use it
- Phishing triage on a suspicious URL: extract the hostname from a link in an email or support ticket and inspect it for mixed-script, whole-script, or ASCII lookalike homograph risk without clicking it.
- Decoding xn-- domains from logs: paste raw DNS, proxy, or SIEM hostnames that appear in their Punycode form and read them back in their original Unicode spelling.
- Legitimate IDN development work: convert a brand's IDN to its Punycode transport form for DNS, mail, and TLS certificate configuration, and confirm labels encode correctly per RFC 3492.
- Bulk review of suspicious domain lists: paste dozens or hundreds of hostnames from a log export, header dump, or ticket to get a risk-ranked table with CSV export.
- Security awareness training and documentation: use the per-character confusable table with code points and scripts to illustrate real homograph techniques clearly.
- Cross-checking findings against an email header analyzer: once you have identified a suspect sender domain in headers, inspect the hostname here to surface the exact label that carries the homograph risk.
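For the bulk-review use case, a risk-ranked CSV export could look like the sketch below. The row shape (`hostname`, `unicode`, `tier`, `score`), the column names, and the tier ordering are assumptions made for illustration. They are not this tool's actual export format. Fields are quoted only when they contain a comma, a double quote, or a line break, following RFC 4180 conventions.

```javascript
// Hypothetical tier ordering: most severe first.
const TIER_RANK = { "High risk": 0, "Needs review": 1, "Low concern": 2 };

// Quote a field only when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// rows: [{ hostname, unicode, tier, score }] (hypothetical shape).
// Sort by tier first, then by descending score within a tier.
function toRiskRankedCsv(rows) {
  const sorted = [...rows].sort(
    (a, b) => TIER_RANK[a.tier] - TIER_RANK[b.tier] || b.score - a.score
  );
  const lines = ["hostname,unicode,tier,score"];
  for (const r of sorted) {
    lines.push([r.hostname, r.unicode, r.tier, r.score].map(csvField).join(","));
  }
  return lines.join("\r\n");
}
```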
When it's not enough
- Treating xn-- as a phishing verdict: many legitimate country-TLD and brand domains use Punycode. Presence of xn-- alone is not a risk signal.
- Assuming single-script non-Latin is safe: a label written entirely in Cyrillic can still be a visual impersonation of an ASCII brand (whole-script confusable). This is why the tool also computes a confusable skeleton.
- Reducing homograph detection to mixed-script only: mixed-script is one strong signal. Whole-script confusables and ASCII-only lookalikes require separate checks and are scored separately here.
- Treating the risk tier as a verdict: the risk summary surfaces evidence and weights signals. Attribution requires threat intel, WHOIS history, passive DNS, or direct verification through the legitimate brand.
How to use it
1. Paste a URL or hostname: a full URL, a scheme-less URL, or a bare hostname. The tool strips the path, query, and fragment, and normalises non-standard dot separators (such as the ideographic full stop, U+3002) before analysis.
2. Let the tool extract and normalise the hostname: input handling uses the browser's URL parser when possible and falls back to manual extraction. The lowercased hostname is the unit of comparison.
3. Review Unicode and Punycode forms side by side: the conversion panel shows the human-readable Unicode form and the ASCII / Punycode transport form. Use the swap control to toggle the input between the two.
4. Inspect the script and confusable warnings: open the Labels tab for a per-label breakdown and the Confusables tab for a character-level table with Unicode code points and script names. Flagged glyphs are highlighted in the label display.
5. Read the tiered risk summary: risk is tiered as Low concern, Needs review, or High risk, and is derived from a combination of mixed-script, whole-script, ASCII-lookalike, and brand-skeleton signals. The summary sentence explains which signals fired.
6. Bulk-scan or copy the analyst note: copy the condensed analyst note for a ticket or SOC entry, or switch to bulk mode to scan many hostnames at once and export a risk-ranked CSV.
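Steps 1 and 2 can be approximated in a few lines. This is a sketch of one plausible approach, not the tool's actual code, and `extractHostname` is a hypothetical name. It first maps the three dot-like separators that UTS #46 treats as full stops (U+3002, U+FF0E, U+FF61). It then lets the WHATWG `URL` parser isolate the hostname, which comes back lowercased in ASCII / Punycode form. The manual fallback is deliberately crude.

```javascript
// Dot-like separators that UTS #46 maps to "." (ideographic, fullwidth, halfwidth).
const DOT_LIKE = /[\u3002\uFF0E\uFF61]/g;
const SCHEME = /^[a-z][a-z0-9+.-]*:\/\//i;

function extractHostname(input) {
  const trimmed = input.trim().replace(DOT_LIKE, ".");
  // Give scheme-less input a scheme so the URL parser accepts it.
  const candidate = SCHEME.test(trimmed) ? trimmed : `http://${trimmed}`;
  try {
    // The URL parser drops path, query, fragment, userinfo and port,
    // and returns the lowercased ASCII / Punycode hostname.
    return new URL(candidate).hostname;
  } catch {
    // Crude fallback: strip the scheme, cut at the first "/", "?" or "#",
    // then drop any userinfo and a trailing port.
    let host = trimmed.replace(SCHEME, "").split(/[/?#]/)[0];
    host = host.slice(host.lastIndexOf("@") + 1).replace(/:\d+$/, "");
    return host.toLowerCase();
  }
}
```

With this sketch, `extractHostname("https://münchen.de/tourism?x=1#top")` returns `xn--mnchen-3ya.de`. The Unicode form for display would then come from a Punycode decoder.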
Common errors and fixes
Assuming every xn-- domain is malicious
Punycode is the standard DNS transport for internationalised labels. Legitimate brands and country TLDs depend on it. Presence of xn-- alone is not a risk signal — review the decoded Unicode form, the script composition, and the confusable skeleton.
Assuming all Unicode domains are unsafe
Single-script non-Latin IDNs are common and usually legitimate. Browsers display many of them in Unicode form by default. Evaluate the composition of scripts and whether confusable glyphs appear, not whether non-ASCII is present.
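The script composition of a label can be read with JavaScript's Unicode property escapes (`\p{Script=...}` with the `u` flag, standard since ES2018). The sketch below is an assumption about how such a check might look. `scriptsOf`, `isMixedScript`, and the short list of named scripts are hypothetical. Characters in the Common and Inherited scripts, such as digits, the hyphen, and combining marks, are ignored. The sketch is also naive in one known way: UTS #39 restriction levels allow some combinations, such as Han with Hiragana and Katakana in Japanese labels, which this check would flag as mixed.

```javascript
// Scripts this sketch names explicitly; anything else is reported as "Other".
const SCRIPTS = ["Latin", "Cyrillic", "Greek", "Arabic", "Han", "Hiragana", "Katakana", "Hangul"];
const SCRIPT_RES = SCRIPTS.map((name) => [name, new RegExp(`\\p{Script=${name}}`, "u")]);
// Digits, hyphen and combining marks are Common/Inherited and never count as mixing.
const NEUTRAL = /[\p{Script=Common}\p{Script=Inherited}]/u;

// Return the set of script names used by a Unicode label.
function scriptsOf(label) {
  const found = new Set();
  for (const ch of label) {
    if (NEUTRAL.test(ch)) continue;
    const hit = SCRIPT_RES.find(([, re]) => re.test(ch));
    found.add(hit ? hit[0] : "Other");
  }
  return found;
}

function isMixedScript(label) {
  return scriptsOf(label).size > 1;
}
```

With this sketch, a single-script Arabic label reports only `Arabic`, and münchen reports only `Latin`. A Cyrillic а followed by Latin `pple` reports both scripts.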
Relying only on mixed-script detection
A label written entirely in Cyrillic can still impersonate an ASCII brand — that is a whole-script confusable, not a mixed-script one. This tool folds glyphs to a visual skeleton so whole-script and mixed-script cases both get scored.
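The folding step can be sketched as a character-by-character map to Latin lookalikes, followed by a comparison against a list of brand names. The map below is a tiny, hypothetical subset chosen for illustration. A real implementation would load the full confusables data from Unicode Technical Standard #39, which defines the "skeleton" transform and applies NFD around the mapping. `skeleton` and `imitatedBrand` are illustrative names, not this tool's API.

```javascript
// Tiny, hypothetical subset of Cyrillic/Greek -> Latin confusables.
// A real implementation would load the full UTS #39 confusables data.
const CONFUSABLE_TO_LATIN = new Map([
  ["\u0430", "a"], // CYRILLIC SMALL LETTER A
  ["\u0435", "e"], // CYRILLIC SMALL LETTER IE
  ["\u043E", "o"], // CYRILLIC SMALL LETTER O
  ["\u0440", "p"], // CYRILLIC SMALL LETTER ER
  ["\u0441", "c"], // CYRILLIC SMALL LETTER ES
  ["\u0443", "y"], // CYRILLIC SMALL LETTER U
  ["\u0445", "x"], // CYRILLIC SMALL LETTER HA
  ["\u0456", "i"], // CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
  ["\u04CF", "l"], // CYRILLIC SMALL LETTER PALOCHKA
  ["\u03BF", "o"], // GREEK SMALL LETTER OMICRON
]);

// Fold each character to its Latin lookalike, leaving unmapped characters as-is.
function skeleton(label) {
  return Array.from(label.toLowerCase().normalize("NFD"), (ch) => CONFUSABLE_TO_LATIN.get(ch) ?? ch).join("");
}

// Return the brand a label visually imitates, or null if it is the brand itself or matches nothing.
function imitatedBrand(label, brands) {
  const folded = skeleton(label);
  return brands.find((brand) => brand !== label && skeleton(brand) === folded) ?? null;
}
```

The same fold covers both cases. An all-Cyrillic label spelled р а у р а ӏ and a mixed label with a Cyrillic а in front of `pple` both collapse onto their Latin brand names.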
Missing ASCII-only lookalikes like paypa1.com
Not every phishing domain uses Unicode. Digit/letter substitutions (1↔l, 0↔o, |↔l) and character-pair tricks (rn↔m, vv↔w, cl↔d) are common ASCII-only patterns. The risk model includes these separately from Unicode checks.
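ASCII-only lookalikes can be caught with a second, purely ASCII fold that is applied to both the candidate and each brand before comparing them. The fold tables below are hypothetical and use only the substitutions named above. Single characters are folded before pairs so that "c1" becomes "cl" and then "d", consistent with how the brand is folded. `asciiSkeleton` and `asciiLookalike` are illustrative names. The fold also produces false positives; for example, "modern" folds to "modem". A match should therefore be treated as a signal, not a verdict.

```javascript
// Hypothetical ASCII fold tables, applied to both the candidate and the brand.
const CHAR_FOLDS = { "1": "l", "0": "o", "|": "l" };
const PAIR_FOLDS = [["rn", "m"], ["vv", "w"], ["cl", "d"]];

function asciiSkeleton(label) {
  // Single characters first, so "c1" becomes "cl" before the pair pass.
  let s = Array.from(label.toLowerCase(), (ch) => CHAR_FOLDS[ch] ?? ch).join("");
  for (const [from, to] of PAIR_FOLDS) s = s.split(from).join(to);
  return s;
}

// Return the brand an ASCII label imitates, or null if it is the brand itself or matches nothing.
function asciiLookalike(label, brands) {
  const folded = asciiSkeleton(label);
  const lower = label.toLowerCase();
  return brands.find((brand) => brand !== lower && asciiSkeleton(brand) === folded) ?? null;
}
```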
Inspecting the whole URL instead of isolating the hostname
Path and query components can contain arbitrary text that is not part of the DNS name. The tool strips those automatically and operates on the hostname — the thing that actually identifies the site to DNS and TLS.
Expecting the tool to prove ownership or reputation
This is a local forensic assistant. It does not perform DNS lookups, WHOIS checks, TLS certificate fetches, or reputation queries. Use domain reputation services and manual verification for attribution.
Frequently asked questions
What is an xn-- domain and is it safe?
xn-- is the ACE (ASCII-Compatible Encoding) prefix that IDNA places in front of Punycode-encoded labels. Any DNS label that contains non-ASCII characters is carried as ASCII with the xn-- prefix, because hostname labels are limited to ASCII letters, digits, and hyphens. Your browser and mail client decode these labels back to Unicode for display when their spoofing heuristics consider it safe.
Most xn-- domains are legitimate internationalised domain names — for example accented European names, Arabic brands, Chinese trademarks, or country-TLD sites. Some are abused in homograph phishing. The presence of xn-- alone is not enough to judge intent. You need to look at which scripts the label uses, whether it mixes scripts, whether its visual skeleton matches a trusted brand, and whether it is paired with suspicious context (the email it came in, its TLD, the link text it hid behind).
What is the difference between Punycode, IDN, and a homograph attack?
IDN (Internationalised Domain Name): a domain name that contains one or more labels with non-ASCII characters. IDNs allow domain names in languages and scripts outside basic Latin — French accents, Cyrillic, Arabic, Chinese, Japanese, and many others.
Punycode: the ASCII-compatible encoding defined in RFC 3492 that represents IDN labels in DNS-safe ASCII. A label like münchen becomes xn--mnchen-3ya. Punycode is a mechanical, reversible encoding — it adds no semantic meaning.
Homograph attack: using visually confusable characters from different scripts (or ASCII-only digit/letter substitutions) to create a hostname that looks like a trusted brand but resolves to a different domain. Homograph attacks can use IDNs and Punycode, but they can also be pure ASCII (for example paypa1.com).
Why mixed-script detection is useful but not enough
Mixed-script detection flags labels that combine Latin with another script within a single label — for example a label where most characters are ASCII but one is a Cyrillic lookalike. This is a strong signal: browsers' own anti-spoofing heuristics treat mixed-script labels as suspicious and often force them back to Punycode display.
But mixed-script detection misses two important cases. First, whole-script confusables — a label written entirely in Cyrillic that nonetheless spells out a Latin brand when you read the glyphs visually. Second, ASCII-only lookalikes — domains that use no Unicode at all, relying on digit-letter substitutions or character-pair tricks.
A useful risk model scores all three: mixed-script, whole-script (via a confusable skeleton), and ASCII lookalike patterns. No single signal is sufficient on its own, and every signal can appear in legitimate contexts — so the result should be tiered, not binary.
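One simple way to get a tiered result is a weighted sum with thresholds. In the sketch below, the signal names, weights, and thresholds are assumptions chosen for illustration, not this tool's scoring model. Only the three tier labels come from the tool. Punycode presence is recorded but carries zero weight, matching the point that xn-- alone is not a risk signal. The function also returns which signals fired, so a summary sentence can explain the result.

```javascript
// Hypothetical weights: Punycode presence is recorded but carries no weight.
const WEIGHTS = { punycode: 0, mixedScript: 2, wholeScript: 2, asciiLookalike: 2, brandSkeleton: 3 };

// signals: object of booleans keyed by the names in WEIGHTS.
function riskTier(signals) {
  const fired = Object.keys(WEIGHTS).filter((name) => signals[name]);
  const score = fired.reduce((sum, name) => sum + WEIGHTS[name], 0);
  const tier = score >= 4 ? "High risk" : score >= 2 ? "Needs review" : "Low concern";
  return { tier, score, fired };
}
```

With these weights, no single signal alone reaches "High risk" except a brand-skeleton match paired with another signal or a combination of two structural signals. That ordering is a design choice, not a rule.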
Phishing X-ray table
| Visual domain | Hidden / normalised form | Technique | Risk note |
|---|---|---|---|
| münchen.de | xn--mnchen-3ya.de | Legitimate IDN (Punycode-based) | Normal internationalised domain. Punycode here is transport, not trickery. |
| аpple.com | xn--pple-43d.com | Mixed-script: Cyrillic 'а' + Latin 'pple' | Classic homograph brand spoof. Browsers typically force Punycode display here. |
| раураӏ.com | xn--80aa0cbo65f.com | Whole-script Cyrillic imitating "paypal" (Punycode-based) | Single-script, so mixed-script detection misses it. Skeleton folds to "paypal". |
| paypa1.com | paypa1.com (pure ASCII) | ASCII lookalike (digit-letter swap, not Punycode) | No Unicode involved. Punycode-only detectors miss this entirely. |
| rnicrosoft.com | rnicrosoft.com (pure ASCII) | ASCII character-pair trick (rn → m) | At small font sizes "rn" is nearly indistinguishable from "m". The skeleton folds to "microsoft". |
| اتصالات.ae | xn--mgbaakc7dvf.ae | Legitimate single-script Arabic IDN | Non-Latin is not a risk signal on its own — assess confusables and context. |
Punycode-based and ASCII-only techniques are kept strictly separate in the table: one does not imply the other.
How browsers and apps handle IDNs
Browsers apply their own anti-spoofing heuristics when they decide whether to render a URL in Unicode or force it back to its Punycode form. The rules vary between browsers and versions, but the general pattern is: show Unicode when the label uses scripts that are either associated with the user's locale or are confined to a single script, and force Punycode display when multiple scripts mix within one label.
This helps end users, but it does not help automation. Mail filters, log analyzers, SIEMs, and URL-parsing APIs usually expose hostnames in their ASCII / Punycode form. During investigation you want both: the human-readable Unicode form, to see what the victim would have seen, and the ASCII form that DNS, TLS, and HTTP actually used.
This tool deliberately presents both forms side by side for every input, and applies its own multi-signal risk model on top — because the display heuristics in any single browser are not a substitute for a deliberate forensic check.
Developer snippet: normalising a hostname in modern JavaScript
Modern URL parsers in browsers and Node.js expose hostnames in their ASCII / Punycode form by default. If you are comparing hostnames programmatically, always compare the ASCII form — doing so avoids the ambiguity of visual Unicode equality.
```javascript
// Browser + Node 20+
const u = new URL("https://münchen.de/tourism");
u.hostname;
// → "xn--mnchen-3ya.de" (ASCII / Punycode form)

// Compare hostnames safely:
function sameHost(a, b) {
  return new URL(a).hostname.toLowerCase()
    === new URL(b).hostname.toLowerCase();
}

sameHost(
  "https://münchen.de/x",
  "https://xn--mnchen-3ya.de/y"
); // → true
```

Note: the browser's URL parser converts Unicode hostnames to ASCII automatically, but it does not decode Punycode back to Unicode for you. When you need the Unicode form for display, use a Punycode decoder, which is what this tool does locally via an RFC 3492 implementation.
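A decoder for the reverse direction is only a little longer than the encoder. The sketch below implements the RFC 3492 decoding procedure for a single label, plus a per-label hostname wrapper. `punycodeDecode` and `toUnicodeHostname` are hypothetical names. Overflow checks, basic-code-point validation, and IDNA validity checks are omitted. In Node.js, the built-in `url.domainToUnicode()` from the `node:url` module is a platform alternative.

```javascript
// RFC 3492 parameters (Section 5).
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 0x80;

// Bias adaptation function (RFC 3492, Section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// "a".."z" and "A".."Z" -> 0..25, "0".."9" -> 26..35.
function charToDigit(ch) {
  const c = ch.charCodeAt(0);
  if (c >= 0x30 && c <= 0x39) return c - 22;
  if (c >= 0x41 && c <= 0x5a) return c - 0x41;
  if (c >= 0x61 && c <= 0x7a) return c - 0x61;
  throw new RangeError(`invalid Punycode digit: ${ch}`);
}

// Decode one label (without "xn--"). Overflow checks omitted for brevity.
function punycodeDecode(encoded) {
  const d = encoded.lastIndexOf("-");
  // Basic code points before the last delimiter are copied as-is.
  const output = d > 0 ? Array.from(encoded.slice(0, d), (ch) => ch.codePointAt(0)) : [];
  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  let pos = d > 0 ? d + 1 : 0;
  while (pos < encoded.length) {
    // Read one generalized variable-length integer into i.
    const oldi = i;
    let w = 1;
    for (let k = BASE; ; k += BASE) {
      if (pos >= encoded.length) throw new RangeError("truncated Punycode input");
      const digit = charToDigit(encoded[pos++]);
      i += digit * w;
      const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const len = output.length + 1;
    bias = adapt(i - oldi, len, oldi === 0);
    n += Math.floor(i / len);
    i %= len;
    output.splice(i, 0, n); // insert code point n at position i
    i++;
  }
  return String.fromCodePoint(...output);
}

// Decode every "xn--" label in a hostname, leaving other labels untouched.
function toUnicodeHostname(hostname) {
  return hostname
    .split(".")
    .map((label) => (label.toLowerCase().startsWith("xn--") ? punycodeDecode(label.slice(4)) : label))
    .join(".");
}
```

With this sketch, `toUnicodeHostname("xn--mnchen-3ya.de")` returns `münchen.de`, and `toUnicodeHostname("xn--pple-43d.com")` returns the Cyrillic-а spoof of apple.com.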
Local, private, and zero-upload in v1
Conversion, script detection, confusable analysis, and bulk scanning all run in your browser. No hostname is sent to a server. The tool does not perform DNS lookups, WHOIS queries, TLS certificate fetches, or HTTP requests against the hostnames you paste — which makes it safe for triaging suspicious domains without contacting them.