Punycode is an ASCII-compatible encoding defined in RFC 3492. It represents a Unicode string as a sequence of ASCII characters that DNS can carry. Punycode is a mechanical, reversible encoding — it is neither a security feature nor a risk signal by itself.

Is every Punycode domain suspicious?

No. Most Punycode domains are legitimate internationalised domain names. Treating xn-- as a phishing signal on its own causes false positives on legitimate brand, country-TLD, and multilingual sites. Risk should be assessed from the combination of scripts used, confusable glyphs, whole-script spoofs, ASCII lookalike patterns, and context.

What is a homograph attack?

A homograph attack uses visually confusable characters to mimic a trusted domain. The characters can come from a different script (for example Cyrillic 'а' replacing Latin 'a') or from ASCII itself (for example '1' replacing 'l'). The goal is that the victim reads the URL as the trusted brand and clicks it.

What is the difference between a mixed-script spoof and an ASCII lookalike?

A mixed-script spoof mixes Latin with another script inside a single label (for example аpple.com where 'а' is Cyrillic). A whole-script spoof uses a single non-Latin script that happens to spell a Latin brand visually. An ASCII lookalike uses only ASCII characters but relies on digit-letter substitutions (paypa1.com) or character-pair tricks (rnicrosoft.com). All three are separate techniques — a mixed-script-only detector misses the last two.
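
The first of these checks, mixed-script detection, can be sketched with Unicode script property escapes. This is a minimal illustration, not the tool's actual implementation: the `SCRIPTS` table, `scriptsInLabel`, and `classifyLabel` are hypothetical names, and only five scripts are probed.

```javascript
// Scripts to probe for (an assumption; a real detector covers far more of Unicode).
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
  Arabic: /\p{Script=Arabic}/u,
  Han: /\p{Script=Han}/u,
};

// Return the set of script names found in one decoded (Unicode) label.
// Digits and hyphens belong to the Common script and are ignored here.
function scriptsInLabel(label) {
  const found = new Set();
  for (const ch of label) { // for...of iterates by code point
    for (const [name, re] of Object.entries(SCRIPTS)) {
      if (re.test(ch)) found.add(name);
    }
  }
  return found;
}

// Coarse classification of a single label.
function classifyLabel(label) {
  const scripts = scriptsInLabel(label);
  if (scripts.size > 1) return "mixed-script";
  if (scripts.size === 1 && !scripts.has("Latin")) return "single-script non-Latin";
  return "latin-or-other";
}

classifyLabel("\u0430pple"); // Cyrillic 'а' + Latin 'pple' → "mixed-script"
classifyLabel("paypa1");     // → "latin-or-other"
```

Note that the second call shows exactly the gap described above: an ASCII lookalike passes a mixed-script check untouched. A production check would also need to allow legitimate script combinations, such as the scripts that Japanese text mixes routinely.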

Why do browsers sometimes show Unicode and sometimes show Punycode?

Browsers apply anti-spoofing heuristics when choosing how to display a hostname. The rules differ between browsers and versions, but in general they show Unicode when the label is a single script (often tied to the user's locale) and force Punycode display when multiple scripts mix within one label. This reduces end-user exposure but is not a substitute for a deliberate forensic inspection.

Can a domain be risky even if it does not start with xn--?

Yes. ASCII-only lookalikes such as paypa1.com, rnicrosoft.com, or g00gle.com use no Unicode at all. They rely on digit-letter substitutions, character-pair tricks, or hyphen-inserted brand names. This tool scores those patterns separately from Unicode-based homograph risk.

Does this tool contact the suspicious domain?

No. In v1 the tool performs no DNS lookup, WHOIS query, TLS certificate fetch, or HTTP request against the inspected hostname. Analysis is purely local: Punycode conversion, script detection, and confusable checks against a curated map. This makes it safe for triaging suspicious domains without contacting them.

Can legitimate international websites use Punycode safely?

Yes. Punycode is the standard way to transport internationalised domain names in DNS. Many legitimate brands, government sites, and country-TLD domains use Punycode for their canonical names. The correct question is not 'does this use Punycode?' but 'what scripts does the label actually contain, and does the visual form impersonate another trusted name?'.

Why is single-script Unicode not automatically safe?

Whole-script confusables can spell a Latin brand using only Cyrillic, Greek, or another single non-Latin script. Because there is no script mixing inside the label, mixed-script detectors skip them. The tool therefore folds each label to a visual 'confusable skeleton' and checks it against a curated list of well-known brands, flagging matches as a high-risk signal.

Punycode Converter & Homograph Inspector

What is an xn-- domain?

xn-- is the ACE (ASCII-Compatible Encoding) prefix that marks a Punycode-encoded domain label. Any internationalised domain label (for example 'münchen') is encoded as an ASCII string carrying that prefix ('xn--mnchen-3ya') so it can travel through DNS. Browsers and mail clients decode it back to Unicode for display when their spoofing heuristics consider it safe.

Decode xn-- domains, convert Unicode hostnames safely, and inspect suspicious domains for mixed scripts, confusable characters, and homograph phishing risk — locally in your browser.

Conversion and homograph analysis run entirely in your browser. No hostnames, URLs, or log entries are uploaded to a server. The tool does not perform DNS lookups or contact the inspected host in v1, which makes it safe for triaging suspicious domains without touching them. Findings are forensic signals, not verdicts on malicious intent — confirm attribution through a trusted channel.

Overview

Punycode exists because DNS was defined before Unicode. Hostnames in DNS are restricted to a narrow ASCII repertoire (letters, digits, and hyphens), so internationalised domain names (IDNs) are transported as ASCII-compatible encodings beginning with the prefix xn--. Your browser then decodes the label back into Unicode when it considers the result safe to display. Punycode is a neutral transport mechanism; it is neither a phishing signal nor a safety guarantee on its own.

The same mechanism that enables legitimate IDNs (for example münchen.de transported as xn--mnchen-3ya.de) also enables homograph attacks, where characters from a non-Latin script are chosen to visually imitate a well-known ASCII brand. Reducing this to "xn-- is suspicious" or "all Unicode is unsafe" gets you false positives on legitimate sites and false negatives on ASCII-only tricks like paypa1.com.

This tool combines bi-directional IDN conversion with multi-signal homograph analysis: per-label script detection, mixed-script detection, a curated confusable-glyph map, a confusable-skeleton check for whole-script spoofs, and ASCII-only lookalike patterns such as digit-letter swaps and character-pair tricks. The result is a tiered risk view intended to help phishing triage, domain review, and legitimate IDN work in a single forensic assistant.

Use cases

When to use it

Phishing triage on a suspicious URL: extract the hostname from a link in an email or support ticket and inspect it for mixed-script, whole-script, or ASCII lookalike homograph risk without clicking it.
Decoding xn-- domains from logs: paste raw DNS, proxy, or SIEM hostnames that appear in their Punycode form and read them back in their original Unicode spelling.
Legitimate IDN development work: convert a brand's IDN to its Punycode transport form for DNS, mail, and TLS certificate configuration, and confirm labels encode correctly per RFC 3492.
Bulk review of suspicious domain lists: paste dozens or hundreds of hostnames from a log export, header dump, or ticket to get a risk-ranked table with CSV export.
Security awareness training and documentation: use the per-character confusable table with code points and scripts to illustrate real homograph techniques clearly.
Cross-checking findings against an email header analyzer: once you have identified a suspect sender domain in headers, inspect the hostname here to surface the exact label that carries the homograph risk.

When it's not enough

Treating xn-- as a phishing verdict: many legitimate country-TLD and brand domains use Punycode. Presence of xn-- alone is not a risk signal.
Assuming single-script non-Latin is safe: a label written entirely in Cyrillic can still be a visual impersonation of an ASCII brand (whole-script confusable). This is why the tool also computes a confusable skeleton.
Reducing homograph detection to mixed-script only: mixed-script is one strong signal. Whole-script confusables and ASCII-only lookalikes require separate checks and are scored separately here.
Treating the risk tier as a verdict: the risk summary surfaces evidence and weights signals. Attribution requires threat intel, WHOIS history, passive DNS, or direct verification through the legitimate brand.

How to use it

1. Paste a URL or hostname
Paste a full URL, a scheme-less URL, or a bare hostname. The tool strips path/query/fragment and normalises non-standard dot separators (like ideographic full stop) before analysis.
2. Let the tool extract and normalise the hostname
Input handling uses the browser's URL parser when possible and falls back to manual extraction. The lowercased hostname is the unit of comparison.
3. Review Unicode and Punycode forms side by side
The conversion panel shows the human-readable Unicode form and the ASCII / Punycode transport form. Use the swap control to toggle the input between the two.
4. Inspect the script and confusable warnings
Open the Labels tab for a per-label breakdown and the Confusables tab for a character-level table with Unicode code points and script names. Flagged glyphs are highlighted in the label display.
5. Read the tiered risk summary
Risk is tiered as Low concern, Needs review, or High risk, and is derived from a combination of mixed-script, whole-script, ASCII-lookalike, and brand-skeleton signals. The summary sentence explains which signals fired.
6. Bulk-scan or copy the analyst note
Copy the condensed analyst note for a ticket or SOC entry, or switch to bulk mode to scan many hostnames at once and export a risk-ranked CSV.
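
The extraction described in steps 1 and 2 can be approximated as follows. This is a sketch under stated assumptions rather than the tool's own code: `extractHostname`, `SCHEME`, and `DOT_VARIANTS` are hypothetical names, and the set of dot variants handled (U+3002, U+FF0E, U+FF61) is an assumption about which separators matter.

```javascript
const SCHEME = /^[a-z][a-z0-9+.-]*:\/\//i;
// Ideographic, fullwidth, and halfwidth ideographic full stops.
const DOT_VARIANTS = /[\u3002\uFF0E\uFF61]/g;

// Reduce a pasted URL, scheme-less URL, or bare hostname to its hostname.
function extractHostname(input) {
  const text = input.trim().replace(DOT_VARIANTS, ".");
  // Give scheme-less input a dummy scheme so the URL parser accepts it.
  const candidate = SCHEME.test(text) ? text : `http://${text}`;
  try {
    // The parser drops path, query, fragment, credentials, and port,
    // and lowercases the host.
    return new URL(candidate).hostname;
  } catch {
    // Manual fallback for input the parser rejects.
    const authority = text.replace(SCHEME, "").split(/[/?#]/)[0];
    return authority.replace(/^.*@/, "").replace(/:\d*$/, "").toLowerCase();
  }
}

extractHostname("https://Example.COM/login?next=/home#top"); // → "example.com"
extractHostname("example.com/login");                        // → "example.com"
```

For Unicode input the URL parser returns the ASCII / Punycode form of the hostname, so a separate decoding step is still needed to show the Unicode spelling.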

Common errors and fixes

Assuming every xn-- domain is malicious

Punycode is the standard DNS transport for internationalised labels. Legitimate brands and country TLDs depend on it. Presence of xn-- alone is not a risk signal — review the decoded Unicode form, the script composition, and the confusable skeleton.

Assuming all Unicode domains are unsafe

Single-script non-Latin IDNs are common and usually legitimate. Browsers display many of them in Unicode form by default. Evaluate the composition of scripts and whether confusable glyphs appear, not whether non-ASCII is present.

Relying only on mixed-script detection

A label written entirely in Cyrillic can still impersonate an ASCII brand — that is a whole-script confusable, not a mixed-script one. This tool folds glyphs to a visual skeleton so whole-script and mixed-script cases both get scored.
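
A skeleton check of this kind can be sketched as below. The map and brand list are deliberately tiny and hypothetical (`CONFUSABLES`, `KNOWN_BRANDS`, `skeleton`, and `impersonatedBrand` are illustrative names); the tool's curated map is not reproduced here, and this is a simplification rather than the full skeleton algorithm that Unicode describes in UTS #39.

```javascript
// A few Cyrillic letters that resemble Latin ones (illustrative subset only).
const CONFUSABLES = new Map([
  ["\u0430", "a"], // CYRILLIC SMALL LETTER A
  ["\u0435", "e"], // CYRILLIC SMALL LETTER IE
  ["\u043E", "o"], // CYRILLIC SMALL LETTER O
  ["\u0440", "p"], // CYRILLIC SMALL LETTER ER
  ["\u0441", "c"], // CYRILLIC SMALL LETTER ES
  ["\u0443", "y"], // CYRILLIC SMALL LETTER U
  ["\u0445", "x"], // CYRILLIC SMALL LETTER HA
  ["\u0456", "i"], // CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
  ["\u04CF", "l"], // CYRILLIC SMALL LETTER PALOCHKA
]);

// Hypothetical watch list; a real one would be much longer.
const KNOWN_BRANDS = ["paypal", "apple"];

// Fold each character to the Latin letter it resembles, if any.
function skeleton(label) {
  const normalised = label.normalize("NFKC").toLowerCase();
  return Array.from(normalised, (ch) => CONFUSABLES.get(ch) ?? ch).join("");
}

// Return the brand a label visually imitates, or null.
// A label that already is the brand, character for character, is not a spoof.
function impersonatedBrand(label) {
  const folded = skeleton(label);
  const isSpoof = folded !== label.toLowerCase() && KNOWN_BRANDS.includes(folded);
  return isSpoof ? folded : null;
}

impersonatedBrand("\u0440\u0430\u0443\u0440\u0430\u04CF"); // all-Cyrillic → "paypal"
impersonatedBrand("\u0430pple");                           // mixed-script → "apple"
impersonatedBrand("paypal");                               // genuine → null
```

Because the comparison is made on the folded form, the same function covers both the whole-script and the mixed-script case.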

Missing ASCII-only lookalikes like paypa1.com

Not every phishing domain uses Unicode. Digit-for-letter and symbol-for-letter substitutions (1↔l, 0↔o, |↔l) and character-pair tricks (rn↔m, vv↔w, cl↔d) are common ASCII-only patterns. The risk model scores these separately from the Unicode checks.
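
One way to catch these patterns is to fold the label and compare it with folded brand names. The sketch below covers only the 1 and 0 substitutions and the three character pairs listed above; `foldAscii`, `asciiLookalikeOf`, and the `ASCII_BRANDS` list are hypothetical, and the tool's real rule set may differ.

```javascript
const PAIRS = [["rn", "m"], ["vv", "w"], ["cl", "d"]];
const DIGITS = { "0": "o", "1": "l" };

// Hypothetical watch list for illustration.
const ASCII_BRANDS = ["paypal", "microsoft", "google"];

// Collapse lookalike pairs and digits to the letters they imitate.
function foldAscii(label) {
  let folded = label.toLowerCase();
  for (const [pair, letter] of PAIRS) {
    folded = folded.split(pair).join(letter);
  }
  return folded.replace(/[01]/g, (digit) => DIGITS[digit]);
}

// Return the brand an ASCII label imitates, or null.
// Brands are folded too, so a brand that itself contains "cl" or "rn" still matches.
function asciiLookalikeOf(label) {
  const lower = label.toLowerCase();
  const folded = foldAscii(lower);
  const hit = ASCII_BRANDS.find(
    (brand) => brand !== lower && foldAscii(brand) === folded
  );
  return hit ?? null;
}

asciiLookalikeOf("paypa1");     // → "paypal"
asciiLookalikeOf("rnicrosoft"); // → "microsoft"
asciiLookalikeOf("g00gle");     // → "google"
asciiLookalikeOf("google");     // → null (the genuine name)
```

Folding is lossy by design, so it is only useful for comparison against a list of names worth protecting, never as a way to rewrite the hostname.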

Inspecting the whole URL instead of isolating the hostname

Path and query components can contain arbitrary text that is not part of the DNS name. The tool strips those automatically and operates on the hostname — the thing that actually identifies the site to DNS and TLS.

Expecting the tool to prove ownership or reputation

This is a local forensic assistant. It does not perform DNS lookups, WHOIS checks, TLS certificate fetches, or reputation queries. Use domain reputation services and manual verification for attribution.

Frequently asked questions

What is an xn-- domain and is it safe?

xn-- is the ACE (ASCII-Compatible Encoding) prefix used with Punycode. Any DNS label that contains non-ASCII characters is transported as an ASCII string carrying the xn-- prefix, because hostnames are restricted to a narrow character set. Your browser and mail client decode these labels back to Unicode for display when their spoofing heuristics consider it safe.

Most xn-- domains are legitimate internationalised domain names — for example accented European names, Arabic brands, Chinese trademarks, or country-TLD sites. Some are abused in homograph phishing. The presence of xn-- alone is not enough to judge intent. You need to look at which scripts the label uses, whether it mixes scripts, whether its visual skeleton matches a trusted brand, and whether it is paired with suspicious context (the email it came in, its TLD, the link text it hid behind).

What is the difference between Punycode, IDN, and a homograph attack?

IDN (Internationalised Domain Name): a domain name that contains one or more labels with non-ASCII characters. IDNs allow domain names in languages and scripts outside basic Latin — French accents, Cyrillic, Arabic, Chinese, Japanese, and many others.

Punycode: the ASCII-compatible encoding defined in RFC 3492 that represents IDN labels in DNS-safe ASCII. A label like münchen becomes xn--mnchen-3ya. Punycode is a mechanical, reversible encoding — it adds no semantic meaning.

Homograph attack: using visually confusable characters from different scripts (or ASCII-only digit/letter substitutions) to create a hostname that looks like a trusted brand but resolves to a different domain. Homograph attacks can use IDNs and Punycode, but they can also be pure ASCII (for example paypa1.com).

Why mixed-script detection is useful but not enough

Mixed-script detection flags labels that combine Latin with another script within a single label — for example a label where most characters are ASCII but one is a Cyrillic lookalike. This is a strong signal: browsers' own anti-spoofing heuristics treat mixed-script labels as suspicious and often force them back to Punycode display.

But mixed-script detection misses two important cases. First, whole-script confusables — a label written entirely in Cyrillic that nonetheless spells out a Latin brand when you read the glyphs visually. Second, ASCII-only lookalikes — domains that use no Unicode at all, relying on digit-letter substitutions or character-pair tricks.

A useful risk model scores all three: mixed-script, whole-script (via a confusable skeleton), and ASCII lookalike patterns. No single signal is sufficient on its own, and every signal can appear in legitimate contexts — so the result should be tiered, not binary.
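
A tiered combination of this kind might look like the sketch below. The tier names come from this page, but the rules for combining signals are assumptions made for illustration, and `riskTier` with its signal names is hypothetical.

```javascript
// Combine boolean signals into one of three tiers.
// Assumed rules: a brand-skeleton match, or mixed scripts together with a
// confusable glyph, is High risk; any other fired signal is Needs review.
function riskTier(signals) {
  const {
    mixedScript = false,
    confusableGlyphs = false,
    brandSkeletonMatch = false,
    asciiLookalike = false,
  } = signals;

  const fired = [];
  if (mixedScript) fired.push("mixed-script");
  if (confusableGlyphs) fired.push("confusable glyphs");
  if (brandSkeletonMatch) fired.push("brand skeleton match");
  if (asciiLookalike) fired.push("ASCII lookalike");

  let tier = "Low concern";
  if (brandSkeletonMatch || (mixedScript && confusableGlyphs)) {
    tier = "High risk";
  } else if (fired.length > 0) {
    tier = "Needs review";
  }
  return { tier, fired };
}

riskTier({}).tier;                                            // → "Low concern"
riskTier({ asciiLookalike: true }).tier;                      // → "Needs review"
riskTier({ mixedScript: true, confusableGlyphs: true }).tier; // → "High risk"
```

Returning the list of fired signals alongside the tier keeps the result explainable, which matters when the tier is evidence for an analyst and not a verdict.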

Phishing X-ray table

Visual domain	Hidden / normalised form	Technique	Risk note
münchen.de	xn--mnchen-3ya.de	Legitimate IDN (Punycode-based)	Normal internationalised domain. Punycode here is transport, not trickery.
аpple.com	xn--pple-43d.com	Mixed-script: Cyrillic 'а' + Latin 'pple'	Classic homograph brand spoof. Browsers typically force Punycode display here.
рауроӏ.com	xn--80ahpgc.com (example)	Whole-script Cyrillic imitating "paypal" (Punycode-based)	Single-script — mixed-script detection misses it. Skeleton folds to "paypal".
paypa1.com	paypa1.com (pure ASCII)	ASCII lookalike (digit-letter swap, not Punycode)	No Unicode involved. Punycode-only detectors miss this entirely.
rnicrosoft.com	rnicrosoft.com (pure ASCII)	ASCII character-pair trick (rn → m)	At small font sizes "rn" is hard to distinguish from "m". The skeleton folds to "microsoft".
اتصالات.ae	xn--mgbaakc7dvf.ae	Legitimate single-script Arabic IDN	Non-Latin is not a risk signal on its own — assess confusables and context.

Highlighted spans mark the deceptive characters or character pairs. Punycode-based and ASCII-only categories are kept strictly separate — one does not imply the other.
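
The hidden form in rows like these becomes visible once each character is listed with its code point. A small sketch, where `inspectCharacters` is a hypothetical helper and not part of the tool:

```javascript
// List every character of a label with its code point and an ASCII flag.
function inspectCharacters(label) {
  return Array.from(label, (ch) => {
    const cp = ch.codePointAt(0);
    return {
      char: ch,
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      ascii: cp < 0x80,
    };
  });
}

const rows = inspectCharacters("\u0430pple"); // Cyrillic 'а' followed by Latin 'pple'
rows[0]; // → { char: "а", codePoint: "U+0430", ascii: false }
rows[1]; // → { char: "p", codePoint: "U+0070", ascii: true }
```

The first row exposes the one character that differs from the genuine name, even though both spellings render almost identically.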

How browsers and apps handle IDNs

Browsers apply their own anti-spoofing heuristics when they decide whether to render a URL in Unicode or force it back to its Punycode form. The rules vary between browsers and versions, but the general pattern is: show Unicode when the label uses scripts that are either associated with the user's locale or are confined to a single script, and force Punycode display when multiple scripts mix within one label.

This helps end users but it does not help automation. Mail filters, log analyzers, SIEMs, and URL-parsing APIs usually expose hostnames in their ASCII / Punycode form. During investigation you want both: the human-readable Unicode form to see what the victim would have seen, and the ASCII form that DNS, TLS, and HTTP actually used.

This tool deliberately presents both forms side by side for every input, and applies its own multi-signal risk model on top — because the display heuristics in any single browser are not a substitute for a deliberate forensic check.

Developer snippet: normalising a hostname in modern JavaScript

Modern URL parsers in browsers and Node.js expose hostnames in their ASCII / Punycode form by default. If you are comparing hostnames programmatically, always compare the ASCII form — doing so avoids the ambiguity of visual Unicode equality.

// Browser + Node 20+
const u = new URL("https://münchen.de/tourism");

u.hostname;
// → "xn--mnchen-3ya.de"  (ASCII / Punycode form)

// Compare hostnames safely:
function sameHost(a, b) {
  return new URL(a).hostname.toLowerCase()
       === new URL(b).hostname.toLowerCase();
}

sameHost(
  "https://münchen.de/x",
  "https://xn--mnchen-3ya.de/y"
); // → true

Note: the browser's URL parser converts Unicode hostnames to ASCII automatically, but it does not decode Punycode back to Unicode for you. When you need the Unicode form for display, use a Punycode decoder — which is what this tool does locally via an RFC 3492 implementation.
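
For reference, the decoding direction of RFC 3492 fits in a few dozen lines. The sketch below follows the decoding procedure and parameters given in the RFC but omits its overflow checks, so treat it as an illustration and not as the tool's implementation; `punycodeDecode` and `hostToUnicode` are names chosen here.

```javascript
// Bootstring parameters for Punycode (RFC 3492, section 5).
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Bias adaptation (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = Math.floor(delta / (firstTime ? DAMP : 2));
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - TMIN) * TMAX) / 2) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Map a basic code point to its digit value: a-z → 0..25, 0-9 → 26..35.
function digitValue(ch) {
  const c = ch.charCodeAt(0);
  if (c >= 48 && c <= 57) return c - 22;
  if (c >= 65 && c <= 90) return c - 65;
  if (c >= 97 && c <= 122) return c - 97;
  throw new RangeError("Invalid Punycode digit: " + ch);
}

// Decode one Punycode string (without the xn-- prefix) to Unicode.
function punycodeDecode(input) {
  const output = [];
  const lastDelim = input.lastIndexOf("-");
  // Everything before the last hyphen is copied through literally.
  for (let j = 0; j < lastDelim; j++) {
    const c = input.charCodeAt(j);
    if (c >= 0x80) throw new RangeError("Non-basic code point in input");
    output.push(c);
  }
  let pos = lastDelim > 0 ? lastDelim + 1 : 0;
  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  while (pos < input.length) {
    const oldI = i;
    let w = 1;
    // Read one variable-length integer: a delta encoding both the
    // code point to insert and the position to insert it at.
    for (let k = BASE; ; k += BASE) {
      if (pos >= input.length) throw new RangeError("Truncated input");
      const digit = digitValue(input[pos++]);
      i += digit * w;
      const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const len = output.length + 1;
    bias = adapt(i - oldI, len, oldI === 0);
    n += Math.floor(i / len);
    i %= len;
    output.splice(i, 0, n);
    i++;
  }
  return String.fromCodePoint(...output);
}

// Decode every xn-- label of a hostname; other labels pass through unchanged.
function hostToUnicode(hostname) {
  return hostname
    .split(".")
    .map((label) =>
      /^xn--/i.test(label) ? punycodeDecode(label.slice(4).toLowerCase()) : label
    )
    .join(".");
}

hostToUnicode("xn--mnchen-3ya.de"); // → "münchen.de"
```

In Node.js specifically, the built-in `url.domainToUnicode()` performs the same conversion, so a hand-written decoder is mainly useful in the browser or for teaching purposes.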

Local, private, and zero-upload in v1

Conversion, script detection, confusable analysis, and bulk scanning all run in your browser. No hostname is sent to a server. The tool does not perform DNS lookups, WHOIS queries, TLS certificate fetches, or HTTP requests against the hostnames you paste — which makes it safe for triaging suspicious domains without contacting them.