What is Unicode?
Unicode is a universal character standard that aims to give every character in every writing system, plus symbols and emoji, a unique number called a code point (e.g. U+0041 for "A"). It replaces the many incompatible legacy encodings, in which the same byte value could mean different characters depending on which encoding a system assumed.
Unicode is the character set; encodings like UTF-8 define how those code points are stored as bytes. UTF-8 is the dominant encoding on the web.
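To make the split between character set and encoding concrete, here is a minimal Python sketch (assuming Python 3.9 or later). The helper `encodings` is a hypothetical name used only for illustration. It takes one character, reports its code point in U+XXXX form, and shows the bytes that UTF-8, UTF-16 and UTF-32 produce for it. The code point stays the same while the byte form changes.

```python
def encodings(ch: str) -> dict[str, str]:
    """Hypothetical helper: one character's code point and its bytes in three encodings."""
    return {
        "code point": f"U+{ord(ch):04X}",           # same number regardless of encoding
        "utf-8": ch.encode("utf-8").hex(" "),       # 1 to 4 bytes per code point
        "utf-16": ch.encode("utf-16-be").hex(" "),  # 2 or 4 bytes; big-endian, no BOM
        "utf-32": ch.encode("utf-32-be").hex(" "),  # always 4 bytes
    }


for ch in ["A", "é", "😀"]:
    print(encodings(ch))
```

For "😀" (U+1F600), UTF-8 uses four bytes (f0 9f 98 80), UTF-16 uses a surrogate pair (d8 3d de 00), and UTF-32 stores the code point directly (00 01 f6 00).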
Key points
- Assigns a unique code point to every character (e.g. U+1F600 😀).
- Covers most of the world's writing systems, plus symbols and emoji.
- Unicode is the character set; UTF-8 is the most common encoding for storing it as bytes (UTF-16 and UTF-32 are others).
- Its first 128 code points (U+0000 to U+007F) are identical to ASCII.
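A quick Python check of the ASCII point, offered as a sketch rather than anything taken from the standard itself: every code point from U+0000 to U+007F encodes to the same single byte in ASCII and in UTF-8, while U+0080 is the first code point ASCII cannot represent.

```python
# Every code point from U+0000 to U+007F is an ASCII character, and
# ASCII and UTF-8 both store it as the same single byte.
for cp in range(128):
    ch = chr(cp)
    assert ch.encode("ascii") == ch.encode("utf-8") == bytes([cp])

# U+0080 is the first code point outside ASCII.
try:
    chr(0x80).encode("ascii")
except UnicodeEncodeError:
    print("U+0080 cannot be encoded as ASCII")

print(chr(0x80).encode("utf-8").hex(" "))  # c2 80 (two bytes in UTF-8)
```

This is why plain ASCII files are already valid UTF-8.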
Example
- A = U+0041
- é = U+00E9
- 😀 = U+1F600
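The same three characters in Python, as a sketch. The `\u` and `\U` escapes name a character by its code point, `ord()` recovers the number, and the standard-library function `unicodedata.name()` returns the character's official Unicode name.

```python
import unicodedata

# A, é and 😀, written with Python's escape syntax for code points.
examples = ["\u0041", "\u00e9", "\U0001F600"]

for ch in examples:
    # ord() gives the code point; format it in the conventional U+XXXX form.
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Each line shows the character, its code point and its name, e.g. `😀 U+1F600 GRINNING FACE`.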
Common uses
- Supporting international text and emoji
- Consistent text across systems and languages
- Web pages and APIs (UTF-8)
- File and database encoding
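For the file and API uses, a minimal Python sketch. The file name `sample.txt` and the JSON field `name` are arbitrary placeholders. It writes text with an explicit `encoding="utf-8"` instead of relying on the platform default, shows that character count and byte count differ, and serializes JSON with `ensure_ascii=False` before encoding it as UTF-8 bytes.

```python
import json
import os
import tempfile

text = "Café 😀"  # 6 characters (code points)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # placeholder file name
    # Pass the encoding explicitly; the platform default is not UTF-8 everywhere.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Read the raw bytes back to see what was actually stored.
    with open(path, "rb") as f:
        raw = f.read()
    # Decoding with the same encoding restores the original string.
    with open(path, encoding="utf-8") as f:
        restored = f.read()

print(len(text), len(raw))  # 6 10 (é takes 2 bytes, 😀 takes 4)

# JSON for an API: keep non-ASCII characters literal, then send UTF-8 bytes.
payload = json.dumps({"name": text}, ensure_ascii=False).encode("utf-8")
print(payload)  # b'{"name": "Caf\xc3\xa9 \xf0\x9f\x98\x80"}'
```

Reading the file back with a different encoding than the one used to write it is the usual cause of garbled text ("mojibake").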