What is Unicode?

Unicode is a universal character standard that aims to give every character in every writing system, along with symbols and emoji, a unique number called a code point (e.g. U+0041 for "A"). It replaces the patchwork of incompatible legacy encodings, in which the same byte value could stand for different characters on different systems.

Unicode is the character set; encodings like UTF-8 define how those code points are stored as bytes. UTF-8 is the dominant encoding on the web.
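To make the split between character set and encoding concrete, here is a minimal Python sketch that encodes the same three characters in two encodings. The helper name utf8_bytes and the choice of UTF-16 (big-endian) as the comparison are assumptions made for illustration; any other Unicode encoding would show the same idea.

```python
# The code point is fixed by Unicode; the bytes depend on the encoding chosen.

def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 encoding of text (hypothetical helper for this sketch)."""
    return text.encode("utf-8")

# Escapes are used so the sample does not depend on how this file itself is saved.
samples = ["A", "\u00e9", "\U0001F600"]  # A, e with acute accent, grinning face

for ch in samples:
    utf8 = utf8_bytes(ch)
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    print(f"{ch!r}: UTF-8 = {utf8.hex(' ')} ({len(utf8)} bytes), "
          f"UTF-16 = {utf16.hex(' ')} ({len(utf16)} bytes)")
```

In UTF-8 the three characters take 1, 2, and 4 bytes respectively, while in UTF-16 they take 2, 2, and 4 bytes, even though each character's code point is the same in both.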

Key points

  • Assigns a unique code point to every character (e.g. U+1F600 😀).
  • Aims to cover the world's writing systems, plus symbols and emoji.
  • Unicode is the character set; UTF-8 is one encoding that stores its code points as bytes.
  • The first 128 Unicode code points are identical to ASCII.
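The ASCII overlap in the last point can be checked directly. The sketch below assumes Python's built-in "ascii" and "utf-8" codecs and uses illustrative variable names; it also relies on the fact that UTF-8 stores each of those first 128 code points as a single byte with the same value.

```python
# Plain ASCII text produces identical bytes whether encoded as ASCII or UTF-8.
text = "Hello"
ascii_encoded = text.encode("ascii")
utf8_encoded = text.encode("utf-8")
same_bytes = ascii_encoded == utf8_encoded

# Each code point 0..127 is stored by UTF-8 as one byte with the same value.
all_match = all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))

print(same_bytes, all_match)
```

This is why a file containing only ASCII characters can be read as UTF-8 without conversion.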

Example

A = U+0041    é = U+00E9    😀 = U+1F600
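The same mapping can be reproduced in Python with the built-ins ord and chr. This is a small sketch; the function names code_point_label and from_label are hypothetical and exist only for this example.

```python
import unicodedata

def code_point_label(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

def from_label(label: str) -> str:
    """Turn a label such as 'U+00E9' back into its character."""
    return chr(int(label[2:], 16))

for ch in ["A", "\u00e9", "\U0001F600"]:
    # unicodedata.name returns the character's official Unicode name.
    print(ch, "=", code_point_label(ch), unicodedata.name(ch))
```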

Common uses

  • Supporting international text and emoji
  • Consistent text across systems and languages
  • Web pages and APIs (UTF-8)
  • File and database encoding
