10.1.10.5. The utf8 Character Set (3-Byte UTF-8 Unicode Encoding)

UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data. It is implemented according to RFC 3629, which describes encoding sequences that take from one to four bytes. (An older standard for UTF-8 encoding, RFC 2279, describes UTF-8 sequences that take from one to six bytes. RFC 3629 renders RFC 2279 obsolete; for this reason, sequences with five and six bytes are no longer used.)

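The idea of UTF-8 is that various Unicode characters are encoded using byte sequences of different lengths: code points U+0000 to U+007F (basic Latin letters, digits, and punctuation) take one byte, U+0080 to U+07FF take two bytes, U+0800 to U+FFFF take three bytes, and supplementary code points U+10000 to U+10FFFF take four bytes.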
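As a minimal sketch of this variable-length scheme, the following Python snippet encodes a few sample characters with Python's built-in UTF-8 codec and reports how many bytes each one takes. The helper name utf8_length and the sample characters are illustrative assumptions and are not part of MySQL.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of a single character."""
    return len(ch.encode("utf-8"))


# Illustrative samples (assumed for this sketch), written as escapes so the
# snippet does not depend on the source file's own encoding.
samples = [
    "A",           # U+0041  LATIN CAPITAL LETTER A
    "\u00e9",      # U+00E9  LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC  EURO SIGN
    "\U0001F600",  # U+1F600 GRINNING FACE (a supplementary character)
]

lengths = [utf8_length(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

The last sample needs four bytes, which is more than the three-byte utf8 character set described in this section allows per character.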

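The utf8 character set is the same in MySQL 5.7 as before 5.7 and has exactly the same characteristics: it uses a maximum of three bytes per character and supports only Basic Multilingual Plane (BMP) characters, not supplementary characters.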

Exactly the same set of characters is available in utf8 as in ucs2. That is, they have the same repertoire.
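Because ucs2 covers only the Basic Multilingual Plane (code points U+0000 to U+FFFF), sharing its repertoire means that a string containing any code point above U+FFFF cannot be represented in utf8. A client-side check might look roughly like the sketch below. The function name fits_in_3byte_utf8 is a hypothetical helper, not a MySQL API, and the check looks only at code point ranges.

```python
def fits_in_3byte_utf8(text: str) -> bool:
    """Return True if every character in text is a BMP code point.

    BMP code points (U+0000 to U+FFFF) need at most three bytes in UTF-8.
    Hypothetical helper for illustration; it is not part of MySQL.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# Usage: the first string is BMP-only, the second contains U+1F600.
print(fits_in_3byte_utf8("caf\u00e9 \u20ac"))    # BMP characters only
print(fits_in_3byte_utf8("smile \U0001F600"))    # contains a supplementary character
```

Checking the code point rather than the encoded length keeps the sketch independent of any particular codec. It tests only the repertoire and does not model how the server handles strings that fail the check.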