What is the difference between the utf8_general_ci, utf8_unicode_ci, utf8mb4_general_ci, and utf8mb4_unicode_ci collations? Which collation, character set, and encoding should you choose for a MySQL database?

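As of MySQL 5.5.3, you should use utf8mb4 rather than utf8. Both character sets refer to the UTF-8 encoding, but the older utf8 has a MySQL-specific restriction: it stores at most three bytes per character, so characters above U+FFFF (those outside the Basic Multilingual Plane, such as emoji) cannot be stored.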
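
To see which characters are affected, here is a minimal Python sketch (it does not talk to MySQL) that reports how many bytes a character occupies in UTF-8. The helper names `utf8_length` and `fits_in_legacy_utf8` are made up for this illustration, and the check assumes that the legacy utf8 character set is limited to three bytes per character, i.e. to code points up to U+FFFF.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes the text `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_legacy_utf8(text: str) -> bool:
    """Return True if every character is at or below U+FFFF.

    Assumption for this sketch: MySQL's legacy utf8 stores at most three
    bytes per character, which covers exactly these code points.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# Sample characters: Latin letter, accented letter, euro sign, emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), "
        f"storable in legacy utf8: {fits_in_legacy_utf8(ch)}"
    )
```

Only the last sample, a four-byte character, falls outside what the legacy utf8 character set can hold; utf8mb4 accepts all four.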

Thus, neither utf8_general_ci nor utf8_unicode_ci needs to be used anymore.

As for the newer collations, utf8mb4_general_ci and utf8mb4_unicode_ci, the unicode variant is preferred over the general one. The utf8mb4_general_ci collation is slightly faster at comparing and sorting (a difference that rarely matters in practice today), but it uses simplified rules and therefore sorts incorrectly in certain languages. The utf8mb4_unicode_ci collation follows the Unicode sorting rules and does not have these shortcomings.

So, the currently recommended collation for MySQL databases and tables is utf8mb4_unicode_ci.
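
One way to apply this recommendation is through the usual MySQL DDL statements. The Python sketch below only builds the statement strings and does not connect to a server. The helper names and the database and table names (`shop`, `customers`) are hypothetical, chosen purely for illustration.

```python
CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def create_database_sql(database: str) -> str:
    """Build a CREATE DATABASE statement with the recommended defaults."""
    return (
        f"CREATE DATABASE {quote_identifier(database)} "
        f"CHARACTER SET {CHARSET} COLLATE {COLLATION};"
    )


def convert_table_sql(table: str) -> str:
    """Build an ALTER TABLE statement that converts an existing table."""
    return (
        f"ALTER TABLE {quote_identifier(table)} "
        f"CONVERT TO CHARACTER SET {CHARSET} COLLATE {COLLATION};"
    )


# Hypothetical names, for illustration only.
print(create_database_sql("shop"))
print(convert_table_sql("customers"))
```

Converting an existing table rewrites its data, so it is worth trying on a copy first.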

Tip: To save space with utf8mb4, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve four bytes for each character in a CHAR CHARACTER SET utf8mb4 column, because that is the maximum possible length of a single character. For example, MySQL must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column.
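
The arithmetic behind this tip can be sketched in a few lines of Python. This is a simplified model, not a description of any storage engine's internals. It assumes that a CHAR(N) column reserves N times four bytes, and that a VARCHAR value takes its actual UTF-8 byte length plus a length prefix of one byte (when the column can hold at most 255 bytes) or two bytes (otherwise). The function names are hypothetical.

```python
MAX_BYTES_PER_CHAR = 4  # widest character in utf8mb4


def char_reserved_bytes(declared_chars: int) -> int:
    """Bytes reserved for a CHAR(declared_chars) utf8mb4 column (simplified model)."""
    return declared_chars * MAX_BYTES_PER_CHAR


def varchar_used_bytes(value: str, declared_chars: int) -> int:
    """Bytes used by `value` in a VARCHAR(declared_chars) utf8mb4 column.

    Simplified model: actual UTF-8 payload plus a 1- or 2-byte length prefix,
    depending on whether the column can hold more than 255 bytes.
    """
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared column length")
    payload = len(value.encode("utf-8"))
    max_column_bytes = declared_chars * MAX_BYTES_PER_CHAR
    prefix = 1 if max_column_bytes <= 255 else 2
    return payload + prefix


print(char_reserved_bytes(10))          # CHAR(10)
print(varchar_used_bytes("hello", 10))  # "hello" stored in VARCHAR(10)
```

Under this model, the five-letter ASCII value "hello" needs 6 bytes in a VARCHAR(10) column, compared with the 40 bytes reserved by CHAR(10).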

Note: strictly speaking, utf8mb4_unicode_ci is not an encoding. In MySQL terms it is a collation: a set of comparison and sorting rules that belongs to a particular character set. That is, utf8mb4_unicode_ci is a collation, utf8mb4 is a character set, and UTF-8 is a variable-length encoding.
