Loading...
X

“ERROR 1366 (22007): Incorrect string value” in MySQL/MariaDB (SOLVED)

How to insert emoji into MySQL/MariaDB database

On the one hand, inserting emoji into a MySQL / MariaDB database table does not require any special preparatory steps – just insert one or more emoji characters, which may contain other text. For example:

INSERT INTO TestTABLE (`test_column`) VALUES ("🎫⏭️✈️ and hi 🌅");

But when executing the previous request, you may encounter an error:

ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x8E\xAB\xE2\x8F...' for column `TestDB1`.`TestTABLE`.`test_column` at row 1

Why does the “ERROR 1366 (22007): Incorrect string value” occur

Emoji are characters that use four bytes of UTF-8 encoding. As a reminder, UTF-8 is a variable length encoding, and MySQL/MariaDB allow the utf8mb3 and utf8mb4 character sets, which use 3 and 4 bytes per code point, respectively.

Now utf8mb3 is considered rather an obsolete character set, and when creating databases as well as tables, it is recommended to use only the utf8mb4 character set.

See the article for details: What is the difference between utf8_general_ci, utf8_unicode_ci, utf8mb4_general_ci, utf8mb4_unicode_ci collations. Which collation, character set and encoding to choose for MySQL database

How to fix “ERROR 1366 (22007): Incorrect string value” error

  1. To store texts that contain emoji, you must not only select the UTF-8 encoding, but also select the utf8mb4 character set. This must be done for the database as a whole, as well as tables and columns.
  2. If the database uses utf8mb4, then even in this case it is necessary to specify the correct character set in the MySQL / MariaDB server connection properties.

All these issues are discussed in more detail below.

How to check that database and table use utf8mb4 character set

To check which character set a database is using, run a query like this (replace _TestDB_ with the name of the database):

SHOW CREATE DATABASE _TestDB_;

Sample output:

CREATE DATABASE `TestDB` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */

As you can see, the character set is set to utf8mb4 and the collation is set to utf8mb4_unicode_ci. This is good. But, in fact, tables can use their own character set. Moreover, even individual columns of the same table can have their own values for the character set used.

In my practice, I have encountered the fact that the database was created with the utf8 character set and utf8_unicode_ci collation, but this did not prevent tables from being imported with the utf8mb4 character set and utf8mb4_unicode_ci collation. How then does the choice of database and table character sets affect if these values can be overridden for individual columns? This is not entirely clear to me, perhaps this data is used as default values when creating new tables.

To check which character set a table is using, run a query like this (replace _TestTABLE_ with the table name:):

SHOW CREATE TABLE _TestTABLE_;

Sample output:

CREATE TABLE `TestTABLE` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `test_column` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

The default character set is utf8mb4. This means the following:

  • columns should also use the utf8mb4 character set if this has not changed since the table was created
  • such a table can store emoji

How to change the character set of a table and table fields to utf8mb4

If the character set is not utf8mb4 (for example, utf8 or something else is used), then you can first set the character set to utf8mb4 for the database as a whole, this is done by a query like:

ALTER DATABASE _TestDB_ CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

Although, as mentioned above, this does not particularly affect the contents of already created tables.

To change the default character set of a table, as well as the used character set of all text columns (CHAR, VARCHAR, TEXT) to a new character set, use an expression like this (replace _TestTABLE_ with the table name):

ALTER TABLE _TestTABLE_ CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

This statement also changes the collation of all character columns.

So, changing the character set of individual columns to utf8mb4 is not required. But you can do it, including to avoid automatic conversion of the column data type.

For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET changes the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8mb4, each character might require up to 4 bytes, for a maximum possible length of 4 × 65,535 = 262,140 bytes. That length does not fit in a TEXT column's length bytes, so MySQL converts the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 262,140. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.

To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example:

ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8mb4;

ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8mb4;

How to create a table using utf8mb4 character set

To make a table and its character columns use utf8mb4, add the following line to the table creation query:

DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

For example:

CREATE TABLE IF NOT EXISTS `TestDB`.`test_table` (`id` int(11) NOT NULL AUTO_INCREMENT, `test_column` text COLLATE utf8mb4_unicode_ci NOT NULL, PRIMARY KEY(id)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

The database and table are using utf8mb4, but the error “ERROR 1366 (22007): Incorrect string value” still occurs

If everything is in order with the database and the data stored in them, then before executing queries that include emoji, run the following query:

SET NAMES utf8mb4;

This statement sets three session system variables: character_set_client, character_set_connection, and character_set_results to the specified character set.

As you can see, the error is now gone.

We can verify that the new data is indeed inserted into the table:

SELECT * FROM TestTABLE;

Note: emoji in the console may not be displayed correctly – apparently, this is due to the peculiarities of the terminal, or the lack of necessary fonts. However, despite the fact that in the screenshot, the emoji seem to crawl on top of each other, and some are simply not displayed correctly, in a web browser or text editor they look correct.

How to set the default character set in settings

You can specify the default character set in the command line with the --default-character-set option. “Default character set” means that the specified encoding will be used unless it has been overridden by other queries.

An example of commands for connecting to a MySQL and MariaDB server, in which case the connection will use the utf8mb4 character set:

mariadb -u root --default-character-set=utf8mb4
mysql -u root --default-character-set=utf8mb4

If you want to always use the utf8mb4 character set, but do not want to always query for it or specify it in the command line, then you can choose the default character set in the configuration file.

To do this, open the /etc/my.cnf file:

sudo gedit /etc/my.cnf

And add the following lines there:

[client]
default-character-set=utf8mb4

See also: How to determine the location and name of the MySQL (MariaDB) configuration file. How to find the group name for MySQL and MariaDB configuration files

You can verify that this setting actually has an effect with the following command:

mariadb --help | grep -i ^default-character-set

The result will be the value of the default-character-set environment variable:

default-character-set             utf8mb4

How to know which character is causing “ERROR 1366 (22007): Incorrect string value”

Perhaps you do not want to change anything in the databases or in the connection to the DBMS, then, as an alternative, you can remove the problem character from the text.

Consider the error:

ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x8E\xAB\xE2\x8F...' for column `TestDB1`.`TestTABLE`.`test_column` at row 1

The part of the string with values starting with “\x” is the hexadecimal representation of an emoji (or other 4-byte character). The fact that the bytes are more than four indicates that there are several emoji characters. To see the first character, copy the first four bytes and run a command like this:

printf '_4 BYTES HERE_'

For example:

printf '\xF0\x9F\x8E\xAB'
🎫


Leave Your Observation

Your email address will not be published. Required fields are marked *