Case sensitive search in MySQL/MariaDB
February 12, 2024
Table of contents
1. MySQL/MariaDB search for strings not case sensitive
2. How to make search case sensitive in MySQL/MariaDB
3. Why do MySQL/MariaDB look for strings that are not case sensitive?
4. Why does adding “BINARY” make string searches case sensitive?
5. It is correct to put “BINARY” before the search string or before the column name
6. Optimizing case-sensitive search (binary search) in MySQL/MariaDB
MySQL/MariaDB search for strings not case sensitive
Searching strings in MySQL can give unexpected results. For example, let's try to find all lines that contain the word “test”:
SELECT * FROM `TestTABLE` WHERE `test_column` LIKE "%test%";
The following lines are received:
+----+----------------------------------------------------+ | id | test_column | +----+----------------------------------------------------+ | 6 | This is a test string | | 7 | AND THIS IS A TEST LINE WRITTEN IN CAPITAL LETTERS | | 8 | Test String Written In Camel Style | | 9 | iNVERTED tEST sTRING wRITTEN iN cAMEL sTYLE | +----+----------------------------------------------------+
That is, values were found that contained the strings “test”, “TEST”, “Test” and “tEST”.
How to make search case sensitive in MySQL/MariaDB
To make the search case sensitive, add the word “BINARY”:
SELECT * FROM `TestTABLE` WHERE `test_column` LIKE BINARY "%test%";
Now the result is exactly what we expect from a case-sensitive search:
In the previous example, we added the “BINARY” operator before the search string. Instead, the operator can be placed before the name of the column in which the search will be performed:
SELECT * FROM `TestTABLE` WHERE BINARY `test_column` LIKE "%test%";
The second option, when “BINARY” comes before the column name, is slightly worse in terms of performance.
Why do MySQL/MariaDB look for strings that are not case sensitive?
If you read the documentation (https://dev.mysql.com/doc/refman/8.3/en/case-sensitivity.html), it becomes clear that “this is not a bug, this is a feature.”
In data types called “nonbinary strings”, namely CHAR, VARCHAR, TEXT, the search uses a “collation” of the data being compared, that is, a comparison that is not based on an exact match. What “collation” is is a topic for a separate note. But the easiest way to understand this is as comparison/sorting/search rules. And in accordance with these rules, the case of letters is not taken into account.
In practical terms, this means that string searches are performed in a case-insensitive manner and, when certain conditions are combined, sometimes even in an accent-insensitive manner. That is, situations are possible when 'é' = 'e'.
For binary strings, which include BINARY, VARBINARY, BLOB, the search compares the numeric values of the bytes in the operands.
From here we can draw the first important conclusion: if you want table searches to always be case sensitive, then for rows you can use BINARY, VARBINARY, BLOB data types.
Just as TEXT has data types of different lengths (TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT), so BLOB has similar data types (TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB).
Why does adding “BINARY” make string searches case sensitive?
The answer to this question is given by the official documentation, it says: a comparison between a non-binary string and a binary string is treated as a comparison of binary strings.
That is, when adding “BINARY” in the query above:
SELECT * FROM `TestTABLE` WHERE `test_column` LIKE BINARY "%test%";
The search string “test” begins to be treated as a binary string. The result is a comparison of a binary string with non-binary data, and in this case the situation is treated as a comparison of binary strings. In short, the search becomes case sensitive.
It is correct to put “BINARY” before the search string or before the column name
As you might guess, if we put the word “BINARY” in front of the column name, then the data in the column begins to be treated as binary data, and if we put “BINARY” in front of a row, then the row begins to be treated as binary.
There is an opinion that if the string is binary, then this is better for performance. Although, if you think about it, in any case, all data begins to be considered binary and there should be no difference.
However, in my tests, an SQL query with “BINARY” in front of the search row was always slightly faster than an SQL query with “BINARY” in front of the column name.
Optimizing case-sensitive search (binary search) in MySQL/MariaDB
There is an opinion that case-sensitive search is more resource-intensive than regular search. My observations when working with tables with tens of thousands of records up to several Gigabytes in size do not confirm this, but perhaps this is true when working with very large amounts of data.
If in your conditions binary search is really much more resource-intensive, then you can use the following construction for an SQL query:
SELECT * FROM (SELECT * FROM `table` WHERE `column` = 'value') as firstresult WHERE BINARY `column` = 'value'
As a result of execution, at the first stage of the search, which is NOT case sensitive, an intermediate result will be collected. Then a binary search (case sensitive) will be performed, but not on the entire volume of data, but on the sample obtained in the first stage.
Related articles:
- ERROR 1044 (42000): Access denied for user 'mial'@'localhost' to database 'TestDB'. Can't create MySQL database (SOLVED) (100%)
- Error when using ‘USE’ statement in MySQL (Mariadb): “ERROR 1044 (42000): Access denied for user” (SOLVED) (100%)
- Why “mysql -h” doesn't show help. Error “option '-h' requires an argument” (SOLVED) (100%)
- Why is MySQL (MariaDB) asking for a password even when it is specified with the “-p” option (SOLVED) (100%)
- Error “ERROR 1143 (42000)”: command denied to user for column in table in MySQL (MariaDB) (SOLVED) (100%)
- Linux does not see Wi-Fi on Realtek chipset. “firmware: failed to load rt2870.bin” error (SOLVED) (RANDOM - 3.3%)