
Showing posts from September, 2015

utf8_general_ci vs. utf8_unicode_ci: which should we use?

These two collations are both for the UTF-8 character encoding. They differ in how text is sorted and compared.

Note: in newer versions of MySQL, use utf8mb4 rather than utf8. utf8mb4 stores the same UTF-8 data format with the same performance. MySQL's older utf8 accepts only the first 65,536 Unicode code points, that is, characters that fit in at most three bytes.

Accuracy

utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, and it sorts accurately in a very wide range of languages. utf8mb4_general_ci does not implement all of the Unicode sorting rules. This results in undesirable sorting in some situations, such as when particular languages or characters are used.

Performance

utf8mb4_general_ci is faster at comparisons and sorting because it takes a number of performance-related shortcuts. On modern servers, this speed advantage is all but negligible. The collation was devised at a time when servers had a tiny fraction of the CPU performance of today's computers. utf8mb4_unicode_c