MySQL8支持的字符集

字符集定义了字符数据的存储方式。不同的字符集,编码规则、长度都不一样。

使用show charset命令,可以查看数据库支持的所有字符集。

mysql> show charset;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)

Charset:字符集名称;

Descript:字符集说明;

Default collation:默认校验规则。主要用来字符比较大小和排序;

Maxlen:占用的最大存储长度。单位:字节(B);

MySQL8常用的字符集

MySQL8支持40多种字符集。常用的字符集有ascii、gb2312、gbk、utf8、utf8mb4等。

从MySQL8开始,默认采用utf8mb4字符集。

字符集最大长度(B)支持字符
ascii1空格、标点符号、数字、大小写字母、不可见字符等
gb23122支持中英文混合场景,但是不是国际通用字符集
gbk2支持中英文混合场景,但是不是国际通用字符集
utf83支持中英文混合场景,是国际通用字符集
utf8mb44支持中英文混合场景,是国际通用字符集

ascii:只收录了128个字符,不支持中文;

gb2312:简体中文字符集的中国国家标准,收录了6763个汉字、拉丁字母、希腊字母、日文、俄文等;

gbk:对gb2312字符集的扩展,完全兼容gb2312字符集。收录了21003个汉字。

utf8:一种阉割后的utf字符集(为了节省空间)。支持中文,但对于一些生僻汉字、emoji表情无法支持;

utf8mb4:比utf8多一个字节,支持中文,甚至是生僻字、emoji表情。

查看和设置字符集

查看数据库字符集相关的设置

mysql> show variables like 'character_set%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | gbk                                                     |
| character_set_connection | gbk                                                     |
| character_set_database   | utf8mb4                                                 |
| character_set_filesystem | binary                                                  |
| character_set_results    | gbk                                                     |
| character_set_server     | utf8mb4                                                 |
| character_set_system     | utf8mb3                                                 |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set, 1 warning (0.01 sec)

查看数据库的默认字符集

mysql> show create database school;
+----------+----------------------------------------------------------------------------------------------------+
| Database | Create Database                                                                                    |
+----------+----------------------------------------------------------------------------------------------------+
| school   | CREATE DATABASE `school` /*!40100 DEFAULT CHARACTER SET utf8 */ /*!80016 DEFAULT ENCRYPTION='N' */ |
+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

修改数据库的默认字符集

mysql> alter database school default character set utf8mb4;
Query OK, 1 row affected (0.01 sec)

mysql> show create database school;
+----------+----------------------------------------------------------------------------------------------------------------------------------+
| Database | Create Database                                                                                                                  |
+----------+----------------------------------------------------------------------------------------------------------------------------------+
| school   | CREATE DATABASE `school` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */ |
+----------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

查看数据表的默认字符集

方法一:

mysql> show create table school.student\G;
*************************** 1. row ***************************
       Table: student
Create Table: CREATE TABLE `student` (
  `student_id` varchar(50) NOT NULL COMMENT '学生编号',
  `student_name` varchar(100) NOT NULL DEFAULT '' COMMENT '学生姓名',
  `gender` varchar(10) NOT NULL DEFAULT '' COMMENT '性别',
  `birth_day` date NOT NULL COMMENT '生日',
  `age` int NOT NULL DEFAULT '0' COMMENT '年龄',
  `class_id` varchar(50) NOT NULL DEFAULT '' COMMENT '班级编号',
  `score` decimal(18,2) NOT NULL DEFAULT '0.00' COMMENT '数学成绩',
  `teacher_id` varchar(20) DEFAULT NULL COMMENT '老师编号',
  PRIMARY KEY (`student_id`)
) ENGINE=MyISAM DEFAULT CHARSET=gbk COMMENT='学生'
1 row in set (0.00 sec)

方法二:

mysql> show table status from school like 'student';
+---------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----------------
----+---------------------+------------+-------------------+----------+----------------+---------+
| Name    | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time
    | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+---------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----------------
----+---------------------+------------+-------------------+----------+----------------+---------+
| student | MyISAM |      10 | Dynamic    |  219 |             54 |       11932 | 281474976710655 |         5120 |         0 |              1 | 2022-10-05 23:20
:40 | 2022-10-05 23:20:40 | NULL       | gb2312_chinese_ci |     NULL |                | 学生    |
+---------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-----------------
----+---------------------+------------+-------------------+----------+----------------+---------+
1 row in set (0.00 sec)

修改数据表的默认字符集

mysql> alter table school.student character set 'gb2312';
Query OK, 219 rows affected (0.05 sec)
Records: 219  Duplicates: 0  Warnings: 0

mysql> show create table school.student\G;
*************************** 1. row ***************************
       Table: student
Create Table: CREATE TABLE `student` (
  `student_id` varchar(50) CHARACTER SET gbk NOT NULL COMMENT '学生编号',
  `student_name` varchar(100) CHARACTER SET gbk NOT NULL DEFAULT '' COMMENT '学生姓名',
  `gender` varchar(10) CHARACTER SET gbk NOT NULL DEFAULT '' COMMENT '性别',
  `birth_day` date NOT NULL COMMENT '生日',
  `age` int NOT NULL DEFAULT '0' COMMENT '年龄',
  `class_id` varchar(50) CHARACTER SET gbk NOT NULL DEFAULT '' COMMENT '班级编号',
  `score` decimal(18,2) NOT NULL DEFAULT '0.00' COMMENT '数学成绩',
  `teacher_id` varchar(20) CHARACTER SET gbk DEFAULT NULL COMMENT '老师编号',
  PRIMARY KEY (`student_id`)
) ENGINE=MyISAM DEFAULT CHARSET=gb2312 COMMENT='学生'
1 row in set (0.00 sec)

查看字段的字符集

mysql> show full columns from school.student;
+--------------+---------------+----------------+------+-----+---------+-------+---------------------------------+----------+
| Field        | Type          | Collation      | Null | Key | Default | Extra | Privileges                      | Comment  |
+--------------+---------------+----------------+------+-----+---------+-------+---------------------------------+----------+
| student_id   | varchar(50)   | gbk_chinese_ci | NO   | PRI | NULL    |       | select,insert,update,references | 学生编号 |
| student_name | varchar(100)  | gbk_chinese_ci | NO   |     |         |       | select,insert,update,references | 学生姓名 |
| gender       | varchar(10)   | gbk_chinese_ci | NO   |     |         |       | select,insert,update,references | 性别     |
| birth_day    | date          | NULL           | NO   |     | NULL    |       | select,insert,update,references | 生日     |
| age          | int           | NULL           | NO   |     | 0       |       | select,insert,update,references | 年龄     |
| class_id     | varchar(50)   | gbk_chinese_ci | NO   |     |         |       | select,insert,update,references | 班级编号 |
| score        | decimal(18,2) | NULL           | NO   |     | 0.00    |       | select,insert,update,references | 数学成绩 |
| teacher_id   | varchar(20)   | gbk_chinese_ci | YES  |     | NULL    |       | select,insert,update,references | 老师编号 |
+--------------+---------------+----------------+------+-----+---------+-------+---------------------------------+----------+
8 rows in set (0.00 sec)

如何选择字符集

  • 在能完全满足应用场景的情况下,尽量选择小(最大长度较短)的字符集。这样能够节省存储空间、磁盘I/O、网络传输字节数、提高系统性能;
  • 如果应用系统面向国外业务,需要处理不同国家的不同语言,建议选择使用utf8mb4;
  • 如果应用系统仅面向中文网络,则为了性能考虑,建议选择使用gbk;
picture loss