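What Is a Character Set
A character is the general term for any written symbol, including the letters and ideographs of national scripts, punctuation marks, graphic symbols, and digits. A character set is a collection of characters. There are many character sets, and each contains a different number of characters; common ones include ASCII, GB2312, GBK, GB18030, and Unicode. For a computer to process text from these character sets correctly, the characters must be encoded, that is, mapped to byte sequences the computer can recognize and store.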
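Character set | Description |
ASCII | The simplest Western encoding scheme, used mainly for modern English. A 7-bit code stored in 1 byte, it can represent 128 characters. |
GB2312 | The Chinese national standard character set for Simplified Chinese, compatible with ASCII. Each Chinese character takes 2 bytes. It defines 7,445 symbols, including 6,763 Chinese characters, covering almost all frequently used ones. |
GBK | An extension of GB2312 that adds support for Traditional Chinese characters; compatible with GB2312. Each Chinese character takes 2 bytes. It can represent 21,886 characters. |
GB18030 | Covers Chinese, Japanese, Korean, and other scripts; compatible with GBK. Variable-width: 1 byte (ASCII), 2 bytes, or 4 bytes. It can represent 27,484 characters. |
Unicode | The international standard coded character set, providing a unified encoding for 650 of the world's languages; its first 256 code points match ISO-8859-1. Unicode has several encoding forms: UTF-8, UTF-16, and UTF-32. |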
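To make the byte widths in the table concrete, here is a minimal sketch that encodes a few sample characters with Python's built-in codecs of the same names. The sample characters and the helper name `byte_lengths` are illustrative choices, not part of the original text, and Python's codecs are assumed to follow the corresponding standards closely enough for this comparison.

```python
# Sample characters: an ASCII letter, a common Chinese character,
# and an emoji that lies outside the Basic Multilingual Plane.
SAMPLES = ["A", "中", "😀"]

# Python codec names corresponding to the character sets in the table.
CODECS = ["ascii", "gb2312", "gbk", "gb18030", "utf-8", "utf-16-le", "utf-32-le"]


def byte_lengths(char):
    """Map each codec to the encoded length of char, or None if unencodable."""
    lengths = {}
    for codec in CODECS:
        try:
            lengths[codec] = len(char.encode(codec))
        except UnicodeEncodeError:
            # The character is not part of this character set.
            lengths[codec] = None
    return lengths


if __name__ == "__main__":
    for ch in SAMPLES:
        print(repr(ch), byte_lengths(ch))
```

The letter "A" takes 1 byte in ASCII and in all three GB encodings, which is what "compatible with ASCII" means in practice. "中" takes 2 bytes in the GB encodings and 3 bytes in UTF-8, while the emoji can only be encoded by GB18030 and the Unicode encodings, each of which needs 4 bytes for it.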
Character Sets Supported by MySQL
mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)
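The Maxlen column is the maximum number of bytes a single character can occupy in that character set. As a rough illustration, the sketch below checks whether every character of a string fits within a per-character byte limit when encoded as UTF-8. The function name `fits_maxlen` is hypothetical, and the check only approximates the difference between MySQL's `utf8` (Maxlen 3) and `utf8mb4` (Maxlen 4); it is not how the server itself validates input.

```python
def fits_maxlen(text, maxlen, codec="utf-8"):
    """Return True if every character of text encodes to at most maxlen bytes."""
    return all(len(ch.encode(codec)) <= maxlen for ch in text)


if __name__ == "__main__":
    # Chinese characters in the Basic Multilingual Plane need 3 bytes in UTF-8.
    print(fits_maxlen("中文", 3))
    # An emoji needs 4 bytes in UTF-8, so it exceeds a 3-byte limit.
    print(fits_maxlen("😀", 3))
    print(fits_maxlen("😀", 4))
```

Under this approximation, text containing emoji or other 4-byte characters needs a character set with Maxlen 4, such as `utf8mb4`.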
Setting the Character Set
1. Database
# Specify the character set when creating a database
CREATE DATABASE databaseName CHARSET utf8 COLLATE utf8_general_ci;
# View the database's character set
SHOW CREATE DATABASE databaseName;
2. Table
# Specify the character set when creating a table
CREATE TABLE tableName(...) DEFAULT CHARSET=utf8;
# View the table's character set
SHOW CREATE TABLE tableName;
3. Column
# Specify the character set for a column when creating a table
CREATE TABLE tableName(..., name VARCHAR(50) CHARACTER SET utf8 NOT NULL, ...);
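The three levels above form a fallback chain: a column without an explicit character set takes the table's default, a table without one takes the database's default, and a database without one takes the server's default. The following sketch models that lookup. The function `effective_charset` and its parameters are hypothetical names for illustration. It is a simplification that ignores collations and the fact that MySQL fixes these defaults when each object is created.

```python
def effective_charset(server, database=None, table=None, column=None):
    """Return the most specific character set that was explicitly specified.

    Each argument is a character set name, or None when that level
    did not specify one. The server default is always present.
    """
    for charset in (column, table, database):
        if charset is not None:
            return charset
    return server


if __name__ == "__main__":
    # Only the server default is set.
    print(effective_charset("utf8mb4"))
    # The table-level setting overrides the database-level one.
    print(effective_charset("utf8mb4", database="utf8", table="gbk"))
    # A column-level setting overrides everything else.
    print(effective_charset("utf8mb4", database="utf8", table="gbk", column="gb2312"))
```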
GB2312 Cannot Store "屌丝"
A GB2312 table (the second insert fails because the character 屌 is not in GB2312)
mysql> create table t_gb2312(name varchar(30)) default charset=gb2312;
Query OK, 0 rows affected (0.04 sec)
mysql> insert into t_gb2312 values('张三');
Query OK, 1 row affected (0.12 sec)
mysql> insert into t_gb2312 values('屌丝');
ERROR 1366 (HY000): Incorrect string value: '\xE5\xB1\x8C\xE4\xB8\x9D' for column 'name' at row 1
mysql>
A GBK table (GBK includes 屌, so the insert succeeds)
mysql> create table t_gbk(name varchar(30)) default charset=gbk;
Query OK, 0 rows affected (0.04 sec)
mysql> insert into t_gbk values('屌丝');
Query OK, 1 row affected (0.01 sec)
mysql> select * from t_gbk;
+--------+
| name |
+--------+
| 屌丝 |
+--------+
1 row in set (0.00 sec)
mysql>
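The same difference can be reproduced outside MySQL. This is a minimal sketch using Python's built-in `gb2312` and `gbk` codecs, on the assumption that they cover the same repertoires as MySQL's character sets of the same names. The helper names `can_encode` and `unencodable_chars` are illustrative.

```python
def can_encode(text, codec):
    """Return True if text can be fully represented in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def unencodable_chars(text, codec):
    """List the characters of text that the codec cannot represent."""
    return [ch for ch in text if not can_encode(ch, codec)]


if __name__ == "__main__":
    for word in ["张三", "屌丝"]:
        for codec in ["gb2312", "gbk"]:
            print(word, codec, can_encode(word, codec), unencodable_chars(word, codec))
    # The bytes quoted in the MySQL error message are the UTF-8 form of the string.
    print("屌丝".encode("utf-8"))
```

Only 屌 is missing from GB2312; 丝 is present. The bytes `\xE5\xB1\x8C\xE4\xB8\x9D` in the error message are the UTF-8 encoding of the whole string as sent by the client.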