索引的Cardinality是什么意思? show index from tbl的结果中会有Cardinality这一项,一直不明白是什么意思,这个值对于索引的效率有什么影响吗? 解决方案 » 免费领取超大流量手机卡,每月29元包185G流量+100分钟通话, 中国电信官方发货 表的cardinality(可以翻译为“散列程度”),它表示某个索引对应的列包含多少个不同的值——如果cardinality大大少于数据的实际散列程度,那么索引就基本失效了。我们可以使用SHOW INDEX语句来查看索引的散列程度: 做了个实验,不过有些问题:create table tt(a int, b int, c int);create index ind_a on tt(a);create index ind_ab on tt(a, b);create index ind_abc on tt(a, b, c);insert into tt(a, b, c)values(1, 1, 1), (1, 1, 2),(1, 2, 1), (1, 2, 1),(1, 2, 2), (2, 1, 1),(2, 2, 1), (2, 2, 2);mysql> show index from tt;+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| tt | 1 | ind_a | 1 | a | A | NULL | NULL | NULL | YES | BTREE | | | tt | 1 | ind_ab | 1 | a | A | NULL | NULL | NULL | YES | BTREE | | | tt | 1 | ind_ab | 2 | b | A | NULL | NULL | NULL | YES | BTREE | | | tt | 1 | ind_abc | 1 | a | A | NULL | NULL | NULL | YES | BTREE | | | tt | 1 | ind_abc | 2 | b | A | NULL | NULL | NULL | YES | BTREE | | | tt | 1 | ind_abc | 3 | c | A | NULL | NULL | NULL | YES | BTREE | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+上边cardinality那一项为什么都是NULL? 现在好了,需要analyse一下才行analyze table tt;mysql> show index from tt;+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| tt | 1 | ind_a | 1 | a | A | 2 | NULL | NULL | YES | BTREE | | | tt | 1 | ind_ab | 1 | a | A | 2 | NULL | NULL | YES | BTREE | | | tt | 1 | ind_ab | 2 | b | A | 4 | NULL | NULL | YES | BTREE | | | tt | 1 | ind_abc | 1 | a | A | 2 | NULL | NULL | YES | BTREE | | | tt | 1 | ind_abc | 2 | b | A | 4 | NULL | NULL | YES | BTREE | | | tt | 1 | ind_abc | 3 | c | A | 8 | NULL | NULL | YES | BTREE | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ analyze table tt; 一下。 哦,你自己已经找了myisam 存储引擎,在插入表的时候不会同时去维护这些信息 版主,我做了一个新的实验,create table tt(a int, b int, c int, d int);create index ind_a on tt(a);insert into tt(a, b, c, d)values(1, 1, 1, 1), (1, 1, 1, 2),(1, 1, 2, 1), (1, 1, 2, 1),(1, 1, 2, 2), (1, 2, 1, 1),(1, 2, 2, 1), (1, 2, 2, 2);analyze table tt;mysql> show index from tt; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| tt | 1 | ind_a | 1 | a | A | 1 | NULL | NULL | YES | BTREE | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+mysql> select count(*) from tt;+----------+| count(*) |+----------+| 8 | +----------+这里,索引ind_a的cardinality是1,记录数为8,对于查询语句select * from tt where a = 1;来说,用索引反而不如直接全表扫描,但是以下explain的结果却不是这样:mysql> explain select * from tt where a = 1;+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+| 1 | SIMPLE | tt | ref | ind_a | ind_a | 5 | const | 7 | Using where | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+上边显示查询时用到了索引ind_a,是mysql optimizer的不足之处? 的确是,很奇怪。应该是MYSQL的问题。但当你再多插入几行后,倒是正确了。mysql> insert into tt(a, b, c, d) -> values(1, 1, 1, 1), (1, 1, 1, 2), -> (1, 1, 2, 1), (1, 1, 2, 1), -> (1, 1, 2, 2), (1, 2, 1, 1), -> (1, 2, 2, 1), (1, 2, 2, 2);Query OK, 8 rows affected (0.00 sec)Records: 8 Duplicates: 0 Warnings: 0mysql> insert into tt(a, b, c, d) -> values(1, 1, 1, 1), (1, 1, 1, 2), -> (1, 1, 2, 1), (1, 1, 2, 1), -> (1, 1, 2, 2), (1, 2, 1, 1), -> (1, 2, 2, 1), (1, 2, 2, 2);Query OK, 8 rows affected (0.00 sec)Records: 8 Duplicates: 0 Warnings: 0mysql> explain select * from tt where a = 1;+----+-------------+-------+------+---------------+------+---------+------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+------+-------------+| 1 | SIMPLE | tt | ALL | ind_a | NULL | NULL | NULL | 48 | Using where |+----+-------------+-------+------+---------------+------+---------+------+------+-------------+1 row in set (0.00 sec)mysql> 数据量少的时候,的确用索引不如全表扫描。我个人认为版主说的cardinality是反映散列程度的说法是对的,因此“如果cardinality大大少于数据的实际散列程度,那么索引就基本失效了”是对的。但是“某个索引对应的列包含多少个不同的值”是错误的,因为BTree是N叉的,这个cardinality很可能是N叉B树有多少个节点比较靠谱。我也做了些实验,从实验数据看,这个理解是比较合理的。 关于 primary key设计问题 一段多重循环SQL语句怎么写? mysql binglong还原的无奈 left join 和not in效率问题? 有人考过MySQL认证吗? 关于导入数据,请教大侠了!!! 如何给一个用户授权,使他只能看到某一个表? 连接MySQL数据库服务器? windows2000 server 下安装mysql的问题! 分表后如何保证某一字段的值是唯一?? 求一个SQL语句 Input string was not in a correct format.
我们可以使用SHOW INDEX语句来查看索引的散列程度:
create index ind_a on tt(a);
create index ind_ab on tt(a, b);
create index ind_abc on tt(a, b, c);
insert into tt(a, b, c)
values(1, 1, 1), (1, 1, 2),
(1, 2, 1), (1, 2, 1),
(1, 2, 2), (2, 1, 1),
(2, 2, 1), (2, 2, 2);mysql> show index from tt;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tt | 1 | ind_a | 1 | a | A | NULL | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_ab | 1 | a | A | NULL | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_ab | 2 | b | A | NULL | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_abc | 1 | a | A | NULL | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_abc | 2 | b | A | NULL | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_abc | 3 | c | A | NULL | NULL | NULL | YES | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
上边cardinality那一项为什么都是NULL?
analyze table tt;
mysql> show index from tt;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tt | 1 | ind_a | 1 | a | A | 2 | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_ab | 1 | a | A | 2 | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_ab | 2 | b | A | 4 | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_abc | 1 | a | A | 2 | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_abc | 2 | b | A | 4 | NULL | NULL | YES | BTREE | |
| tt | 1 | ind_abc | 3 | c | A | 8 | NULL | NULL | YES | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
create table tt(a int, b int, c int, d int);
create index ind_a on tt(a);
insert into tt(a, b, c, d)
values(1, 1, 1, 1), (1, 1, 1, 2),
(1, 1, 2, 1), (1, 1, 2, 1),
(1, 1, 2, 2), (1, 2, 1, 1),
(1, 2, 2, 1), (1, 2, 2, 2);
analyze table tt;mysql> show index from tt;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tt | 1 | ind_a | 1 | a | A | 1 | NULL | NULL | YES | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+mysql> select count(*) from tt;
+----------+
| count(*) |
+----------+
| 8 |
+----------+这里,索引ind_a的cardinality是1,记录数为8,
对于查询语句select * from tt where a = 1;来说,用索引反而不如直接全表扫描,
但是以下explain的结果却不是这样:mysql> explain select * from tt where a = 1;
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
| 1 | SIMPLE | tt | ref | ind_a | ind_a | 5 | const | 7 | Using where |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------------+
上边显示查询时用到了索引ind_a,是mysql optimizer的不足之处?
但当你再多插入几行后,倒是正确了。
mysql> insert into tt(a, b, c, d)
-> values(1, 1, 1, 1), (1, 1, 1, 2),
-> (1, 1, 2, 1), (1, 1, 2, 1),
-> (1, 1, 2, 2), (1, 2, 1, 1),
-> (1, 2, 2, 1), (1, 2, 2, 2);
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0mysql> insert into tt(a, b, c, d)
-> values(1, 1, 1, 1), (1, 1, 1, 2),
-> (1, 1, 2, 1), (1, 1, 2, 1),
-> (1, 1, 2, 2), (1, 2, 1, 1),
-> (1, 2, 2, 1), (1, 2, 2, 2);
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0mysql> explain select * from tt where a = 1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | tt | ALL | ind_a | NULL | NULL | NULL | 48 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)mysql>
但是“某个索引对应的列包含多少个不同的值”是错误的,因为BTree是N叉的,这个cardinality很可能是N叉B树有多少个节点比较靠谱。我也做了些实验,从实验数据看,这个理解是比较合理的。