索引的Cardinality是什么意思？

表的cardinality（可以翻译为“散列程度”），它表示某个索引对应的列包含多少个不同的值——如果cardinality大大少于数据的实际散列程度，那么索引就基本失效了。
我们可以使用SHOW INDEX语句来查看索引的散列程度：

做了个实验，不过有些问题：create table tt(a int, b int, c int);
create index ind_a on tt(a);
create index ind_ab on tt(a, b);
create index ind_abc on tt(a, b, c);
insert into tt(a, b, c)
values(1, 1, 1), (1, 1, 2),
(1, 2, 1), (1, 2, 1),
(1, 2, 2), (2, 1, 1),
(2, 2, 1), (2, 2, 2);mysql> show index from tt;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tt    |          1 | ind_a    |            1 | a           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_ab   |            1 | a           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_ab   |            2 | b           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_abc  |            1 | a           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_abc  |            2 | b           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_abc  |            3 | c           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
上边cardinality那一项为什么都是NULL?

现在好了，需要analyse一下才行
analyze table tt;
mysql> show index from tt;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tt    |          1 | ind_a    |            1 | a           | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_ab   |            1 | a           | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_ab   |            2 | b           | A         |           4 |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_abc  |            1 | a           | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_abc  |            2 | b           | A         |           4 |     NULL | NULL   | YES  | BTREE      |         |
| tt    |          1 | ind_abc  |            3 | c           | A         |           8 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

analyze table tt; 一下。

哦，你自己已经找了myisam 存储引擎，在插入表的时候不会同时去维护这些信息

的确是，很奇怪。应该是MYSQL的问题。
但当你再多插入几行后，倒是正确了。
mysql> insert into tt(a, b, c, d)
    -> values(1, 1, 1, 1), (1, 1, 1, 2),
    -> (1, 1, 2, 1), (1, 1, 2, 1),
    -> (1, 1, 2, 2), (1, 2, 1, 1),
    -> (1, 2, 2, 1), (1, 2, 2, 2);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0mysql> insert into tt(a, b, c, d)
    -> values(1, 1, 1, 1), (1, 1, 1, 2),
    -> (1, 1, 2, 1), (1, 1, 2, 1),
    -> (1, 1, 2, 2), (1, 2, 1, 1),
    -> (1, 2, 2, 1), (1, 2, 2, 2);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0mysql> explain select * from tt where a = 1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tt    | ALL  | ind_a         | NULL | NULL    | NULL |   48 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)mysql>

数据量少的时候，的确用索引不如全表扫描。我个人认为版主说的cardinality是反映散列程度的说法是对的，因此“如果cardinality大大少于数据的实际散列程度，那么索引就基本失效了”是对的。
但是“某个索引对应的列包含多少个不同的值”是错误的，因为BTree是N叉的，这个cardinality很可能是N叉B树有多少个节点比较靠谱。我也做了些实验，从实验数据看，这个理解是比较合理的。

调试易

索引的Cardinality是什么意思？

解决方案 »