cDNA_map:
+----+------------+---------+---------+-------------+--------+----------+
| id | cluster_id | read_id | cDNA_id | cDNA_offset | strand | mismatch |
+----+------------+---------+---------+-------------+--------+----------+
| 1 | 1 | 1 | 5 | 3129 | 1 | 0 |
| 2 | 2 | 3 | 7 | 264 | 1 | 0 |
| 3 | 3 | 2 | 7 | 1013 | 1 | 0 |
| 4 | 4 | 4 | 7 | 1200 | 1 | 0 |
| 5 | 5 | 5 | 7 | 1247 | 1 | 0 |
+----+------------+---------+---------+-------------+--------+----------+
reads_info:
+----+--------------------+-------------------------------------+---------+
| id | name | seq | quality |
+----+--------------------+-------------------------------------+---------+
| 1 | read1|cds5|(27686. | GGGATCACTTGATATTACAGAAAACCAGCTATCAT | NULL |
| 2 | read2|cds7|(34956. | GCTCTTCTCCTCCAAGTATAACTGCCATTGCCGCT | NULL |
| 3 | read3|cds8|(35879. | CATAGAAATTCCGCCTCCTCGTCCTAAACGAAAAC | NULL |
| 4 | read4|cds9|(34769. | TGAGAAACAAAACACAGCTCTGCAAGATCAAAACT | NULL |
| 5 | read5|cds10|(34722 | CTCCAGCTTCATCATCTGATGATTCAGATGAGACT | NULL |
+----+--------------------+-------------------------------------+---------+查询语句:
select a.cluster_id,b.name,a.cDNA_offset,a.cDNA_offset+34,a.strand,a.mismatch from cDNA_map a inner join reads_info b on a.read_id=b.id where a.cDNA_id=14;
其中 reads_info 有 600万行, cDNA_map 有380万行, cDNA_map 上id 主键,cDNA_id索引 , reads_info 上id 主键 , 查询极慢。就算是当 a.cDNA_id=14 时候a 表 只有 800行 inner join b表 都极慢。(一般情况下是 某个a.cDNA_id下 会有30000行 和 b表 inner join)
但是 另外有一中情况: reads_info 有 100万行, cDNA_map 有80万行,索引一样, 当a.cDNA_id=14时候 a表有 2000行 ,inner join b表 也算快了。
+----+------------+---------+---------+-------------+--------+----------+
| id | cluster_id | read_id | cDNA_id | cDNA_offset | strand | mismatch |
+----+------------+---------+---------+-------------+--------+----------+
| 1 | 1 | 1 | 5 | 3129 | 1 | 0 |
| 2 | 2 | 3 | 7 | 264 | 1 | 0 |
| 3 | 3 | 2 | 7 | 1013 | 1 | 0 |
| 4 | 4 | 4 | 7 | 1200 | 1 | 0 |
| 5 | 5 | 5 | 7 | 1247 | 1 | 0 |
+----+------------+---------+---------+-------------+--------+----------+
reads_info:
+----+--------------------+-------------------------------------+---------+
| id | name | seq | quality |
+----+--------------------+-------------------------------------+---------+
| 1 | read1|cds5|(27686. | GGGATCACTTGATATTACAGAAAACCAGCTATCAT | NULL |
| 2 | read2|cds7|(34956. | GCTCTTCTCCTCCAAGTATAACTGCCATTGCCGCT | NULL |
| 3 | read3|cds8|(35879. | CATAGAAATTCCGCCTCCTCGTCCTAAACGAAAAC | NULL |
| 4 | read4|cds9|(34769. | TGAGAAACAAAACACAGCTCTGCAAGATCAAAACT | NULL |
| 5 | read5|cds10|(34722 | CTCCAGCTTCATCATCTGATGATTCAGATGAGACT | NULL |
+----+--------------------+-------------------------------------+---------+查询语句:
select a.cluster_id,b.name,a.cDNA_offset,a.cDNA_offset+34,a.strand,a.mismatch from cDNA_map a inner join reads_info b on a.read_id=b.id where a.cDNA_id=14;
其中 reads_info 有 600万行, cDNA_map 有380万行, cDNA_map 上id 主键,cDNA_id索引 , reads_info 上id 主键 , 查询极慢。就算是当 a.cDNA_id=14 时候a 表 只有 800行 inner join b表 都极慢。(一般情况下是 某个a.cDNA_id下 会有30000行 和 b表 inner join)
但是 另外有一中情况: reads_info 有 100万行, cDNA_map 有80万行,索引一样, 当a.cDNA_id=14时候 a表有 2000行 ,inner join b表 也算快了。
看看是否用到索引。查询慢一般都是没有用得索引引起的
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
| 1 | SIMPLE | a | ref | cDNA2,cDNA3 | cDNA2 | 4 | const | 17183 | |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | hongshu_68.a.read_id | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
2 rows in set (0.63 sec)索引是这样的:
create index cDNA2 on cDNA_map(cDNA_id);
create index cDNA3 on cDNA_map(read_id);
row: 17183 一下取这么多数据,当然慢了。而且在大的SQL中还要做运算(a.cDNA_offset+34)。这样当然慢了。在添加一个索引:create index cDNA3 on cDNA_map(read_id,cDNA_id);
试一下。
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
¦ id ¦ select_type ¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
¦ 1 ¦ SIMPLE ¦ a ¦ ref ¦ cDNA2,cDNA3 ¦ cDNA2 ¦ 4 ¦ const ¦ 17183 ¦ ¦
¦ 1 ¦ SIMPLE ¦ b ¦ eq_ref ¦ PRIMARY ¦ PRIMARY ¦ 4 ¦ hongshu_68.a.read_id ¦ 1 ¦ ¦
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
2 rows in set (0.63 sec) explaine
就是这样 。 只用到 cDNA2 索引