在进行查询时要尽量少用in,not in,要用left join替换,但是最近遇到这样一个问题,在使用not in的查询时间比left join的查询时间要少,不知道什么原因?希望大家帮忙看看
查询的功能是,查询出集合a不在集合b中的数据。执行这条sql,查询出来的结果耗时0.39 sec
select a.ruid,b.ruid from( select distinct RUID from UserMsg where ID =12 and Time >= '2009-8-14 15:30:00' and Time<='2009-8-17 16:00:00') a left join ( select distinct RUID from UserMsg where ID =12 and Time < '2009-8-14 15:30:00' ) b on a.ruid = b.ruid where b.ruid is null
exlpain 之后为
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 452 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 1112 | Using where; Not exists |
| 3 | DERIVED | UserMsg | ref | ID,Time | ID | 5 | | 6755 | Using where; Using temporary |
| 2 | DERIVED | UserMsg | range | ID,Time | Time | 9 | NULL | 2115 | Using where; Using temporary |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
而执行下面这条not in之后执行时间为0.09sec
select add_tb.RUID from (select distinct RUID from UserMsg where ID =12 and Time>'2009-8-14 15:30:00' and Time<='2009-8-17 16:00:00' ) add_tb where add_tb.RUID not in (select distinct RUID from UserMsg where ID =12 and Time<'2009-8-14 15:30:00');
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 452 | Using where |
| 3 | DEPENDENT SUBQUERY | UserMsg | index_subquery | RUID,ID,Time | RUID | 96 | func | 2 | Using index; Using where |
| 2 | DERIVED | UserMsg | range | ID,Time | Time | 9 | NULL | 2113 | Using where; Using temporary |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
查询的功能是,查询出集合a不在集合b中的数据。执行这条sql,查询出来的结果耗时0.39 sec
select a.ruid,b.ruid from( select distinct RUID from UserMsg where ID =12 and Time >= '2009-8-14 15:30:00' and Time<='2009-8-17 16:00:00') a left join ( select distinct RUID from UserMsg where ID =12 and Time < '2009-8-14 15:30:00' ) b on a.ruid = b.ruid where b.ruid is null
exlpain 之后为
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 452 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 1112 | Using where; Not exists |
| 3 | DERIVED | UserMsg | ref | ID,Time | ID | 5 | | 6755 | Using where; Using temporary |
| 2 | DERIVED | UserMsg | range | ID,Time | Time | 9 | NULL | 2115 | Using where; Using temporary |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
而执行下面这条not in之后执行时间为0.09sec
select add_tb.RUID from (select distinct RUID from UserMsg where ID =12 and Time>'2009-8-14 15:30:00' and Time<='2009-8-17 16:00:00' ) add_tb where add_tb.RUID not in (select distinct RUID from UserMsg where ID =12 and Time<'2009-8-14 15:30:00');
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 452 | Using where |
| 3 | DEPENDENT SUBQUERY | UserMsg | index_subquery | RUID,ID,Time | RUID | 96 | func | 2 | Using index; Using where |
| 2 | DERIVED | UserMsg | range | ID,Time | Time | 9 | NULL | 2113 | Using where; Using temporary |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
select distinct a.RUID
from UserMsg a
left join UserMsg b on a.ruid = b.ruid and b.ID =12 and b.Time < '2009-8-14 15:30:00'
where a.ID =12 and a.Time >= '2009-8-14 15:30:00' and a.Time <='2009-8-17 16:00:00'
and b.ruid is null
但是你能说一下为什么我查询的两个sql的效率问题是怎么回事吗?谢谢
第1种情况生成了1个临时表(IO量相对小),临时表与后面的子查询关联时,是用上了索引的。
这句话的真正含义实际上指用join代替子查询,mysql对子查询处理的不好,这是个bug,一直没有很好的解决。
select a.ruid,b.ruid
from ( select distinct RUID
from UserMsg
where ID =12
and Time >= '2009-8-14 15:30:00'
and Time <='2009-8-17 16:00:00'
) a left join (
select distinct RUID
from UserMsg
where ID =12 and Time < '2009-8-14 15:30:00'
) b on a.ruid = b.ruid
where b.ruid is null
你直接用了两个临时表, 如果两个临时表的返回记录数量均不多,则效率上是可行的,但如果这两个表 a,b 结果集比较大则无索引可用了。可以试一下这句。创建 (id,time) , (id,ruid,time) 两个索引
select a.ruid
from UserMsg a
where a.ID =12
and a.Time >= '2009-8-14 15:30:00'
and a.Time <='2009-8-17 16:00:00'
and not exists (
select RUID
from UserMsg
where ID =12 and Time < '2009-8-14 15:30:00'
and ruid=a.ruid
)