Exclusion search over 2 million rows!

Table A
id  bid  dtime
1    1   2009-09-27 12:09:42
2    2   2009-09-27 12:09:42
3    3   2009-09-27 12:09:42
4    4   2009-09-27 12:09:42
5    5   2009-09-27 12:09:42
6    6   2009-09-27 12:09:42
7    7   2009-09-27 12:09:42
8    8   2009-09-27 12:09:42
9    9   2009-09-27 12:09:42
.....
Table B
id   name  dtime
1    p1    2009-09-27 12:09:42
2    p2    2009-09-27 12:09:42
3    p3    2009-09-27 12:09:42
4    p4    2009-09-27 12:09:42
5    p5    2009-09-27 12:09:42
6    p6    2009-09-27 12:09:42
7    p7    2009-09-27 12:09:42
8    p8    2009-09-27 12:09:42
.....
2000000    p2000000    2009-09-27 12:09:42
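The two tables hold about 2 million rows in total; table A records which rows of table B have already been selected.
When building a list, I don't want it to show rows of B that already have a record in A. The simple way is:
select b.id from b where b.id not in (select a.bid from a)
But this is not very efficient!
If the data later grows to tens of millions of rows, it definitely won't hold up!
Does anyone have a better approach?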

Solutions »


First, create an index on b.id and another on a.bid, then try querying like this:
select b.id from b left join a on b.id=a.bid where a.bid is null
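To make this suggestion concrete, here is a minimal sketch of the index-plus-left-join query run against a tiny stand-in for the two tables. It assumes Python's built-in sqlite3 module and an in-memory database purely for illustration; the thread does not say which database is in use, and the sample rows, the index name idx_a_bid and the helper unselected_left_join are made up.

```python
import sqlite3

# Hypothetical miniature of the thread's tables:
# b holds the items, a records which b rows have been selected.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT, dtime TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, bid INTEGER, dtime TEXT);
    -- b.id is indexed as the primary key; a.bid gets its own index.
    CREATE INDEX idx_a_bid ON a (bid);
""")
conn.executemany(
    "INSERT INTO b (id, name) VALUES (?, ?)",
    [(i, f"p{i}") for i in range(1, 11)],
)
conn.executemany("INSERT INTO a (bid) VALUES (?)", [(i,) for i in (1, 2, 3, 7)])


def unselected_left_join(conn):
    """Return ids in b with no matching row in a (LEFT JOIN ... IS NULL)."""
    rows = conn.execute("""
        SELECT b.id
        FROM b
        LEFT JOIN a ON b.id = a.bid
        WHERE a.bid IS NULL
        ORDER BY b.id
    """).fetchall()
    return [row[0] for row in rows]


print(unselected_left_join(conn))  # [4, 5, 6, 8, 9, 10]
```

On a real database the interesting part is the execution plan rather than the result, so it is worth checking the plan on your own engine before trusting any timing.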
select b.id from b where b.id not in (select a.bid from a)
78ms
select b.id from b left join a on b.id=a.bid where a.bid is null
94ms
It is actually slower. And I have already created both indexes.
Then what about this one:
select b.id from b where not exists (select 1 from a where a.bid=b.id)
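Before comparing speed, one behavioural difference between the two forms may be worth checking: if a.bid can ever contain NULL, NOT IN returns no rows at all under SQL's three-valued logic, while NOT EXISTS is unaffected. The sketch below shows this on a four-row sample. It assumes Python's sqlite3 module as a stand-in for the real database, and the sample data and the helper ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, bid INTEGER);
    CREATE INDEX idx_a_bid ON a (bid);
""")
conn.executemany("INSERT INTO b (id, name) VALUES (?, ?)",
                 [(1, "p1"), (2, "p2"), (3, "p3"), (4, "p4")])
conn.executemany("INSERT INTO a (bid) VALUES (?)", [(1,), (3,)])

NOT_IN = """
    SELECT b.id FROM b
    WHERE b.id NOT IN (SELECT a.bid FROM a)
    ORDER BY b.id
"""
NOT_EXISTS = """
    SELECT b.id FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.bid = b.id)
    ORDER BY b.id
"""


def ids(conn, sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# With no NULLs in a.bid, both forms agree.
without_null = (ids(conn, NOT_IN), ids(conn, NOT_EXISTS))

# One NULL in a.bid makes every NOT IN comparison unknown, so no row passes.
conn.execute("INSERT INTO a (bid) VALUES (NULL)")
with_null = (ids(conn, NOT_IN), ids(conn, NOT_EXISTS))

print(without_null)  # ([2, 4], [2, 4])
print(with_null)     # ([], [2, 4])
```

If a.bid is declared NOT NULL, the two queries are interchangeable as far as results go and the choice comes down to the plan your database picks.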
When the rows that join on a.fieldA and b.fieldB always map one-to-one, use the following approach:
-- Approach 1
select A.*
from A
left join b on
    a.fieldA = b.fieldB
where b.fieldB is null
If the mapping is not one-to-one, use the following approach:
-- Approach 2
select A.*
from A
where not exists
(
    select 1
    from B
    where a.fieldA = b.fieldB
)
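Of course, the not exists form is not much faster than not in.
Moreover, the left join causes a full scan of table A, so when table A holds a lot of data, approach 1 may not be fast enough either.
Another way to think about it is to add redundancy to the table:
add a usage flag (useFlg) to table A.
As soon as table B uses a fieldA value, set useFlg to 1 on the corresponding row of table A.
Then the following SQL can be used:
-- Approach 3
select A.*
from A
where useFlg = '0'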
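The flag approach only works if the flag is kept in step with table B, so a sketch of the write path may help. The one below assumes Python's sqlite3 module and an in-memory database for illustration. The table layouts, the index idx_A_useFlg and the helpers mark_used and unused are hypothetical, and whether an index on a two-valued flag actually helps depends on the database and on how many rows remain unused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (
        fieldA INTEGER PRIMARY KEY,
        useFlg TEXT NOT NULL DEFAULT '0'   -- '0' = unused, '1' = used by B
    );
    CREATE TABLE B (fieldB INTEGER NOT NULL);
    CREATE INDEX idx_A_useFlg ON A (useFlg);
""")
conn.executemany("INSERT INTO A (fieldA) VALUES (?)", [(i,) for i in range(1, 6)])


def mark_used(conn, field_a):
    """Record the usage in B and set the flag in A in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO B (fieldB) VALUES (?)", (field_a,))
        conn.execute("UPDATE A SET useFlg = '1' WHERE fieldA = ?", (field_a,))


def unused(conn):
    """Approach 3: read the unused rows straight from the flag."""
    rows = conn.execute(
        "SELECT fieldA FROM A WHERE useFlg = '0' ORDER BY fieldA"
    )
    return [row[0] for row in rows]


mark_used(conn, 2)
mark_used(conn, 4)
print(unused(conn))  # [1, 3, 5]
```

Doing both writes in one transaction is the point of the sketch: if the flag and table B can drift apart, the fast query silently returns wrong rows. Deleting a row from B would need the matching reset of the flag as well.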
--------------------------------------------------------------------------------------
Excerpted from someone else's article; you can use it for reference.
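Index both tables: b(id), a(bid)
select *
from b left join a on b.id=a.bid
where a.id is null
But for 2 million rows, this should not bring a significant improvement!
Because in theory a full table scan is the best choice here. Imagine you are asked to pick out, by hand, everyone not surnamed "Zhang" from a group of 100 people. How would you do it? Would you go to a sheet already sorted by name, find the people surnamed Zhang, and then copy the remaining names over one by one? Or would you copy straight from the roster and simply skip anyone surnamed "Zhang" as you go?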
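The roster analogy amounts to a single pass over b with a cheap membership check against the selected ids. Here is a rough sketch in plain Python, meant as an illustration of the idea rather than of what any particular database engine does internally; the function name anti_join_by_scan and the sample ids are hypothetical.

```python
def anti_join_by_scan(b_ids, a_bids):
    """Scan b once and skip every id that appears among a's bid values."""
    selected = set(a_bids)  # built once, average O(1) lookups afterwards
    return [b_id for b_id in b_ids if b_id not in selected]


b_ids = range(1, 11)     # stand-in for SELECT id FROM b
a_bids = [1, 2, 3, 7]    # stand-in for SELECT bid FROM a
print(anti_join_by_scan(b_ids, a_bids))  # [4, 5, 6, 8, 9, 10]
```

The cost is one read of each table plus the memory for the set, which is why an index on b.id has little to offer when most of b has to be returned anyway.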