弱弱的问题.... in和exists =.=!!

我在网上看了很多关于两者的区别和性能的文章,一直看得模模糊糊的,直到我看到了其中的两篇,我终于看晕了.....第一篇:结论是 exits适合内小外大的查询，in适合内大外小的查询第二篇:结论是简单的一句话，外大内小=IN，外小内大=EXISTS，这是一个实用的概略评估方法，在大部分的情况下是适用的。两篇都是经过举例后得出结论的...于是我迷惑了~麻烦达人们能用自己的语言解释下2者的区别和用法,谢谢了,当然,能举例就更好了~

解决方案 »

免费领取超大流量手机卡，每月29元包185G流量+100分钟通话, 中国电信官方发货

第一篇:结论是 exits适合内小外大的查询，in适合内大外小的查询这个正确!
估计是这个意思吧.假设TAB_A中A字段有5万个不同值,TAB_B中B字段有5个不同值
select * from TAB_A where A in (select B from TAB_B) 就是内小外大
select * from TAB_B where exists (select 1 from TAB_A where TAB_B.B=TAB_A.A)就是内大外小
从网上摘录下来的，供你参考。
EXISTS的执行流程
select * from t1 where exists ( select null from t2 where y = x )
可以理解为:
   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then
         OUTPUT THE RECORD
      end if
   end loop
对于in 和 exists的性能区别:
   如果子查询得出的结果集记录较少，主查询中的表较大且又有索引时应该用in,反之如果外层的主查询记录较少，子查询中的表大，又有索引时使用exists。
   其实我们区分in和exists主要是造成了驱动顺序的改变（这是性能变化的关键），如果是exists，那么以外层表为驱动表，先被访问，如果是IN，那么先执行子查询，所以我们会以驱动表的快速返回为目标，那么就会考虑到索引及结果集的关系了

另外IN时不对NULL进行处理
如：
select 1 from dual where null  in (0,1,2,null)
为空
学习了，真是高手云集啊，不知道何时也能成为oracle高手
2.NOT IN 与NOT EXISTS:
NOT EXISTS的执行流程
select .....
  from rollup R
where not exists ( select 'Found' from title T
                             where R.source_id = T.Title_ID);
可以理解为:
for x in ( select * from rollup )
      loop
          if ( not exists ( that query ) ) then
                 OUTPUT
          end if;
       end;注意:NOT EXISTS 与 NOT IN 不能完全互相替换，看具体的需求。如果选择的列可以为空，则不能被替换。例如下面语句，看他们的区别：
select x,y from t;
x              y
------         ------
1              3
3        1
1        2
1        1
3        1
5
select * from t where  x not in (select y from t t2  )
no rows

select * from t where  not exists (select null from t t2
                                                  where t2.y=t.x )
x       y
------  ------
5       NULL
所以要具体需求来决定对于not in 和 not exists的性能区别：
   not in 只有当子查询中，select 关键字后的字段有not null约束或者有这种暗示时用not in,另外如果主查询中表大，子查询中的表小但是记录多，则应当使用not in,并使用anti hash join.
   如果主查询表中记录少，子查询表中记录多，并有索引，可以使用not exists,另外not in最好也可以用/*+ HASH_AJ */或者外连接+is null
NOT IN 在基于成本的应用中较好比如:
select .....
from rollup R
where not exists ( select 'Found' from title T
                           where R.source_id = T.Title_ID);改成（佳）select ......
from title T, rollup R
where R.source_id = T.Title_id(+)
    and T.Title_id is null;

或者（佳）
sql> select /*+ HASH_AJ */ ...
        from rollup R
        where ource_id NOT IN ( select ource_id
                                               from title T
                                              where ource_id IS NOT NULL )注意：上面只是从理论上提出了一些建议，最好的原则是大家在上面的基础上，能够使用执行计划来分析，得出最佳的语句的写法
希望大家提出异议
其实现在大多数都用CBO了，两者差异并不明显。
举个例子，可以看出两种方式查询的执行计划完全一样。
(1)
SQL> create table test as select * from dba_objects;表已创建。SQL> create table test2 as select * from user_objects;表已创建。SQL> select count(*) from test;  COUNT(*)
----------
     50072SQL> select count(*) from test2;  COUNT(*)
----------
       138SQL> create index ind_test on test(object_id);索引已创建。SQL> create index ind_test2 on test2(object_id);索引已创建。SQL> exec dbms_stats.gather_table_stats('OPER','TEST',CASCADE=>true)PL/SQL 过程已成功完成。SQL> exec dbms_stats.gather_table_stats('OPER','TEST2',CASCADE=>true)PL/SQL 过程已成功完成。SQL> set autot traceonlySQL> select * from test
  2  where object_id in(select object_id from test2);已选择136行。
执行计划
----------------------------------------------------------
Plan hash value: 101091285-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   136 | 13328 |   140   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |     1 |    93 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |           |   136 | 13328 |   140   (1)| 00:00:02 |
|   3 |    SORT UNIQUE              |           |   138 |   690 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN         | IND_TEST2 |   138 |   690 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN         | IND_TEST  |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------   5 - access("OBJECT_ID"="OBJECT_ID")
统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        191  consistent gets
          0  physical reads
          0  redo size
      12402  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        136  rows processedSQL> select * from test a
  2  where exists(select 1 from test2 b
  3  where a.object_id=b.object_id);已选择136行。
执行计划
----------------------------------------------------------
Plan hash value: 101091285-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   136 | 13328 |   140   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST      |     1 |    93 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |           |   136 | 13328 |   140   (1)| 00:00:02 |
|   3 |    SORT UNIQUE              |           |   138 |   690 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN         | IND_TEST2 |   138 |   690 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN         | IND_TEST  |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------   5 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        191  consistent gets
          0  physical reads
          0  redo size
      12402  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        136  rows processedSQL>
（2）
SQL> select * from test2
  2  where object_id in(select object_id from test);已选择136行。
执行计划
----------------------------------------------------------
Plan hash value: 1344897278----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |   137 | 12467 |    32  (10)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |          |   137 | 12467 |    32  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | TEST2    |   138 | 11868 |     3   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IND_TEST | 50072 |   244K|    27   (4)| 00:00:01 |
----------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------   1 - access("OBJECT_ID"="OBJECT_ID")
统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        131  consistent gets
          0  physical reads
          0  redo size
      12342  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        136  rows processedSQL> select * from test2 a
  2  where exists(select 1 from test b
  3  where a.object_id=b.object_id);已选择136行。
执行计划
----------------------------------------------------------
Plan hash value: 1344897278----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |   137 | 12467 |    32  (10)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |          |   137 | 12467 |    32  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | TEST2    |   138 | 11868 |     3   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IND_TEST | 50072 |   244K|    27   (4)| 00:00:01 |
----------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------   1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        131  consistent gets
          0  physical reads
          0  redo size
      12342  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        136  rows processedSQL>
java我还算个初级程序员吧至于oracle就......  =.=!!首先感谢各位的帮助.其次 1,2楼的回答 exits适合内小外大的查询，in适合内大外小的查询不是和6楼的如果子查询得出的结果集记录较少，主查询中的表较大且又有索引时应该用in,反之如果外层的主查询记录较少，子查询中的表大，又有索引时使用exists。  矛盾了么~~~~最后,oracle我刚开始学,就是一菜,虽然我不能完全看明白执行计划 ,但是非常感谢 vc555 朋友这种回帖的态度