有两张表A和C。
A表:存储的是IID 和SID的对应关系,一多对应,一个IID 最少有一个,最多不超过5个对应的SID。IID和SID都是数字类型。A表约有500K条记录。A表样例如下:
OID IID SID
-----------
001 100 100
002 100 101
003 100 200
004 101 151
005 101 152
...
比如IID = 100 同时有三条记录,SID分别为100,200,101。 C表:存储的是每一个SID对应的详细属性,如M,N,O, P等等。SID列有唯一索引。其它的属性(M,N,等)均为不定长字符类型,有重复值。C表约500K条记录。示例如下:
OID SID M N O ...
------------------
001 100 "C" "Y" ..
002 101 "C" "Y" ..
003 121 "P" "Y" ..
... ..
006 151 "C" "Z" ..
007 200 "C" "X" ..
...
需要找出的数据集为:
从C表中找出所有满足条件的SID值对,使得:
1)两个不同的SID,它们有同样的N,同样的M,
2)而且两个不同的SID对应至少一个相同的IID。
比如上面的样表中SID=100和SID=101即为一组符合条件的值对。现在的做法:
1)从A表里取出Distinct IID 记为id
2)对C表执行查询 Select * From C Where SID In (Select SID From A Where IID = id)以获取一个小的SID集合 记为result
3)对result内的每条记录的N, M属性对比
4)遍历所有的 IID 现在的查询效率很低,在不更改表结构和添加新的物理表的前提下,请问大家有没有什么更高效的查询方法?比如连接查询一类?有没有什么别的技巧?比如注意到C表中大概有1/3数据N属性为Null,这些记录不用检查,但不知道能怎么利用这个特点。
谢谢!
select * from (
select c.*,COUNT(*) OVER (PARTITION BY a.iid ) AS cnt
from a ,c
where a.sid=c.sid)
where cnt>=1
SQL> WITH BB AS
2 (
3 SELECT '001' AS OID,'100' AS SID,'C' AS M,'Y' AS N FROM DUAL
4 UNION
5 SELECT '002' AS OID,'101' AS SID,'C' AS M,'Y' AS N FROM DUAL
6 UNION
7 SELECT '003' AS OID,'121' AS SID,'P' AS M,'Y' AS N FROM DUAL
8 UNION
9 SELECT '006' AS OID,'151' AS SID,'C' AS M,'Z' AS N FROM DUAL
10 UNION
11 SELECT '007' AS OID,'200' AS SID,'C' AS M,'X' AS N FROM DUAL
12 )
13 SELECT distinct a.SID FROM BB a
14 inner join BB b on a.M=b.M AND a.N=b.N and a.OID<>b.OID
15 where a.SID in(
16 with AA
17 AS(
18 SELECT '001' AS OID,'100' AS IID,'100' AS SID FROM DUAL
19 UNION
20 SELECT '002' AS OID,'100' AS IID,'101' AS SID FROM DUAL
21 UNION
22 SELECT '003' AS OID,'100' AS IID,'200' AS SID FROM DUAL
23 UNION
24 SELECT '004' AS OID,'101' AS IID,'151' AS SID FROM DUAL
25 UNION
26 SELECT '005' AS OID,'101' AS IID,'152' AS SID FROM DUAL
27 )
28 SELECT distinct a.SID FROM AA a inner join AA b on a.OID<>b.OID and a.IID=b.IID and a.SID<>b.SID
29 )
30 /
SID
---
101
100
from c a , c b
where a.n is not null
and b.n is not null
and a.n=b.n
and a.m=b.m
and a.sid != b.sid
into result ;
Selelct * from A,C where A.SID = C.SID And C.N is not Null ORDER by A.IID, M, N
这样的好处是只用一次循环,比较IID,M,N相同记录即可。