paper
ID Title Content
1 A b
2 A b
3 A b
4 b D当Title和Content重复,就判断为重复的记录我用语句
select ID from paper group by Title,Content having count(*) >1
只能取出一部分,如何能一次取出多余数据的ID,比如上面的例子中,应该取出2、3?因为数据量比较大,大概几十万。但是重复量较小。所以不得不优化。
原先用
select id from pager where id not in(select id from paper group by title,content)
太慢太慢了。谢谢
ID Title Content
1 A b
2 A b
3 A b
4 b D当Title和Content重复,就判断为重复的记录我用语句
select ID from paper group by Title,Content having count(*) >1
只能取出一部分,如何能一次取出多余数据的ID,比如上面的例子中,应该取出2、3?因为数据量比较大,大概几十万。但是重复量较小。所以不得不优化。
原先用
select id from pager where id not in(select id from paper group by title,content)
太慢太慢了。谢谢
ID int identity(1,1) primary key, Title varchar(10), Content varchar(10)
)insert #tmp1
select 'A','b'
union all
select 'A','b'
union all
select 'A','b'
union all
select 'b','D'select id from #tmp1 a where exists(select 1 from #tmp1 where title=a.title and content=a.content and id<=a.id
group by title,content having count(*)>1)
insert @paper select 1, 'A', 'b'
union all select 2, 'A', 'b'
union all select 3, 'A', 'b'
union all select 4, 'b', 'D'
select * from @paper a where exists(select 1 from @paper where Title=a.title and Content=a.content and id<a.id)
from paper a
where exists(
select 1 from paper where Title=a.title and Content=a.content and id<a.id
)
ID
FROM
paper LEFT OUTER JOIN
(
SELECT MIN(ID) AS ID, Title,Content FROM paper group by title,content
) T1
ON
paper.ID = T1.IDWHERE T1.ID IS NULL
电脑杂工的方法是我最早用的方法。
但是太慢了。
我的数据接近100万。itblog的方法效率比电脑杂工的高不少(没有用in)。但是还是慢啊。小伟用临时表的方法。但是我的数据100万……哭死