我现在需要把表phone里重复的数据删除,也就是cont_phone字段里包含关系的数据删除,我在cont_phone加了一个索引,但效率还是很低,执行下面的语句花了20分钟还没有跑出来,这个表的数据有40W笔,希望高手能给我一个建议。
select b.cont_id as phonecont_id,b.cont_phone,a.pktest,a.cont_id , a.cont_phone
from
phone a inner join phone b on a.cont_Phone like '%'+b.cont_Phone+'%'
and len(a.cont_phone)>7 and len(a.cont_phone)>len(b.cont_phone) and a.pktest<>b.pktest
我创建索引的语句create index index_phone on phone(cont_phone)
很急,在线等待各位SQL SERVER版达人
select b.cont_id as phonecont_id,b.cont_phone,a.pktest,a.cont_id , a.cont_phone
from
phone a inner join phone b on a.cont_Phone like '%'+b.cont_Phone+'%'
and len(a.cont_phone)>7 and len(a.cont_phone)>len(b.cont_phone) and a.pktest<>b.pktest
我创建索引的语句create index index_phone on phone(cont_phone)
很急,在线等待各位SQL SERVER版达人
应该是这句的问题
你把表结构、数据和结果帖出来吧,重新写一个
drop table [dbo].[phone]
GOCREATE TABLE [dbo].[phone] (
[pktest] [int] IDENTITY (1, 1) NOT NULL ,
[cont_id] [int] NOT NULL ,
[cont_Phone] [nvarchar] (255) COLLATE Chinese_PRC_CI_AS NULL ,
[cont_MobilePhone] [nvarchar] (255) COLLATE Chinese_PRC_CI_AS NULL ,
[cont_status] [nvarchar] (150) COLLATE Chinese_PRC_CI_AS NULL ,
[cont_Memo] [ntext] COLLATE Chinese_PRC_CI_AS NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
1 A
2 B
这样的,想得到的结果为
1 A
你的这个条件有点问题。
cont_phone字段里包含关系的数据删除--》》指的是什么意思?
2. 用临时表简化一下数据试试:select pktest,cont_id,cont_phone into #1
from phoneselect pktest,cont_id,cont_phone into #2
from #1
where len(cont_phone)>7select b.cont_id as phonecont_id,b.cont_phone,a.pktest,a.cont_id , a.cont_phone
from #2 a inner join #1 b
on a.cont_Phone like '%'+b.cont_Phone+'%'
and len(a.cont_phone)>len(b.cont_phone) and a.pktest <>b.pktest
举个例子:
cont_phone2234954
2234954,232367 --要删除这一行数据吗?
--TRY:SELECT b.cont_id as phonecont_id,b.cont_phone,a.pktest,a.cont_id , a.cont_phone
FROM phone a INNER HASH JOIN phone b
on CHARINDEX(b.cont_Phone,a.cont_Phone)>0
WHERE len(a.cont_phone)>7 AND len(a.cont_phone)>len(b.cont_phone)
AND a.pktest <>b.pktest
1 333569 37825886 15002119959 待处理 2009-8-4无人2009-8-5
2 259833 50387861 13764294399 不需要 2009-8-4宝宝不住这边
3 256192 家52827789(37825886) 13002123709 不需要 2009-8-4手机尚未启用,
---------------
我给出三条数据,比如1跟3就是重复的数据,我需要把这些数据找出来。表结构在上面的回帖里有。表数据40W条,主要是我的查询语句太低了。希望各位达人继续关注
WHERE EXISTS(SELECT 1 FROM TB WHEERE CHARINDEX(cont_Phone,T.cont_Phone)>0)?
可以换成
a.cont_phone<>b.cont_phone
WHERE EXISTS(SELECT 1 FROM TB WHEERE CHARINDEX(cont_Phone,T.cont_Phone)>0 AND pktest <>T.pktest)??
楼主试试
都不能很好的利用索引。KG的hash join是什么dd?
(
cont_id INT,cont_Phone VARCHAR(50),
cont_MobilePhone VARCHAR(50),
cont_status VARCHAR(50),cont_Memo VARCHAR(50))
INSERT TBTESTSELECT '333569' , '37825886' , '15002119959' , '待处理' , '2009-8-4无人2009-8-5' UNION
SELECT '259833' , '50387861' ,'13764294399' , '不需要' , '2009-8-4宝宝不住这边' UNION
SELECT '256192' , '家52827789(37825886)', '13002123709' , '不需要' , '2009-8-4手机尚未启用,' --DROP TABLE TBTESTSELECT * FROM TBTESTSELECT * FROM TBTEST T
WHERE EXISTS(SELECT 1 FROM TBTEST WHERE CHARINDEX(cont_Phone,T.cont_Phone)>0 AND cont_id<>T.cont_id)
OR
EXISTS(SELECT 1 FROM TBTEST WHERE CHARINDEX(T.cont_Phone,cont_Phone)>0 AND cont_id<>T.cont_id)
cont_id cont_Phone cont_MobilePhone cont_status cont_Memo
----------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
256192 家52827789(37825886) 13002123709 不需要 2009-8-4手机尚未启用,
333569 37825886 15002119959 待处理 2009-8-4无人2009-8-5(所影响的行数为 2 行)
楼主修改看看?
SELECT b.cont_id AS phonecont_id,b.cont_phone,a.pktest,a.cont_id , a.cont_phone
FROM phone a INNER JOIN phone b ON CHARINDEX(b.cont_phone,a.cont_phone) > 0 AND a.pktest <> b.pktest
-- Author: HEROWANG(让你望见影子的墙)
-- Date : 2009-08-23 13:48:41
---------------------------------
IF OBJECT_ID('[tb]') IS NOT NULL
DROP TABLE [tb]
go
CREATE TABLE [tb] (id int,cont_id varchar(100),cont_Phone varchar(100),cont_MobilePhone VARCHAR(20),cont_status varchar(100),cont_Memo VARCHAR(600))
INSERT INTO [tb]
SELECT 1,333569,'37825886',15002119959,'待处理','2009-8-4无人2009-8-5' UNION ALL
SELECT 2,259833,'50387861',13764294399,'不需要','2009-8-4宝宝不住这边' UNION ALL
SELECT 3,256192,'家52827789(37825886)',13002123709,'不需要','2009-8-4手机尚未启用'select * from [tb]select * from tb t
where exists(select 1 from tb where CHARINDEX(cont_phone,t.cont_phone) > 0 and cont_phone<>t.cont_phone)id cont_id cont_Phone cont_MobilePhone cont_status cont_Memo
3 256192 家52827789(37825886) 13002123709 不需要 2009-8-4手机尚未启用
SELECT * FROM [phone] T
WHERE EXISTS(SELECT 1 FROM [phone] WHERE CHARINDEX(cont_Phone,T.cont_Phone)>0 AND cont_id<>T.cont_id)
UNION ALL
SELECT * FROM [phone] T
WHERE EXISTS(SELECT 1 FROM [phone] WHERE CHARINDEX(T.cont_Phone,cont_Phone)>0 AND cont_id<>T.cont_id) OR换成UNION ALL试试
WHERE (CHARINDEX(T1.cont_Phone,T.cont_Phone)>0
OR CHARINDEX(T.cont_Phone,T1.cont_Phone)>0 )AND T1.cont_id<>T.cont_id
少量数据测试,效率和EXISTS一样
SELECT * FROM TBTEST T ,TBTEST T1 WHERE T.cont_Phone LIKE '%'+T1.cont_Phone+'_%' AND T1.cont_id<>T.cont_id
这样子效率还好点
SELECT * FROM TBTEST WHERE cont_Phone LIKE '%50387861%'
测试这个的时候,莫非列名作匹配条件不同???
这句想办法不用like呀,用like索引基本上没用!
例:
a.cont_Phone = rtrim(b.cont_Phone)把cont_Phone先更新成一致
先把不规则资料资料找出来整理,再筛选重复记录。