我的删除语句如下 ,发现慢得无法工作 总体下列语句功能是去掉重复的typeId,Name,nIddrop table if exists a;
CREATE TABLE `a` (
`a` bigint(4) default NULL,
KEY `idx_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;insert a
select id from rt
where (typeId,Name,nId) in
(select typeId,Name,nId from rt group by typeId,Name,Id having count(*) > 1)
and id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1);------这个语句实在无法工作 需要优化 (不知道如何下手)
delete from rt where id in (select a from a);
我的调优过程
select typeId,Name,nId from rt group by
typeId,Name,Id having count(*) > 1消耗时间达到5分钟。这里不知道如何优化???rt表有300万记录 重复3个字段的记录是在太多
CREATE TABLE `a` (
`a` bigint(4) default NULL,
KEY `idx_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;insert a
select id from rt
where (typeId,Name,nId) in
(select typeId,Name,nId from rt group by typeId,Name,Id having count(*) > 1)
and id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1);------这个语句实在无法工作 需要优化 (不知道如何下手)
delete from rt where id in (select a from a);
我的调优过程
select typeId,Name,nId from rt group by
typeId,Name,Id having count(*) > 1消耗时间达到5分钟。这里不知道如何优化???rt表有300万记录 重复3个字段的记录是在太多
我的想法是吧(select typeId,Name,nId from rt group by typeId,Name,Id having count(*) > 1) ---这个很快19s
and id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1)--2分钟
做2个临时表 吧数据放到临时表里
之后再inner (你能把你的inner语句写出来吗?)否则select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1这个需要2分钟
现在3个字段 加上2个子查询 有点棘手 (这个表有300万)
还是头疼
having count(*)>1)->
inner join
(select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1) b on id=b.idselect min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1:
有重复的typeId,Name,nId 取ID最小,看看能否优化
having count(*)>1:->
select min(id) from ResolveTranslation group by typeId,Name,nId where
2>(select count(*) from ResolveTranslation where typeId=a.typeId and a.NameName and a.nId=nId )这样看看能否快一些
FROM rt
WHERE id NOT IN (
SELECT id
FROM rt
GROUP BY typeId, name, nId
HAVING count(*) = 1
)会不会更快一点,如果ID是rt的主键的话。
FROM rt
GROUP BY typeId, name, nId
HAVING count(*) = 1
----这个初始看可以的,但细细想不对, 他表示的是不重复的id,
我需要保存重复的一条 , 这个语句无法实现。 我等会测试你的sql 看是否一致
select id from rt
where (typeId,Name,nId) in
(select typeId,Name,nId from rt group by typeId,Name,Id having count(*) > 1)
相当于,先找出所有重复的组,返回符合这些重复组中所有的IDSELECT id
FROM rt
WHERE id NOT IN (
SELECT id
FROM rt
GROUP BY typeId, name, nId
HAVING count(*) = 1
)
相当于,先找出不重复的组,然后返回这些组以外的所有组,也就是重复组对应的所有ID两者是等价的。
FROM rt
GROUP BY typeId, name, nId
HAVING count(*) = 1
你这个可以解释一下 吗 ???? (不用min聚合函数 让人费解)整个执行SELECT id
FROM rt
WHERE id NOT IN (
SELECT id
FROM rt
GROUP BY typeId, name, nId
HAVING count(*) = 1
)
10分钟不能出数据 。 感觉逻辑上说不通
请注意我的要求是 功能是去掉重复的typeId,Name,nId
select id from rt
where (typeId,Name,nId) in
(select typeId,Name,nId from rt group by
typeId,Name,Id having count(*) > 1)
and id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1);------请注意观察 id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1); 不光重复组对应的所有ID ,还要保留一条重复组的id (必须做到这一点的额)
后面的, id not in,放到delete去
from rt t
where exists (
select 1
from rt
where typeId=t.typeId
and Name=t.Name
and nId=t.nId
and id<t.id
)
and id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1)->
请注意 是not in这样不好转inner的 ???(如果是in 可以按照你的方法 )
(select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1)
主要是这句要优化
select min(id) from ResolveTranslation group by typeId,Name,nId where
2>(select count(*) from ResolveTranslation where typeId=a.typeId and a.NameName and a.nId=nId )不知道结果是否正确
最后一步出现大问题delete from rt where id in (select id from a)
这里rt a 2表都是300万的大表 , id各自都有索引
速度太慢20分钟 检查os(linux)负载太大 ,而且从工具看 那个表好像咩有减少数据。
你和我想得一样 看样子大家都在提高我将后一部分放到删除里了
用了3张临时表(全部时间不到半小时)
任务完成现在思考哟用ACMAIN_CHM的方法 (感觉这位大哥的还是很强)
from rt t
where exists (
select 1
from rt
where typeId=t.typeId
and Name=t.Name
and nId=t.nId
and id<t.id
)得到的记录是 3004628而我的sql
select id from rt
where (typeId,Name,nId) in
(select typeId,Name,nId from rt group by typeId,Name,Id having count(*) > 1)
and id not in (select min(id) from ResolveTranslation group by typeId,Name,nId
having count(*)>1);
得到的额记录总是 是 3009350为什么中间会差5千条记录 ???/
where (typeId,Name,nId) in
(
select typeId,Name,nId
from rt
group by typeId,Name,Id
having count(*) > 1
)
and id not in (
select min(id) from ResolveTranslation
group by typeId,Name,nId
having count(*)>1
);你这儿是从另一张表?
select id from rt
where (typeId,Name,nId) in
(
select typeId,Name,nId
from rt
group by typeId,Name,Id
having count(*) > 1
)
and id not in (
select min(id) from rt
group by typeId,Name,nId
having count(*)>1
);
---吧这个数据插入到a(id)
很难过, 测试发现你的sql(你的执行后数据插入到表b(id))是对的额(未确认 但很可能是对的额)
我的sql发现数据差了5千条 ,到底错在什么地方 ????(我的测试方法如下:
SELECT id
FROM rt
GROUP BY typeId, name, nId
HAVING count(*) = 1
---吧这个数据插入到c (id)之后select count(*) from c where id not in (select id from a) --有4千条 错误
你的 select count(*) from c where id not in (select id from b)--为0 正确为什么我额sql会有问题??)
说明是包含关系 ACMAIN_CHM是对的
可以解释一下上面2个sql的差别 。 (搞了半天我还咩有明白差别在哪里?)
贴一条记录来看看。随便挑一个 select id from c where id not in (select id from a)
然后
select id, typeId, name, nId from rt where id = xx
select id from rt
where (typeId,Name,nId) in
(
select typeId,Name,nId
from rt
group by typeId,Name,Id
having count(*) > 1
)
and id not in (
select min(id) from rt
group by typeId,Name,nId
having count(*)>1
); 是错误的 -------不知道原因 ????
delete a1 from rt a1 inner join a on a.id=a1.id在连接字段上建立索引
from rt t
where exists (
select 1
from rt
where typeId=t.typeId
and Name=t.Name
and nId=t.nId
and id<t.id
)
用到子查询 是否会结果不确定???