A tricky problem with deleting duplicate records

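The project requirement is as follows. A table has 5 columns (id, a, b, c, d). To simplify the problem: every column is int except c, which is varchar, and id is auto-increment with a step of 1.
I need to remove duplicate records.
I found many records with identical (a, b, c, d), and I need to delete all records in which these 4 columns are the same (keeping, of course, the one record with the largest id).
Problem 1 is the simplest (the records do not all have to be kept):
delete the records with identical (a, b, c, d), keeping only the one with the largest id in each group. Later I found that the problem is much more complicated:
every id must be kept, so no record can be deleted. A unique index also has to be created (to guarantee no duplicates in the future).
But the table currently has about 70,000 records that duplicate all 4 columns. Right now c is just a description column and can be modified freely.
So my idea is to first pick out the records in which the two columns a and d are duplicated ((a, d) are the most important columns; together they basically determine the whole record, and b is not very important),
and then use a stored procedure to append 1, 2, ... in turn to the c of each duplicate record. For example, if they were all 'auto', the duplicates would become auto1, auto2, ..., auton. How should this stored procedure be written?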

Solutions »

Methods for finding and deleting duplicate records in MySQL
(I)
1. Find the redundant duplicate records in a table, where duplicates are determined by a single column (peopleId)
select * from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
2. Delete the redundant duplicate records in a table, where duplicates are determined by a single column (peopleId), keeping only the record with the smallest rowid
delete from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
and rowid not in (select min(rowid) from people group by peopleId having count(peopleId) > 1)
3. Find the redundant duplicate records in a table (multiple columns)
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
4. Delete the redundant duplicate records in a table (multiple columns), keeping only the record with the smallest rowid
delete from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*) > 1)
5. Find the redundant duplicate records in a table (multiple columns), excluding the record with the smallest rowid
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*) > 1)
(II)
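For example,
table A has a column "name",
and the "name" values of different records may be identical.
We now need to find the "name" values that are duplicated across the records of this table:
Select Name,Count(*) From A Group By Name Having Count(*) > 1
If the sex must also be the same, then:
Select Name,sex,Count(*) From A Group By Name,sex Having Count(*) > 1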
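The GROUP BY ... HAVING COUNT(*) > 1 pattern above can be tried out with a minimal sketch like the following. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the helper name find_duplicate_groups and the sample rows are assumptions made for illustration, not part of the quoted article.

```python
import sqlite3


def find_duplicate_groups(conn, table, columns):
    """Return one (value, ..., count) tuple per combination of `columns` that occurs more than once."""
    # `table` and `columns` are spliced into the SQL text, so they must be
    # trusted identifiers, never user input.
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


# Hypothetical sample data for table A (Name, sex).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (Name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO A (Name, sex) VALUES (?, ?)",
    [("Li", "F"), ("Li", "F"), ("Li", "M"), ("Wang", "M")],
)

by_name = find_duplicate_groups(conn, "A", ["Name"])
by_name_and_sex = find_duplicate_groups(conn, "A", ["Name", "sex"])
print(by_name)          # [('Li', 3)]
print(by_name_and_sex)  # [('Li', 'F', 2)]
```

Adding sex to the grouping narrows the result: "Li" is repeated three times by name alone, but only two of those rows also share the same sex.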
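(III)
Method 1
declare @max integer,@id integer
declare cur_rows cursor local for select key_column,count(*) from table_name group by key_column having count(*) > 1
open cur_rows
fetch cur_rows into @id,@max
while @@fetch_status=0
begin
select @max = @max -1
set rowcount @max
delete from table_name where key_column = @id
fetch cur_rows into @id,@max
end
close cur_rows
set rowcount 0
Method 2
There are two kinds of duplicate records. The first is fully duplicated records, where every column is repeated. The second is records duplicated only on some key columns, for example the Name column repeats while the other columns may or may not repeat, or their repetition can be ignored.
1. The first kind of duplication is fairly easy to handle. Using
select distinct * from tableName
gives a result set without duplicate records.
If duplicate records need to be deleted from the table (keeping 1 record of each), they can be removed as follows:
select distinct * into #Tmp from tableName
drop table tableName
select * into tableName from #Tmp
drop table #Tmp
This kind of duplication is caused by careless table design; adding a unique index column solves it.
2. This kind of duplication usually requires keeping the first record of each group of duplicates. Suppose the duplicated columns are Name and Address, and a result set that is unique on these two columns is required:
select identity(int,1,1) as autoID, * into #Tmp from tableName
select min(autoID) as autoID into #Tmp2 from #Tmp group by Name,Address
select * from #Tmp where autoID in (select autoID from #Tmp2)
The last select yields a result set with no duplicates on Name and Address (but with an extra autoID column; in practice, list the wanted columns in the select clause to leave it out).
(IV)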
Find duplicates:
select * from tablename where id in (select id from tablename group by id having count(id) > 1)
This article comes from a CSDN blog; please credit the source when reposting: http://blog.csdn.net/softwave/archive/2009/02/14/3890576.aspx
Just delete the duplicates directly:
delete t1 from table_name t1,(select a,b,c,d,max(id) as max_id from table_name group by a,b,c,d) t2 where t1.a=t2.a and t1.b=t2.b and t1.c=t2.c and t1.d=t2.d and t1.id<>t2.max_id
Thanks to the poster above.
I just tested
delete from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*) > 1)
and it failed. This syntax comes from Oracle.
delete t1 from table_name t1,(select a,b,c,d,max(id) as max_id from table_name group by a,b,c,d) t2 where t1.a=t2.a and t1.b=t2.b and t1.c=t2.c and t1.d=t2.d and t1.id<>t2.max_id
This one is much better (it avoids writing an explicit inner join, which I find is what MySQL supports worst).
Problem 1 is solved.
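For anyone who wants to try the "keep only the largest id" delete without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The table name records is a hypothetical stand-in for the asker's table, the seven rows are the sample data the asker posts later in this thread, and storing the blank c of row 2 as an empty string is an assumption. The statement is written as a NOT IN over a derived table rather than the multi-table DELETE above, because SQLite does not accept that MySQL form.

```python
import sqlite3


def delete_all_but_newest(conn):
    """Delete rows whose (a, b, c, d) is repeated, keeping the largest id of each group."""
    cur = conn.execute(
        """
        DELETE FROM records
        WHERE id NOT IN (
            SELECT max_id
            FROM (SELECT MAX(id) AS max_id FROM records GROUP BY a, b, c, d) AS keep
        )
        """
    )
    return cur.rowcount  # number of rows deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT, d INTEGER)"
)
conn.executemany(
    "INSERT INTO records (id, a, b, c, d) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 1, "auto", 6),
        (2, 290, 0, "", 1),  # c is blank in the thread; empty string assumed
        (3, 100, 1, "auto", 6),
        (4, 300, 1, "auto", 2),
        (5, 290, 0, "auto", 1),
        (6, 100, 1, "auto", 6),
        (7, 290, 0, "auto", 1),
    ],
)

deleted = delete_all_but_newest(conn)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
print(deleted, remaining_ids)  # 3 [2, 4, 6, 7]
```

The extra derived table (keep) is not needed by SQLite. It is kept because MySQL is known to reject a DELETE whose subquery reads the target table directly (error 1093), and wrapping the subquery this way is the usual workaround there.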
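Now for problem 2 (I admit it is very hard and has to be done with a stored procedure).
Every id must be kept, so no record can be deleted. A unique index also has to be created (to guarantee no duplicates in the future).
But the table currently has about 70,000 records that duplicate all 4 columns. Right now c is just a description column and can be modified freely.
So my idea is to first pick out the records in which the two columns a and d are duplicated ((a, d) are the most important columns; together they basically determine the whole record, and b is not very important),
and then use a stored procedure to append 1, 2, ... in turn to the c of each duplicate record. For example, if they were all 'auto', the duplicates would become auto1, auto2, ..., auton. How should this stored procedure be written?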
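I don't understand what your later requirement is meant to achieve: "every id must be kept, so no record can be deleted. A unique index also has to be created (to guarantee no duplicates in the future)".
The ID is already auto-increment, so it is already the primary key, which is a unique index, so it certainly cannot be duplicated in the future.
If only the ID values are kept and the corresponding records are all deleted, what is the point?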
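No record can be deleted.
What I mean is: for the duplicate records (all 4 columns a, b, c, d identical), append a number to column c, e.g. auto1, auto2, ..., auton.
As for "the ID is already auto-increment, so it is already the primary key, which is a unique index": duplicates here are determined by all 4 columns a, b, c, d having the same values.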
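The difference between the primary key on id and a unique index on (a, b, c, d) can be shown with a small sketch. It uses Python's built-in sqlite3 module in place of MySQL; the table name records and the index name uq_abcd are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "a INTEGER, b INTEGER, c TEXT, d INTEGER)"
)
insert = "INSERT INTO records (a, b, c, d) VALUES (?, ?, ?, ?)"

# With only the primary key on id, identical (a, b, c, d) rows are accepted.
conn.execute(insert, (100, 1, "auto", 6))
conn.execute(insert, (100, 1, "auto", 6))
rows_before = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

# A unique index cannot be created while such duplicates still exist ...
try:
    conn.execute("CREATE UNIQUE INDEX uq_abcd ON records (a, b, c, d)")
    index_created_with_duplicates = True
except sqlite3.DatabaseError:
    index_created_with_duplicates = False

# ... so the rows are made distinct first, and then the index is created.
conn.execute("UPDATE records SET c = 'auto2' WHERE id = 2")
conn.execute("CREATE UNIQUE INDEX uq_abcd ON records (a, b, c, d)")

# From now on an insert that repeats an existing (a, b, c, d) is rejected.
try:
    conn.execute(insert, (100, 1, "auto", 6))
    duplicate_rejected = False
except sqlite3.DatabaseError:
    duplicate_rejected = True

print(rows_before, index_created_with_duplicates, duplicate_rejected)  # 2 False True
```

This is why the existing duplicates have to be changed before the index can go on: the auto-increment id keeps every row distinct as a row, but says nothing about the other four columns.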
　
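OK.
Table structure: a (id, a, b, c, d). Every column is int except c, which is a character type; id is auto-increment.
Part of the data in a:
id, a, b, c, d
1  100  1  auto  6
2  290  0        1
3  100  1  auto  6
4  300  1  auto  2
5  290  0  auto  1
6  100  1  auto  6
7  290  0  auto  1
.............
Here, whether records are duplicates is decided by whether a, b, c and d are all equal.
By this criterion:
3 records with 100  1  auto  6 are equal
2 records with 290  0  auto  1 are equal
Because the existing data must not be deleted, and the (a, b, c, d) of future inserts must never be duplicated (I will create a unique index on (a, b, c, d)), what the stored procedure has to do is modify these 3 records with 100  1  auto  6 and these 2 records with 290  0  auto  1. For example,
the 3 records with 100  1  auto  6 can be changed to
100  1  auto1  6
100  1  auto2  6
100  1  auto3  6
and the 2 records with 290  0  auto  1 can be changed to
290  0  auto1  1
290  0  auto2  1
The ids of these duplicate records must not be changed. Only the c column of the duplicate records gets a number appended, as in the example above.
How should this stored procedure be written?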
mysql> select * from a;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
|    1 |  100 |    1 | auto |    6 |
|    2 |  290 |    0 |      |    1 |
|    3 |  100 |    1 | auto |    6 |
|    4 |  300 |    1 | auto |    2 |
|    5 |  290 |    0 | auto |    1 |
|    6 |  100 |    1 | auto |    6 |
|    7 |  290 |    0 | auto |    1 |
+------+------+------+------+------+
7 rows in set (0.00 sec)

mysql> update a t1,(select id,
    ->  (select count(*) from a where a=t.a and b=t.b and c=t.c and d=t.d and id<=t.id) as k
    ->  from a t
    ->  where exists (select 1 from a where a=t.a and b=t.b and c=t.c and d=t.d and id!=t.id)
    -> ) t2
    -> set t1.c=concat(c,t2.k)
    -> where t1.id=t2.id;
Query OK, 5 rows affected (0.06 sec)
Rows matched: 5  Changed: 5  Warnings: 0

mysql> select * from a;
+------+------+------+-------+------+
| id   | a    | b    | c     | d    |
+------+------+------+-------+------+
|    1 |  100 |    1 | auto1 |    6 |
|    2 |  290 |    0 |       |    1 |
|    3 |  100 |    1 | auto2 |    6 |
|    4 |  300 |    1 | auto  |    2 |
|    5 |  290 |    0 | auto1 |    1 |
|    6 |  100 |    1 | auto3 |    6 |
|    7 |  290 |    0 | auto2 |    1 |
+------+------+------+-------+------+
7 rows in set (0.00 sec)

mysql>
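Sincere admiration for the poster above.
I was in the middle of writing a stored procedure, and it turns out this can be done entirely in plain SQL. Incredible.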
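The numbering logic of the UPDATE in the session above (count the rows of the same group whose id is not larger, and append that count to c) can be reproduced with a minimal sketch in Python's built-in sqlite3 module. This is a stand-in under stated assumptions, not the poster's MySQL statement: the table is renamed records to avoid confusion with column a, the helper table suffix and the index name uq_abcd are hypothetical, the blank c of row 2 is assumed to be an empty string, and SQLite's || replaces MySQL's CONCAT(). The positions are first saved in a temporary table so that the UPDATE never reads rows it has already changed.

```python
import sqlite3


def number_duplicates(conn):
    """Append 1, 2, 3, ... to c for rows whose (a, b, c, d) is repeated; ids stay unchanged."""
    # Step 1: snapshot the position k of each duplicated row within its group, by id.
    conn.execute(
        """
        CREATE TEMP TABLE suffix AS
        SELECT t.id AS id,
               (SELECT COUNT(*) FROM records AS s
                 WHERE s.a = t.a AND s.b = t.b AND s.c = t.c AND s.d = t.d
                   AND s.id <= t.id) AS k
        FROM records AS t
        WHERE EXISTS (SELECT 1 FROM records AS s
                       WHERE s.a = t.a AND s.b = t.b AND s.c = t.c AND s.d = t.d
                         AND s.id <> t.id)
        """
    )
    # Step 2: append k to c for exactly the rows recorded in the snapshot.
    cur = conn.execute(
        """
        UPDATE records
        SET c = c || (SELECT k FROM suffix WHERE suffix.id = records.id)
        WHERE id IN (SELECT id FROM suffix)
        """
    )
    conn.execute("DROP TABLE suffix")
    return cur.rowcount  # number of rows changed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT, d INTEGER)"
)
conn.executemany(
    "INSERT INTO records (id, a, b, c, d) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 1, "auto", 6),
        (2, 290, 0, "", 1),  # blank c assumed to be an empty string
        (3, 100, 1, "auto", 6),
        (4, 300, 1, "auto", 2),
        (5, 290, 0, "auto", 1),
        (6, 100, 1, "auto", 6),
        (7, 290, 0, "auto", 1),
    ],
)

changed = number_duplicates(conn)
rows = conn.execute("SELECT id, c FROM records ORDER BY id").fetchall()
print(changed)  # 5
print(rows)

# With every (a, b, c, d) now distinct, the unique index the asker wants can be created.
conn.execute("CREATE UNIQUE INDEX uq_abcd ON records (a, b, c, d)")
```

One thing to watch on real data: if a value such as auto1 already exists for the same (a, b, d), a renamed row would collide with it. Creating the unique index right after the update would surface that case as an error.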