去重问题, 如何筛选出1小时内的重复数据

去重问题, 如何筛选出1小时内的重复数据每条记录都有一个时间戳
和上一条记录时间相差1小时内即为重复数据
如何筛选出重复数据,然后删除
例如:
时间转换成时分秒格式方便观看
id   time
1    2012-12-25 10:10:11
2    2012-12-25 10:30:11  *
3    2012-12-25 10:40:11  *
4    2012-12-25 11:20:11记录2,3为重复,需要筛选出来以备删除
记录4虽然和3相差不到一小时,但记录2,3为重复数据,4和1对比超过一小时为不重复数据
数据量巨大, 用程序双循环时间太慢, 有什么好办法没?

解决方案 »

免费领取超大流量手机卡，每月29元包185G流量+100分钟通话, 中国电信官方发货

没有办法，只能递归或者循环处理。算法思路可以简单一点。重复执行下面语句，直到ROW_COUNT()=0,即删除记录数为0为止。delete from table1 d
where exists (
select *
from table1 t
where time between d.time-interval 1 hour and d.time
and not exists (
select 1 from table1
where time between t.time-interval 1 hour and t.time
)
)
是否可以这样理解，同1小时内取最小的时间？同一小时内取最小时间
那跨小时就可能出现问题
例如:
时间转换成时分秒格式方便观看
id   time
1    2012-12-25 09:50:11
2    2012-12-25 10:30:11  *
3    2012-12-25 10:40:11  *
4    2012-12-25 11:20:11
SELECT @pre :=0;
DELETE FROM table WHERE id IN (
SELECT DISTINCT id
FROM (
SELECT if( time - @pre >3600, 0, id ) AS id, if( time - @pre >3600, @pre := time, 0 )
FROM table
WHERE 1
ORDER BY time ASC
) t
)
-------------------------------------
上面的sql可以实现了,,,,
怎么把赋值写到一句话里?
最里面的两个if, 现在显示两个字段所以上面再嵌套一层需要取出一个id比较麻烦,除了用concat以外的办法, 怎么才能不让后面的id出结果???
id   time
1    2012-12-25 09:10:11
2    2012-12-25 10:30:11
3    2012-12-25 10:40:11
4    2012-12-25 11:20:11
2、3还算重复？SELECT @PRE,IF( `time` - @pre >3600, 0, id ) AS id, IF( `time` - @pre >3600, @pre:=`time`, 0 )
FROM tTQ,(SELECT @pre :='2012-1-1 00:00:01' ) B
WHERE 1
ORDER BY `time` ASC
另外  SELECT @PRE ,IF( ...   这样写貌似不行, 至少我这里不能通过   mysql5.1就是要个
SELECT if(@pre is null,@pre:=0,@pre:=@pre), if(............
这样的效果, 怎么mysql不支持那,郁闷啊
SELECT @PRE,IF( `time` - @pre >3600, 0, id ) AS id, IF( `time` - @pre >3600, @pre:=`time`, 0 )
  FROM tTQ,(SELECT @pre :='2012-1-1 00:00:01' ) B
  WHERE 1
  ORDER BY `time` ASC
在5.5下测试通过
1000万条记录  group by concat(a,b,c)  查询需要6-7分钟,  怎么才能加速一点add index abc(a,b,c) 的索引已经建立了
id   time
  1    2012-12-25 09:25:11
  2    2012-12-25 10:30:11
  3    2012-12-25 10:40:11
  4    2012-12-25 11:20:11
要求结果是什么
id   time
1    2012-12-25 09:25:11
2    2012-12-25 10:30:11
3    2012-12-25 10:40:11  * 与id=2 比不足一小时,重复
4    2012-12-25 11:20:11  * 与id=2 比不足一小时,重复和上一条记录时间相差1小时内即为重复数据, 上一条为不重复的数据
这...这是mysql的吗? 不能执行啊, Mysql 5.1    另外time字段为标准的 unix时间戳
mysql> select * from table1;
+----+---------------------+
| id | time                |
+----+---------------------+
|  1 | 2012-12-25 10:10:11 |
|  2 | 2012-12-25 10:30:11 |
|  3 | 2012-12-25 10:40:11 |
|  4 | 2012-12-25 11:20:11 |
+----+---------------------+
4 rows in set (0.00 sec)mysql> select version();
+------------------+
| version()        |
+------------------+
| 5.1.58-community |
+------------------+
1 row in set (0.00 sec)mysql> delete x from table1 x ,(
    ->  select id from table1 d
    ->  where exists (
    ->          select 1
    ->          from table1 t
    ->          where time > d.time-interval 1 hour and time < d.time
    ->          and not exists (
    ->                  select 1 from table1
    ->                  where time > t.time-interval 1 hour and time < t.time
    ->          )
    ->  )
    -> ) y
    -> where x.id=Y.id;
Query OK, 2 rows affected (0.00 sec)mysql> select ROW_COUNT();
+-------------+
| ROW_COUNT() |
+-------------+
|           2 |
+-------------+
1 row in set (0.00 sec)mysql> delete x from table1 x ,(
    ->  select id from table1 d
    ->  where exists (
    ->          select 1
    ->          from table1 t
    ->          where time > d.time-interval 1 hour and time < d.time
    ->          and not exists (
    ->                  select 1 from table1
    ->                  where time > t.time-interval 1 hour and time < t.time
    ->          )
    ->  )
    -> ) y
    -> where x.id=Y.id;
Query OK, 0 rows affected (0.00 sec)mysql> select ROW_COUNT();
+-------------+
| ROW_COUNT() |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)mysql> select * from table1;
+----+---------------------+
| id | time                |
+----+---------------------+
|  1 | 2012-12-25 10:10:11 |
|  4 | 2012-12-25 11:20:11 |
+----+---------------------+
2 rows in set (0.00 sec)mysql>
try:
SELECT `time`,@a:= `time`,@b:=1 FROM ttq LIMIT 1;
SELECT *,TIMEDIFF(`time`,@a),
@b:=IF(TIME_TO_SEC(TIMEDIFF(`time`,@a))>=3600 AND HOUR(@b)<>HOUR(`time`),1,0) AS bz,
@a:=IF(@b=0,@a,`time`) AS bz2
FROM ttq LIMIT 1,10000BZ为1满足条件