问题求解：在MYSQL中，对表的一列数字进行分组循环累加

CREATE TABLE test
(
Id INT NOT NULL AUTO_INCREMENT,
X1 INT NOT NULL,
X2 INT NOT NULL,
PRIMARY KEY(`Id`)
)INSERT INTO test(X1,X2) VALUES(2,453),(2,120),(2,101),(2,89),(3,111),(3,421),(3,219)SELECT * FROM test
SELECT X1,(SELECT SUM(X2) FROM test WHERE X1=A.X1 AND Id<=A.Id)
FROM test A-----------------------X1 (SELECT SUM(X2) from test where X1=A.X1 and Id<=A.Id)
2 453
2 573
2 674
2 763
3 111
3 532
3 751

经测试代码有效，谢谢yananguo_1985！也感谢rucypli和ACMAIN_CHM的解释！原先我在MS SQL中，这个累加表是从一个原始表中抽出经过排序得到的，当时没有考虑到自增id列的问题（也没有用到ROW_NUMBER()）。我学了一段简单的代码，用变量解决了，不料换到MYSQL环境中，UPDATE SET 后面不能直接跟变量。

SET @num=0;
SET @num1=0;
SELECT a.x1,a.x2,a.x2+IFNULL(SUM(b.x2),0) FROM (
SELECT *,@num:=@num+1 AS pm FROM temptb) a
LEFT JOIN
(
SELECT *,@num1:=@num1+1 AS pm FROM temptb) b
ON a.x1=b.x1 AND a.pm>b.pm
GROUP BY a.x1,a.x2 ORDER BY a.pm

追加一个问题：
有自增id列的情况下，能否用UPDATE SET将结果直接写到原始表格中。得到
ID x1 x2 CUM_SUM我先自己试着写一写，实在没悟性再回来看大家的答案。

汗，贴没回完又一个答案就出来了……我自己再尝试看能不能用UPDATE写出来吧。经测试代码有效！感谢WWWWA代码运行结果：+------+------+--------------------------+
| x1   | x2   | a.x2+IFNULL(SUM(b.x2),0) |
+------+------+--------------------------+
|    2 |  453 |                      453 |
|    2 |  120 |                      573 |
|    2 |  101 |                      674 |
|    2 |   89 |                      763 |
|    3 |  111 |                      111 |
|    3 |  421 |                      532 |
|    3 |  219 |                      751 |
+------+------+--------------------------+
7 rows in set (0.02 sec)

加一个ORDER BY，绕过了自增ID列，效果同我原先在MS SQL

1 楼 yananguo_1985 的回复:
mysql> SELECT id,X1,x2,(SELECT SUM(X2) FROM test WHERE X1=A.X1 AND x2<=A.x2)
    -> FROM test A
    -> ;
+----+----+-----+-------------------------------------------------------+
| id | X1 | x2  | (SELECT SUM(X2) FROM test WHERE X1=A.X1 AND x2<=A.x2) |
+----+----+-----+-------------------------------------------------------+
|  1 |  2 | 453 |                                                   763 |
|  2 |  2 | 120 |                                                   310 |
|  3 |  2 | 101 |                                                   190 |
|  4 |  2 |  89 |                                                    89 |
|  5 |  3 | 111 |                                                   111 |
|  6 |  3 | 421 |                                                   751 |
|  7 |  3 | 219 |                                                   330 |
+----+----+-----+-------------------------------------------------------+
7 rows in set (0.00 sec)
加了 ORDER BY 的结果：mysql> SELECT id,X1,x2,(SELECT SUM(X2) FROM test WHERE X1=A.X1 AND x2<=A.x2) FROM test A order by x1,x2;
+----+----+-----+-------------------------------------------------------+
| id | X1 | x2  | (SELECT SUM(X2) FROM test WHERE X1=A.X1 AND x2<=A.x2) |
+----+----+-----+-------------------------------------------------------+
|  4 |  2 |  89 |                                                    89 |
|  3 |  2 | 101 |                                                   190 |
|  2 |  2 | 120 |                                                   310 |
|  1 |  2 | 453 |                                                   763 |
|  5 |  3 | 111 |                                                   111 |
|  7 |  3 | 219 |                                                   330 |
|  6 |  3 | 421 |                                                   751 |
+----+----+-----+-------------------------------------------------------+
7 rows in set (0.00 sec)

我错了，在8楼的说法不对，rucypli和ACMAIN_CHM所说的才是对的（有自增ID列才能知道在相同编号下，哪条记录是先，哪条是后）。
即使在这个简单例子中运行结果中是正确的，到一个有重复行的数据表中结果就会出错。

不使用UPDATE+变量而用SELECT的方法有严重的性能问题，一个百万行的表照上面的SELECT代码，几乎运行不动。而且我需要将累加的结果写回原始表格中。继续就这个问题求教于各位！

搜了好几天终于找到了一个解决方案，用了游标和储存过程，但相比较MSSQL下的语句实在是麻烦多了。
而且“You need one INSERT statement per grouping value. ”
出自：http://www.artfulsoftware.com/infotree/queries.php#104
Within-group quotas (Top N per group)A table has multiple rows per key value, and you need to retrieve, say, the first or earliest two rows per key. For example:DROP TABLE IF EXISTS test;
CREATE TABLE test( id INT, entrydate DATE );
INSERT INTO test VALUES
( 1, '2007-5-01' ),( 1, '2007-5-02' ),( 1, '2007-5-03' ),( 1, '2007-5-04' ),
( 1, '2007-5-05' ),( 1, '2007-5-06' ),( 2, '2007-6-01' ),( 2, '2007-6-02' ),
( 2, '2007-6-03' ),( 2, '2007-6-04' ),( 3, '2007-7-01' ),( 3, '2007-7-02' ),
( 3, '2007-7-03' ); One approach is to rank rows with user variables and pick off the top two for each key in the WHERE clause: SELECT tmp.ID, tmp.entrydate
FROM (
  SELECT
    ID, entrydate,
    IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS rank,
    @prev := ID
  FROM test t
  JOIN (SELECT @rownum := NULL, @prev := 0) AS r
  ORDER BY t.ID
) AS tmp
WHERE tmp.rank <= 2
ORDER BY ID, entrydate;
+------+------------+
| ID   | entrydate  |
+------+------------+
|    1 | 2007-05-01 |
|    1 | 2007-05-02 |
|    2 | 2007-06-03 |
|    2 | 2007-06-04 |
|    3 | 2007-07-01 |
|    3 | 2007-07-02 |
+------+------------+ This is pretty much the same query pattern as the user variable method of emulating Row_Number(). The join in the subquery is just a device for resetting the variables after reading a row. If the groups are fairly small, another feasible approach is to self-join and count. With appropriate ordering, the first two rows per ID are the rows which, for a given ID, have two or fewer rows with earlier dates. If we use an inequality join with the COUNT(*) function to find the earlier rows per ID ...SELECT t1.id, t1.entrydate, COUNT(*) AS earlier
FROM test AS t1
JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
+------+------------+---------+
| id   | entrydate  | earlier |
+------+------------+---------+
|    1 | 2007-05-01 |       1 |
|    1 | 2007-05-02 |       2 |
|    1 | 2007-05-03 |       3 |
|    1 | 2007-05-04 |       4 |
|    1 | 2007-05-05 |       5 |
|    1 | 2007-05-06 |       6 |
|    2 | 2007-06-01 |       1 |
|    2 | 2007-06-02 |       2 |
|    2 | 2007-06-03 |       3 |
|    2 | 2007-06-04 |       4 |
|    3 | 2007-07-01 |       1 |
|    3 | 2007-07-02 |       2 |
|    3 | 2007-07-03 |       3 |
+------+------------+---------+ ... then we get our result by removing rows where the 'earlier' count exceeds 2:SELECT t1.id, t1.entrydate, count(*) AS earlier
FROM test AS t1
JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
HAVING earlier <= 2;
+------+------------+---------+
| id   | entrydate  | earlier |
+------+------------+---------+
|    1 | 2007-05-01 |       1 |
|    1 | 2007-05-02 |       2 |
|    2 | 2007-06-01 |       1 |
|    2 | 2007-06-02 |       2 |
|    3 | 2007-07-01 |       1 |
|    3 | 2007-07-02 |       2 |
+------+------------+---------+ This is about as efficient as the first method with a small table, but it compares every within-group row to every other within-group row. As the size N of a group increases, execution time increases by N*N. If the query takes one minute for groups of 1,000, it will take 16 minutes for groups of 4,000, and more than four hours for groups for 16,000. The solution does not scale.What to do? Forget GROUP BY! Manually assemble the desired query results in a temporary table from simple indexed queries, in this case, two rows per ID:DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
INSERT INTO earliers
  SELECT id,entrydate FROM test WHERE id=1 ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
  SELECT id,entrydate FROM test WHERE id=2 ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
  SELECT id,entrydate FROM test WHERE id=3 ORDER BY entrydate LIMIT 2; You need one INSERT statement per grouping value. To print the result, just query the earliers table:SELECT * FROM earliers
ORDER BY id, entrydate;
+------+------------+
| id   | entrydate  |
+------+------------+
|    1 | 2007-05-01 |
|    1 | 2007-05-02 |
|    2 | 2007-06-01 |
|    2 | 2007-06-02 |
|    3 | 2007-07-01 |
|    3 | 2007-07-02 |
+------+------------+
DROP TEMPORARY TABLE earliers; Most useful reports run again and again. If that's the case for yours, automate it in a stored procedure: using a cursor and a prepared statement, auto-generate an INSERT statement for every grouping value, and return the result:DROP PROCEDURE IF EXISTS listearliers;
DELIMITER |
CREATE PROCEDURE listearliers()
BEGIN
  DECLARE curdone, vid INT DEFAULT 0;
  DECLARE idcur CURSOR FOR SELECT DISTINCT id FROM test;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET curdone = 1;
  DROP TEMPORARY TABLE IF EXISTS earliers;
  CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
  SET @sql = 'INSERT INTO earliers SELECT id,entrydate FROM test WHERE id=? ORDER BY  entrydate LIMIT 2';
  OPEN idcur;
  REPEAT
    FETCH idcur INTO vid;
    IF NOT curdone THEN
      BEGIN
        SET @vid = vid;
        PREPARE stmt FROM @sql;
        EXECUTE stmt USING @vid;
        DROP PREPARE stmt;
      END;
    END IF;
  UNTIL curdone END REPEAT;
  CLOSE idcur;
  SELECT * FROM earliers ORDER BY id,entrydate;
  DROP TEMPORARY TABLE earliers;
END;
|
DELIMITER ;
CALL listearliers(); Last updated 04 May 2010

调试易

问题求解：在MYSQL中，对表的一列数字进行分组循环累加

解决方案 »