请教mysql多次分组的查询问题

请教mysql，不要存储过程，不要子查询，要join查询，如何一个sql语句查询出结果：首先分组cid和author字段，统计相同cid和author的数量的number数量；
然后取每个cid的number最多的前N条(比如N=2)author记录；原表：
+----+-------+--------+
| id |  cid  | author |
+----+-------+--------+
| 1  |   1   | test1  |
| 2  |   1   | test1  |
| 3  |   1   | test2  |
| 4  |   1   | test2  |
| 5  |   1   | test2  |
| 6  |   1   | test3  |
| 7  |   1   | test3  |
| 8  |   1   | test3  |
| 9  |   1   | test3  |
| 10 |   2   | test11 |
| 11 |   2   | test11 |
| 12 |   2   | test22 |
| 13 |   2   | test22 |
| 14 |   2   | test22 |
| 15 |   2   | test33 |
| 16 |   2   | test33 |
| 17 |   2   | test33 |
| 18 |   2   | test33 |比如：N=2，则期望结果为：
+-----+--------+--------+
| cid | author | number |
+-----+--------+--------+
|  1  | test3  |   4    |
|  1  | test2  |   3    |
|  2  | test33 |   4    |
|  2  | test22 |   3    |#测试数据
create table test (
  id int primary key,
  cid int,
  author char
) engine=myisam;insert into t2 values
(1,1,'test1'),
(2,1,'test1'),
(3,1,'test2'),
(4,1,'test2'),
(5,1,'test2'),
(6,1,'test3'),
(7,1,'test3'),
(8,1,'test3'),
(9,1,'test3'),
(10,2,'test11'),
(11,2,'test11'),
(12,2,'test22'),
(13,2,'test22'),
(14,2,'test22'),
(15,2,'test33'),
(16,2,'test33'),
(17,2,'test33'),
(18,2,'test33');

解决方案 »

免费领取超大流量手机卡，每月29元包185G流量+100分钟通话, 中国电信官方发货

如果按照你的描述的话，不用子查询，不用join，那么exists可以用吗？
mysql> select * from (select cid,author,count(*) as number from t2 group by cid,author) a
where 2> (select count(*) from (select cid,author,count(*) as number from t2 group by cid,author) b where a.cid=b.cid and a.number<b.number) order by cid,number desc
+------+--------+--------+
| cid  | author | number |
+------+--------+--------+
|    1 | test3  |      4 |
|    1 | test2  |      3 |
|    2 | test33 |      4 |
|    2 | test22 |      3 |
+------+--------+--------+
4 rows in set (0.01 sec)
SELECT cid,author,count(*) AS C FROM test GROUP BY cid,author ORDER BY  C desc LIMIT 4
use test;
drop table if exists test;
create table test (
  id int primary key,
  cid int,
  author varchar(30)
) engine=myisam;
insert into test values
(1,1,'test1'),
(2,1,'test1'),
(3,1,'test2'),
(4,1,'test2'),
(5,1,'test2'),
(6,1,'test3'),
(7,1,'test3'),
(8,1,'test3'),
(9,1,'test3'),
(10,2,'test11'),
(11,2,'test11'),
(12,2,'test22'),
(13,2,'test22'),
(14,2,'test22'),
(15,2,'test33'),
(16,2,'test33'),
(17,2,'test33'),
(18,2,'test33');-- 为了不用子查询，不用存储过程，而且需要一条sql，我近我所能，写出了如下一条sql，不过效率吗，不是太高！
-- 如果楼主没有这么多限制，相信sql不用这么复杂的。
select c.*
from(
        select b.cid , max(b.author) author,max(b.n) mn
        from(
                SELECT cid, author,  count(*) n
                FROM `test`.`test`
                group by cid, author
        )b group by b.cid
        union all
        select b.cid , max(b.author),max(b.n) mn
        from(
                SELECT b2.cid, b2.author,  count(*) n
                FROM `test`.`test` b2
                where not exists(
                     select 1
                     from( select b1.cid , max(b1.author) author,max(b1.n) mn
                            from(
                                    SELECT cid, author,  count(*) n
                                    FROM `test`.`test`
                                    group by cid, author
                            )b1
                            group by b1.cid
                    )b11 where b11.cid=b2.cid and b11.author=b2.author
                )
                group by cid, author

        )b group by b.cid
)c order by c.cid, c.author desc-- 结果如下：'1', 'test3', '4'
'1', 'test2', '3'
'2', 'test33', '4'
'2', 'test22', '3'
这个也太取巧了吧，如果有3组呢，4组呢，5租呢，你是不是limit 4随时会变化啊！呵呵额！
这个不对啊，我这里查出来是：
'1', 'test2', '3'
'1', 'test1', '2'
'2', 'test22', '3'
'2', 'test11', '2'
mysql> use salo
Database changed
mysql> select a.* From (select cid,author,count(*) as number from test group by
cid,author)a where exists(select count(*) from (select cid,author,count(*) as nu
mber from test group by cid,author)b where b.cid=a.cid and b.number>a.number hav
ing count(*)<2)order by a.cid,a.number desc;
+------+--------+--------+
| cid  | author | number |
+------+--------+--------+
|    1 | test3  |      4 |
|    1 | test2  |      3 |
|    2 | test33 |      4 |
|    2 | test22 |      3 |
+------+--------+--------+
4 rows in set (0.00 sec)mysql>
子查询无法优化，因为子查询通常是表扫描的。
那就用临时表吧，你不可以用3条sql来跑吗？非得用一条sql来跑吗？
楼主在cid,author建立复合索引,再把最后的那个order by 取消掉看看.
还有，就是把mysql 的 my.cnf里面的参数 key_buffer_size与sort_buffer_size设置大一些，这样也会快很多。设置完了，mysql重启。
我将这个查询语句应用到实际表上面，100w数据，复合索引已经建了，直接将我的cpu100%了，久久不能任何操作，直接拔插头了。
悲剧了，你试试我的那个复杂的sql呢？
-- 用一些sql块，不管哪个应用，都可以执行连续的sql组成的sql块。你用下面的sql块，相信效率会大大提高的。
drop table if exists group_tmp;
create table group_tmp
select cid,author,count(*) as number from test group by cid,author;select *
from group_tmp g
where 2>(
    select count(1)
    from group_tmp g1
    where g.cid=g1.cid and g.number<g1.number
)order by cid,number desc ;
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 109240 Using where; Using filesort
3 DEPENDENT SUBQUERY <derived4> ALL NULL NULL NULL NULL 109240 Using where
4 DERIVED p index pChubanshe pChubanshe 608 NULL 713916 Using where; Using index; Using temporary; Using f...
2 DERIVED p index pChubanshe pChubanshe 608 NULL 713916 Using where; Using index; Using temporary; Using f...
SELECT a.cid,a.author,a.gs,COUNT(b.cid) AS ss FROM (
SELECT cid,author,COUNT(*) AS gs FROM t2 GROUP BY cid,author) a
inner JOIN
(
SELECT cid,author,COUNT(*) AS gs FROM t2 GROUP BY cid,author) b
ON a.cid=b.cid AND a.gs<=b.gs
GROUP BY a.cid,a.author,a.gs
HAVING COUNT(b.cid)<=2
ORDER BY cid,ss
试了子查询后join查询，效率不行，半天执行不完。
尝试用语句块的方式：drop table if exists group_tmp;
create table group_tmp select ...
#这里join查询
drop table if exists group_tmp;这样都不能支持上百万数据的查询，其实group_tmp只有10w+数据了。
不好意思，忘记了，group_tmp还需要建索引的。
-- 再试试看
drop table if exists group_tmp;
CREATE  TABLE `group_tmp` (
  `cid` INT NOT NULL ,
  `author` VARCHAR(45) NOT NULL ,
  `number` INT NULL ,
  PRIMARY KEY (`cid`, `author`) )
ENGINE = MyISAM
DEFAULT CHARACTER SET = utf8;
insert into test.group_tmp
select cid,author,count(*) as number from test group by cid,author;select *
from group_tmp g
where 2>(
    select count(1)
    from group_tmp g1
    where g.cid=g1.cid and g.number<g1.number
)order by cid,number desc ;
参考下贴中的多种方法http://topic.csdn.net/u/20091231/16/2f268740-391e-40f2-a15e-f243b2c925ab.html
[征集]分组取最大N条记录方法征集，及散分....
你这个里面的拜读了，不过一条sql真的搞不定他需要的东东的。