如何高效地从相似记录中，赛选出以某一列最大值为准的所有记录

如何高效地从相似记录中，赛选出以某一列最大值为准的所有记录在实际业务中，经常需要从相似记录中赛选出一条记录，如所有客户最近一次的购买金额、
每种产品最近一次销售数量，等等。可能不好理解，以下面测试表为例：create table testgp (
id int identity (1, 1) primary key not null ,
customer int not null ,
amount int not null ,
)insert testgp values(8, 10);
insert testgp values(13, 20);
insert testgp values(17, 15);
insert testgp values(22, 30);
insert testgp values(8, 5);
insert testgp values(13, 36);
insert testgp values(17, 20);
insert testgp values(22, 11);
insert testgp values(18, 12);
insert testgp values(13, 19);
insert testgp values(27, 20);
insert testgp values(22, 10);-- 查询出所有记录，并排序看看
select customer, id, amount from testgp order by customer, id
/*
customer    id          amount
----------- ----------- -----------
8           1           10
8           5           5  <------------只需要这行
13          2           20
13          6           36
13          10          19 <------------只需要这行
17          3           15
17          7           20 <------------只需要这行
18          9           12 <------------只需要这行
22          4           30
22          8           11
22          12          10 <------------只需要这行
27          11          20 <------------只需要这行
(12 行受影响)
*/
对于每一个 customer，可能存在多个 amount，但是只想要 id 最大的那个 amount我们可以用下面的 SQL 语句来查询：select l.customer, l.id, l.amount from testgp l inner join (
select max(id) id from testgp s group by customer) r on r.id = l.id
/*
customer    id          amount
----------- ----------- -----------
8           5           5
13          10          19
17          7           20
18          9           12
22          12          10
27          11          20
(6 行受影响)
*/
当然也可以用这个
select customer, max(id) id, (select amount from testgp where id = max(s.id)) from testgp s group by customer
/*
customer    id
----------- ----------- -----------
8           5           5
13          10          19
17          7           20
18          9           12
22          12          10
27          11          20
(6 行受影响)
*/结果没错，是想要的，可是不觉得速度方面有点欠缺吗？我是觉得不太高效了。所以，有没可以高效点呢？最好是只扫描一次表格，因为实际业务中，testgp 就是一个有多个表 JOIN 的子查询，怎么叫只扫描一次表格呢？例如：select customer, max(id) id from testgp s group by customer可是这样，少了amount，所以应该是：select customer, max(id) id, xxxx(....) from testgp s group by customer问题是 SQL 中有这个 xxxx 函数可以取得 max(id) 对应得 amount 吗？我找不出来，或者你可以用其他方法，不用 GROUP BY，但是只能扫描一次表格。

解决方案 »

免费领取超大流量手机卡，每月29元包185G流量+100分钟通话, 中国电信官方发货

select customer,
       id,
       amount
from testgp t
where id=(select max(ID)
          from testgp
          where customer=t.customer)
select customer,id,amount from (
select * ,rn=row_number() over(partition by customer order by id desc) from testgp
) a
where rn=1customer    id          amount
----------- ----------- -----------
8           5           5
13          10          19
17          7           20
18          9           12
22          12          10
27          11          20(6 行受影响)
create table testgp (
    id int identity (1, 1) primary key not null ,
    customer int not null ,
    amount int not null ,
)insert testgp values(8, 10);
insert testgp values(13, 20);
insert testgp values(17, 15);
insert testgp values(22, 30);
insert testgp values(8, 5);
insert testgp values(13, 36);
insert testgp values(17, 20);
insert testgp values(22, 11);
insert testgp values(18, 12);
insert testgp values(13, 19);
insert testgp values(27, 20);
insert testgp values(22, 10);
select customer,
       id,
       amount
from testgp t
where id=(select max(ID)
          from testgp
          where customer=t.customer)
order by customer
drop table testgp
/*
customer    id          amount
----------- ----------- -----------
8           5           5
13          10          19
17          7           20
18          9           12
22          12          10
27          11          20(6 個資料列受到影響)
*/
select l.customer, l.id, l.amount from testgp l inner join (
    select max(id) id from testgp s group by customer) r on r.id = l.id你这句如果ID有索引，应该算高效了。
select * from testgp  a where not exists(select 1 from testgp
where a.customer=customer and a.id<id)id          customer    amount
----------- ----------- -----------
5           8           5
7           17          20
9           18          12
10          13          19
11          27          20
12          22          10(6 行受影响)
select
  *
from
  tb t
where
  amount=(select max(amount) from tb where customer=t.customer)
不加索引的境况下。看了一下执行计划select * from testgp  a where not exists(select 1 from testgp
where a.customer=customer and a.id<id)
这条语句的消耗应该最少的
那是你的幻觉。。既然你说的是不加索引的境况下，那么少量的数据当然不能看出问题。试试
insert testgp values(8, 10);
insert testgp values(13, 20);
insert testgp values(17, 15);
insert testgp values(22, 30);
insert testgp values(8, 5);
insert testgp values(13, 36);
insert testgp values(17, 20);
insert testgp values(22, 11);
insert testgp values(18, 12);
insert testgp values(13, 19);
insert testgp values(27, 20);
insert testgp values(22, 10);
GO  26915
select l.customer, l.id, l.amount from testgp l inner join (
    select max(id) id from testgp s group by customer) r on r.id = l.id这个还比2#好.LZ没必要纠结于多扫描一次表，你的id是pk,以id值多seek一次表其实也就多2~3个I/O而已。
我为什么要纠结只扫描一次表格呢?帖子里说了所以，有没可以高效点呢？最好是只扫描一次表格，因为实际业务中，testgp 就是一个有多个表 JOIN 的子查询，
学会了2楼的查询法,以前我用over(partition by customer order by id)试了,但row_number对应的值都不一样,脑袋不拐弯,,结贴了