最近做项目时碰到一个问题。。
比如下面一段SQL。。select yy,SUM(xx)as zz
from tableA as a
where exists (select 1 from b where data <='2010-01-01' and data <='2011-12-12' and ID=a.ID )
group by yy order by zz desc 大概就是这样一条语句查出来不到100w条数据。很慢,网上说in会导致全表查询什么的,由于实在是对数据库理解的不是很深入。求高人给看看,该怎么优化,小弟不胜感激。
比如下面一段SQL。。select yy,SUM(xx)as zz
from tableA as a
where exists (select 1 from b where data <='2010-01-01' and data <='2011-12-12' and ID=a.ID )
group by yy order by zz desc 大概就是这样一条语句查出来不到100w条数据。很慢,网上说in会导致全表查询什么的,由于实在是对数据库理解的不是很深入。求高人给看看,该怎么优化,小弟不胜感激。
之前的分析,参考http://blog.csdn.net/orchidcat/article/details/6267552
USE master
-- 备份
BACKUP DATABASE AdventureWorks
TO DISK = 'AdventureWorks.bak'
WITH FORMAT---- 恢复
--RESTORE DATABASE AdventureWorks
-- FROM DISK = 'AdventureWorks.bak'
-- WITH REPLACE
GO--=========================================
-- 转换为分区表
--=========================================
USE AdventureWorks
GO-- 1. 创建分区函数
-- a. 适用于存储历史存档记录的分区表的分区函数
DECLARE @dt datetime
SET @dt = '20020101'
CREATE PARTITION FUNCTION PF_HistoryArchive(datetime)
AS RANGE RIGHT
FOR VALUES(
@dt,
DATEADD(Year, 1, @dt))-- 2. 创建分区架构
-- a. 适用于存储历史存档记录的分区表的分区架构
CREATE PARTITION SCHEME PS_HistoryArchive
AS PARTITION PF_HistoryArchive
TO([PRIMARY], [PRIMARY], [PRIMARY])
-- 3. 删除索引
-- a. 删除存储历史存档记录的表中的索引
DROP INDEX Production.TransactionHistoryArchive.IX_TransactionHistoryArchive_ProductID
DROP INDEX Production.TransactionHistoryArchive.IX_TransactionHistoryArchive_ReferenceOrderID_ReferenceOrderLineID-- 4. 转换为分区表
-- a. 将存储历史存档记录的表转换为分区表
ALTER TABLE Production.TransactionHistoryArchive
DROP CONSTRAINT PK_TransactionHistoryArchive_TransactionID
WITH(
MOVE TO PS_HistoryArchive(TransactionDate))-- 5. 恢复主键
-- a. 恢复存储历史存档记录的分区表的主键
ALTER TABLE Production.TransactionHistoryArchive
ADD CONSTRAINT PK_TransactionHistoryArchive_TransactionID
PRIMARY KEY CLUSTERED(
TransactionID,
TransactionDate)-- 6. 恢复索引
-- a. 恢复存储历史存档记录的分区表的索引
CREATE INDEX IX_TransactionHistoryArchive_ProductID
ON Production.TransactionHistoryArchive(
ProductID)CREATE INDEX IX_TransactionHistoryArchive_ReferenceOrderID_ReferenceOrderLineID
ON Production.TransactionHistoryArchive(
ReferenceOrderID,
ReferenceOrderLineID)-- 7. 查看分区表的相关信息
SELECT
SchemaName = S.name,
TableName = TB.name,
PartitionScheme = PS.name,
PartitionFunction = PF.name,
PartitionFunctionRangeType = CASE
WHEN boundary_value_on_right = 0 THEN 'LEFT'
ELSE 'RIGHT' END,
PartitionFunctionFanout = PF.fanout,
SchemaID = S.schema_id,
ObjectID = TB.object_id,
PartitionSchemeID = PS.data_space_id,
PartitionFunctionID = PS.function_id
FROM sys.schemas S
INNER JOIN sys.tables TB
ON S.schema_id = TB.schema_id
INNER JOIN sys.indexes IDX
on TB.object_id = IDX.object_id
AND IDX.index_id < 2
INNER JOIN sys.partition_schemes PS
ON PS.data_space_id = IDX.data_space_id
INNER JOIN sys.partition_functions PF
ON PS.function_id = PF.function_id
GO
还有in是不是一定比exists快?存不存在内部是大表外面是小表用in更快一点?求解。。
exists是判断真假,in是比较值,在运算时相差无几,exists通常情况占优一些,具体要看一下执行计划
比如not exists同not in,not in处理不了null值判断,但not exists可以你的情况100W数据查询用分区表就行了,分区表1000W都可勉强支持
你的查询速度应该可以提高以倍计算
from tableA as a inner join b on b.id=a.id
where b.date >='2010-01-01' and b.date <='2011-12-12'
group by a.yy
order by zz desc
-- 1. 首先保证a表的yy列有索引;
-- 2. 先保证 b上的列data列有索引,id列有索引-- 然后改成如下试试看
select yy,SUM(xx)as zz
from tableA as a inner join
( select 1 from b where data <='2010-01-01' and data <='2011-12-12' ) as t2
on a.id = t.id
group by yy order
by zz desc-- 另外你查询的数据如果接近100w的话,sqlserver需要把数据展示在表格里,这部的工作可能比查询的工作还大,所以你应该避免一次性返回这么大的数据量,尽量把数据范围缩小。