Application 偶尔会发生死锁的情况,以前就做了一些分析,初步定位了两段有冲突的地方,今天特意做了一下试验,并把DBCC监控死锁的命令打开,果然发现是两段SQL同时执行可能会发生冲突,但分析了半天,仍没有找出死锁的原因,请大家来看看,并说说怎么设计来避免。通过Application 加锁的方式来规避也是可以的,但改动要麻烦点,看通过SQLServer的索引设计或者写法的改变是否可以避免这个问题.
Query1 第一个connection:
在一个查询的Connection中查询多个备件,但其中是没有事务的.
waitfor DELAY '00:00:05'WHILE (1=1)
BEGIN
select pcp.* from PartCountryPlannerReviewReason pcp
inner join ValidationReviewReason vrr on pcp.ReviewReasonID = vrr.ID
WHERE pcp.ChubID = 1 AND pcp.Material in
('620893-001','670539-001','671352-001','608150-001','681958-001','605961-001','604054-001','657095-601','613458-001')
AND pcp.ReviewedOn is null AND vrr.CanBeBatchReviewed = 1END
query1 执行计划:
select pcp.* from PartCountryPlannerReviewReason pcp inner join ValidationReviewReason vrr on pcp.ReviewReasonID = vrr.ID WHERE pcp.ChubID = 1 AND pcp.Material in ('620893-001','670539-001','671352-001','608150-001','681958-001','605961-001','604054-001','657095-601','613458-001') AND pcp.ReviewedOn is null AND vrr.CanBeBatchReviewed = 1
|--Nested Loops(Inner Join, OUTER REFERENCES:([pcp].[ReviewReasonID]))
|--Clustered Index Scan(OBJECT:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[PK_PartCountryPlannerReviewReason] AS [pcp]), WHERE:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[ChubID] as [pcp].[ChubID]=(1) AND [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[ReviewedOn] as [pcp].[ReviewedOn] IS NULL AND ([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'604054-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'605961-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'608150-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'613458-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'620893-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'657095-601' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'670539-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'671352-001' OR [AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [pcp].[Material]=N'681958-001')))
|--Clustered Index Seek(OBJECT:([AMEETOT3Trans].[dbo].[ValidationReviewReason].[PK_ValidationReviewReason] AS [vrr]), SEEK:([vrr].[ID]=[AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[ReviewReasonID] as [pcp].[ReviewReasonID]), WHERE:([AMEETOT3Trans].[dbo].[ValidationReviewReason].[CanBeBatchReviewed] as [vrr].[CanBeBatchReviewed]=(1)) ORDERED FORWARD)Query 2 第二个Connnection:
WHILE(1=1)
BEGIN
UPDATE PRR
SET Re = 'XXX test'
FROM PartCountryPlannerReviewReason PRR
inner join InventorySafetyRop isr on prr.ChubID = isr.CHUBID AND PRR.Material = isr.Material
WHERE PRR.ChubID = 1 AND isr.IsValidated = 1
ENDquery2 执行计划:UPDATE PRR SET Re = 'XXX test' FROM PartCountryPlannerReviewReason PRR inner join InventorySafetyRop isr on prr.ChubID = isr.CHUBID AND PRR.Material = isr.Material WHERE PRR.ChubID = 1 AND isr.IsValidated = 1
|--Clustered Index Update(OBJECT:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[PK_PartCountryPlannerReviewReason] AS [PRR]), SET:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Re] as [PRR].[Re] = [Expr1006]))
|--Compute Scalar(DEFINE:([Expr1006]=N'XXX test'))
|--Top(ROWCOUNT est 0)
|--Parallelism(Gather Streams)
|--Hash Match(Left Semi Join, HASH:([PRR].[Material])=([isr].[Material]), RESIDUAL:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[Material] as [PRR].[Material]=[AMEETOT3Trans].[dbo].[InventorySafetyRop].[Material] as [isr].[Material]))
|--Bitmap(HASH:([PRR].[Material]), DEFINE:([Bitmap1013]))
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([PRR].[Material]))
| |--Clustered Index Scan(OBJECT:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[PK_PartCountryPlannerReviewReason] AS [PRR]), WHERE:([AMEETOT3Trans].[dbo].[PartCountryPlannerReviewReason].[ChubID] as [PRR].[ChubID]=(1)) ORDERED)
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([isr].[Material]), WHERE:(PROBE([Bitmap1013],[AMEETOT3Trans].[dbo].[InventorySafetyRop].[Material] as [isr].[Material])))
|--Hash Match(Inner Join, HASH:([isr].[CHUBID], [Uniq1005])=([isr].[CHUBID], [Uniq1005]), RESIDUAL:([AMEETOT3Trans].[dbo].[InventorySafetyRop].[CHUBID] as [isr].[CHUBID] = [AMEETOT3Trans].[dbo].[InventorySafetyRop].[CHUBID] as [isr].[CHUBID] AND [Uniq1005] = [Uniq1005]))
|--Bitmap(HASH:([isr].[CHUBID], [Uniq1005]), DEFINE:([Bitmap1012]))
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([isr].[CHUBID], [Uniq1005]))
| |--Index Seek(OBJECT:([AMEETOT3Trans].[dbo].[InventorySafetyRop].[IX_InventorySafetyRop_IsValidated] AS [isr]), SEEK:([PtnId1002]=(2) AND [isr].[IsValidated]=(1) AND [isr].[CHUBID]=(1)) ORDERED FORWARD)
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([isr].[CHUBID], [Uniq1005]))
|--Index Scan(OBJECT:([AMEETOT3Trans].[dbo].[InventorySafetyRop].[IX_InventorySafetyRop_Material] AS [isr]), SEEK:([PtnId1002]=(2)), WHERE:([AMEETOT3Trans].[dbo].[InventorySafetyRop].[CHUBID] as [isr].[CHUBID]=(1) AND PROBE([Bitmap1012],[AMEETOT3Trans].[dbo].[InventorySafetyRop].[CHUBID] as [isr].[CHUBID],[Uniq1005])) ORDERED FORWARD)几秒钟后出现死锁下面贴图:
1,PartCountryReviewReason table structure:
看死锁报错主要是PartCountryReviewReason这张表的主键ID死锁,所以现在只把这个表的Structure贴了上来,如有需要也可以帖其它的表结构.
2,deadlock sqlserver information:SQLSERVERDeadlock
SELECT 锁定了一些聚焦索引的ID(共享锁)
UPDATE 锁定了另外一些聚焦锁引的ID(X锁),正准备去申请SELECT 锁定的一些ID,但这些ID正在被SELECT锁住,而SELECT又准备去申请锁Update锁定的一些ID,所以导致的死锁。
但这一理论是基于SELECT和我UPDATE都是按行和页去锁定的,这貌似又不对。
Log SQL Server (Current - 3/9/2013 8:14:00 PM)Source spid16sMessage
deadlock victim=process643a748而且SQL Management也显示了出错的信息,我没有截图上来。
但是假设查询较慢时,查询id=1、2、3查询到每一条数据要用时1分钟
另一个事务中update id=3、2、1时,假设update2、3时持有x锁,
而当update1时,查询所持有的锁导致阻塞,update进程等待
而当查询进行到2、3时,由于update所持有的x锁导致了查询等待
这样就构成了死锁的条件
一般较复杂的查询可以用(nolock)提示让其不加共享锁查询,这种做法需要你的业务上对数据要求不敏感,因为nolock提示可能会产生脏读数据
还有就是update的时候提示rowlock,不过数据量过大时还是会导致锁升级
我之前研究过,update时尽量用聚集索引做筛选条件,这样资源锁定的比较精准,不会或者较少锁住数据页中的其他数据行,尽量缩短和减少资源锁定
不过这种死锁牺牲的一般都是查询进程,对业务影响应该不大
其实我看最有效的就是nolock了,cte关于update的情况我还在研究
WHILE(1=1)
BEGIN
--BEGIN TRY
--BEGIN TRAN
UPDATE PRR
SET Re = 'XXX test'
FROM PartCountryPlannerReviewReason PRR with(TABLOCK)
inner join InventorySafetyRop isr on prr.ChubID = isr.CHUBID AND PRR.Material = isr.Material
WHERE PRR.ChubID = 1 AND isr.IsValidated = 1
END在Update的时候用表级锁就可以了。
加之后能解决分析如下:SELECT CHUBID是非聚集索引,在查找其它字段时,仍然需要用到聚集索引,在SELECT 出来1,2时 正准备申请3的S锁,而这时可能3正在被Update 修改,修改数据时聚集索引总时会被更新,这通过执行计划可以看到,形成阻塞,而这是Update完3需要申请1,2的X锁时,发现SELECT上的S锁正占用,所以开成死锁。与你上面的分析一致。但加之后呢,就不存在这个问题了,因为Update是要加表级修改锁Tablock是根据SELECT或者Update/Insert/Delete决定加锁的类型的,这时候发现SELECT已加S锁,所以加不上,这时候就形成阻塞,那就等SELECT完成后,再做Update,再做Update时, SELECT也不能在表U锁上加S锁,所以也是阻塞。
这两个地方要用到聚集索引扫描到是可以理解的,因为要去SELECT或者Update到其它未加索引的行,最张都要靠索引来定位。
这个这么做应该会更好,但application 中用dataentity 来SELECT加一个nolock hint语句还真不容易,所以就准备还是放到后台usp中的update中来改。
这个不要考虑WHILE (1=1),只是想说明现某一次很偶然的机会中Update会和SELECT发生冲突,实际应用中肯定不会这么写.另外分布式应用程序中并发情况很常见的,Update和SELECT同时发生也不奇怪,只是加锁的地方可以探讨,原来用乐观锁方式,认为SQLSERVER自己会去处理,但从这个地方看来或者这种情况,必须程序员来考虑
这个有时确实是需要程序员在设计的时候取考虑的,一般在容易产生死锁的地方的程序要顺序执行
就像我上面举的例子,查询按照123的顺序,update安装321的顺序,这样便有机会产生死锁
如果查询123而update也123,那肯定是发生的阻塞,从而避免死锁