ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-12526) Ambari Cluster Deployment Stuck At 2% With A SQL Deadlock When Talking to SQL Azure
Date Fri, 24 Jul 2015 22:34:04 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641169#comment-14641169
] 

Hudson commented on AMBARI-12526:
---------------------------------

SUCCESS: Integrated in Ambari-branch-2.1 #271 (See [https://builds.apache.org/job/Ambari-branch-2.1/271/])
AMBARI-12526 - Ambari Cluster Deployment Stuck At 2% With A SQL Deadlock When Talking to SQL
Azure (jonathanhurley) (jhurley: http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=685b3dfe395fd94623c6e566e0bddfc1a20b584f)
* ambari-server/src/main/java/org/apache/ambari/server/state/Service.java
* ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
* ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentImpl.java
* ambari-server/src/main/java/org/apache/ambari/server/state/Host.java
* ambari-server/src/main/java/org/apache/ambari/server/state/configgroup/ConfigGroupImpl.java
* ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java
* ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ServiceComponentHostConcurrentWriteDeadlockTest.java
* ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentHost.java
* ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClustersImpl.java
* ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
* ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/HostVersionOutOfSyncListener.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostComponentStateDAO.java
* ambari-server/src/main/java/org/apache/ambari/server/state/host/HostImpl.java
* ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponent.java


> Ambari Cluster Deployment Stuck At 2% With A SQL Deadlock When Talking to SQL Azure
> -----------------------------------------------------------------------------------
>
>                 Key: AMBARI-12526
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12526
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.1.1
>
>         Attachments: AMBARI-12526.patch
>
>
> When deploying a new cluster on SQL Azure, there is a recurring deadlock on the SQL Server:
> {code}
> 15 Jul 2015 22:13:31,453 ERROR [ambari-action-scheduler] AmbariJpaLocalTxnInterceptor:114
- [DETAILED ERROR] Rollback reason: 
> Local Exception Stack: 
> Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd):
org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process
ID 62) was deadlocked on lock resources with another process and has been chosen as the deadlock
victim. Rerun the transaction.
> Error Code: 1205
> Call: UPDATE hostcomponentstate SET current_state = ? WHERE ((((component_name = ?) AND
(host_id = ?)) AND (cluster_id = ?)) AND (service_name = ?))
> 	bind => [5 parameters bound]
> 	at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:331)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:900)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:962)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:631)
> 	at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)
> 	at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1836)
> 	at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4244)
> 	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5594)
> 	at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.writeChanges(RepeatableWriteUnitOfWork.java:453)
> 	at org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:863)
> 	at org.eclipse.persistence.internal.jpa.QueryImpl.performPreQueryFlush(QueryImpl.java:963)
> 	at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:207)
> 	at org.eclipse.persistence.internal.jpa.QueryImpl.getSingleResult(QueryImpl.java:517)
> 	at org.eclipse.persistence.internal.jpa.EJBQueryImpl.getSingleResult(EJBQueryImpl.java:400)
> 	at org.apache.ambari.server.orm.dao.DaoUtils.selectOne(DaoUtils.java:80)
> 	at org.apache.ambari.server.orm.dao.StackDAO.find(StackDAO.java:93)
> 	at org.apache.ambari.server.orm.AmbariLocalSessionInterceptor.invoke(AmbariLocalSessionInterceptor.java:53)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setStackVersion(ServiceComponentHostImpl.java:1058)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$ServiceComponentHostOpStartedTransition.transition(ServiceComponentHostImpl.java:628)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$ServiceComponentHostOpStartedTransition.transition(ServiceComponentHostImpl.java:610)
> 	at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
> 	at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
> 	at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
> 	at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.handleEvent(ServiceComponentHostImpl.java:901)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl.processServiceComponentHostEvents(ClusterImpl.java:2508)
> 	at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:68)
> 	at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:343)
> 	at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:195)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> It's just a JPA entity merge that's doing this. The whole transaction is:
> {code}
> UPDATE hostcomponentstate SET version = ? WHERE ((((component_name = ?) AND (host_id
= ?)) AND (cluster_id = ?)) AND (service_name = ?))
> {code}
> There is a CLUSTERED index on {{component_name}}, {{host_id}}, {{cluster_id}}, and {{service_name}},
so the predicate of the query caused the Clustered Index Seek (that's good since that causes
a single lock). The {{UPDATE}} however then causes the Clustered Index Update (which is a
{{DELETE}} followed by an {{INSERT}}, no?)
> Essentially, we have concurrent {{UPDATE}} statements in separate transactions acting
on different rows of {{hostcomponentstate}}. This seems to cause a deadlock because both processes
have an X lock and then try to acquire a U lock. The U lock is what is making me think they
are trying to acquire the table lock in order to update the cluster index.
> In any event, it seems like the deadlock is caused by the escalation of a lock (whether
it's a page or table); both processes hold row level key locks which are exclusive and then
they both request a U lock which blocks for both.
> Below is the deadlock graph XML:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <deadlock-list>
>   <deadlock victim="process10faf6ca8">
>     <process-list>
>       <process id="process10faf6ca8" taskpriority="0" logused="344" waitresource="KEY:
5:72057594165723136 (fd5c95c6a91a)" waittime="4391" ownerId="16290998" transactionname="implicit_transaction"
lasttranstarted="2015-07-22T00:18:01.547" XDES="0x1177363b0" lockMode="U" schedulerid="2"
kpid="3324" status="suspended" spid="54" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-07-22T00:18:01.547"
lastbatchcompleted="2015-07-22T00:18:01.547" lastattention="1900-01-01T00:00:00.547" clientapp="Microsoft
JDBC Driver for SQL Server" hostname="headnode0" hostpid="0" loginname="ambari@ivanmisqlserver.cloudapp.net"
isolationlevel="read committed (2)" xactid="16290998" currentdb="5" lockTimeout="4294967295"
clientoption1="671088672" clientoption2="128058">
>         <executionStack>
>           <frame procname="adhoc" line="1" stmtstart="160" stmtend="450" sqlhandle="0x02000000b5fa61048c97ed65c734e84590264b91bbf93aca0000000000000000000000000000000000000000">unknown</frame>
>           <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">unknown</frame>
>         </executionStack>
>         <inputbuf>(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 bigint,@P3 bigint,@P4
nvarchar(4000))UPDATE hostcomponentstate SET version = @P0 WHERE ((((component_name = @P1)
AND (host_id = @P2)) AND (cluster_id = @P3)) AND (service_name = @P4))</inputbuf>
>       </process>
>       <process id="process10dfa1c28" taskpriority="0" logused="520" waitresource="KEY:
5:72057594165723136 (930b38dc45e3)" waittime="4384" ownerId="16290959" transactionname="implicit_transaction"
lasttranstarted="2015-07-22T00:18:01.537" XDES="0x1199b43b0" lockMode="U" schedulerid="1"
kpid="3364" status="suspended" spid="61" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-07-22T00:18:01.553"
lastbatchcompleted="2015-07-22T00:18:01.553" lastattention="1900-01-01T00:00:00.553" clientapp="Microsoft
JDBC Driver for SQL Server" hostname="headnode0" hostpid="0" loginname="ambari@ivanmisqlserver.cloudapp.net"
isolationlevel="read committed (2)" xactid="16290959" currentdb="5" lockTimeout="4294967295"
clientoption1="671088672" clientoption2="128058">
>         <executionStack>
>           <frame procname="adhoc" line="1" stmtstart="160" stmtend="462" sqlhandle="0x02000000e94be81b7f4521ee46bc4614740d406162aad19d0000000000000000000000000000000000000000">unknown</frame>
>           <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">unknown</frame>
>         </executionStack>
>         <inputbuf>(@P0 nvarchar(4000),@P1 nvarchar(4000),@P2 bigint,@P3 bigint,@P4
nvarchar(4000))UPDATE hostcomponentstate SET current_state = @P0 WHERE ((((component_name
= @P1) AND (host_id = @P2)) AND (cluster_id = @P3)) AND (service_name = @P4))</inputbuf>
>       </process>
>     </process-list>
>     <resource-list>
>       <keylock hobtid="72057594165723136" dbid="5" objectname="ambari.dbo.hostcomponentstate"
indexname="PK__hostcomp__C72E1492ED078925" id="lock113b1de80" mode="X" associatedObjectId="72057594165723136">
>         <owner-list>
>           <owner id="process10dfa1c28" mode="X" />
>         </owner-list>
>         <waiter-list>
>           <waiter id="process10faf6ca8" mode="U" requestType="wait" />
>         </waiter-list>
>       </keylock>
>       <keylock hobtid="72057594165723136" dbid="5" objectname="ambari.dbo.hostcomponentstate"
indexname="PK__hostcomp__C72E1492ED078925" id="lock110cf0b00" mode="X" associatedObjectId="72057594165723136">
>         <owner-list>
>           <owner id="process10faf6ca8" mode="X" />
>         </owner-list>
>         <waiter-list>
>           <waiter id="process10dfa1c28" mode="U" requestType="wait" />
>         </waiter-list>
>       </keylock>
>     </resource-list>
>   </deadlock>
> </deadlock-list>
> {code}
> Both processes have an X lock on clustered index {{PK__hostcomp__C72E1492ED078925}} and
then both try to obtain a U lock; this is because they are doing a Clustered Index Update
and are requesting an escalation to a table lock. This is where the deadlock occurs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message