ambari-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-17738) EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads
Date Sat, 16 Jul 2016 00:20:20 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Hurley updated AMBARI-17738:
-------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads
> -------------------------------------------------------------------------------
>
>                 Key: AMBARI-17738
>                 URL: https://issues.apache.org/jira/browse/AMBARI-17738
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.4.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.4.0
>
>         Attachments: AMBARI-17738.patch
>
>
> Reproduced as part of creating a rolling upgrade on a large cluster:
> - Cluster had 30 hosts with about 40 components per host (full stack)
> - Begin a rolling upgrade; a lot of database activity is seen for about 1-2 minutes,
and then the deadlock occurs.
> Initially appearing as a deadlock, it's caused by Postgres is holding the socket open
indefinitely. We have a write lock being held while the socket is open. Jstack dumps taken
many minutes apart show the same thread is stuck in a socket read:
> {{ambari-heartbeat-processor-0}} acquires a write lock on {{ServiceComponentHostImpl}}
at {{setState}} and is stuck reading.
> {code}
> "ambari-heartbeat-processor-0" #16 prio=5 os_prio=0 tid=0x00007fcc1080b000 nid=0x7e03
runnable [0x00007fcc163ce000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:170)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
> 	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
> 	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
> 	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:71)
> 	at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:269)
> 	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1704)
> 	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
> 	- locked <0x00000000aad71388> (a org.postgresql.core.v3.QueryExecutorImpl)
> 	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:559)
> 	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
> 	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:892)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:964)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:633)
> 	at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)
> 	at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)
> 	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1845)
> 	at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4300)
> 	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5592)
> 	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1646)
> 	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1614)
> 	at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:285)
> 	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1169)
> 	at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:134)
> 	at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:153)
> 	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
> 	at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl$$EnhancerByGuice$$e9f3a32.saveComponentStateEntityIfPersisted(<generated>)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setState(ServiceComponentHostImpl.java:850)
> 	at org.apache.ambari.server.agent.HeartbeatProcessor.processStatusReports(HeartbeatProcessor.java:610)
> 	at org.apache.ambari.server.agent.HeartbeatProcessor.processHeartbeat(HeartbeatProcessor.java:212)
> 	at org.apache.ambari.server.agent.HeartbeatProcessor$HeartbeatProcessingTask.run(HeartbeatProcessor.java:188)
> {code}
> Many other threads (for example {{ambari-client-thread-574}}) acquire a cluster read
lock. Then, they are held waiting for {{ambari-heartbeat-processor-0}} since they need a read
lock on {{ServiceComponentHostImpl}}
> {code}
> "ambari-client-thread-574" #574 prio=5 os_prio=0 tid=0x0000000003ef5000 nid=0x5cd0 waiting
on condition [0x00007fcc073f4000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000a30c67f0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.convertToResponse(ServiceComponentHostImpl.java:1355)
> 	at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getHostComponents(AmbariManagementControllerImpl.java:1299)
> 	at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getHostComponents(AmbariManagementControllerImpl.java:3365)
> 	at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$2.invoke(HostComponentResourceProvider.java:215)
> 	at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$2.invoke(HostComponentResourceProvider.java:212)
> 	at org.apache.ambari.server.controller.internal.AbstractResourceProvider.getResources(AbstractResourceProvider.java:307)
> 	at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.findResources(HostComponentResourceProvider.java:212)
> 	at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.getResources(HostComponentResourceProvider.java:188)
> 	at org.apache.ambari.server.controller.internal.ClusterControllerImpl$ExtendedResourceProviderWrapper.queryForResources(ClusterControllerImpl.java:966)
> 	at org.apache.ambari.server.controller.internal.ClusterControllerImpl.getResources(ClusterControllerImpl.java:141)
> {code}
> Finally, a writer thread tries to start the upgrade and has to wait to get a write lock
on the cluster:
> {code}
> "ambari-client-thread-507" #507 prio=5 os_prio=0 tid=0x0000000001db7000 nid=0x1e1a waiting
on condition [0x00007fcc0d6cf000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000a2564b80> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl.setUpgradeEntity(ClusterImpl.java:3641)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl$$EnhancerByGuice$$770b7b6b.CGLIB$setUpgradeEntity$35(<generated>)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl$$EnhancerByGuice$$770b7b6b$$FastClassByGuice$$ef7ae837.invoke(<generated>)
> 	at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
> 	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
> 	at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:118)
> 	at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
> 	at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl$$EnhancerByGuice$$770b7b6b.setUpgradeEntity(<generated>)
> 	at org.apache.ambari.server.controller.internal.UpgradeResourceProvider.createUpgrade(UpgradeResourceProvider.java:987)
> 	at org.apache.ambari.server.controller.internal.UpgradeResourceProvider$$EnhancerByGuice$$be5dc6f2.CGLIB$createUpgrade$4(<generated>)
> 	at org.apache.ambari.server.controller.internal.UpgradeResourceProvider$$EnhancerByGuice$$be5dc6f2$$FastClassByGuice$$83a8554c.invoke(<generated>)
> {code}
> So, the problem is clearly that {{ambari-heartbeat-processor-0}} is stuck reading. 
> Investigating on Postgres shows that there is a lock blocking the thread which is waiting:
> ||blocked_pid||blocked_user||blocking_pid||blocking_user||blocked_statement||current_statement_in_blocking_process||
> |31705|ambaricustomuser|31660|ambaricustomuser|UPDATE hostcomponentstate SET current_state
= $1 WHERE (id = $2)|SELECT sequence_value FROM ambari_sequences WHERE sequence_name = $1|
> The sequence query is currently stuck in the {{idle in transaction}} state which is why
it's blocking the other query. The transaction isn't being ended by EclipseLink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message