activemq-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erik Wramner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-5961) Deadlock in client blocks all application server threads
Date Wed, 23 Sep 2015 06:09:04 GMT

    [ https://issues.apache.org/jira/browse/AMQ-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904020#comment-14904020
] 

Erik Wramner commented on AMQ-5961:
-----------------------------------

Pull request 143 (https://github.com/apache/activemq/pull/143) is open, 141 is closed.

As for all threads becoming stuck I don't agree for two reasons. First it might be that the
thread that is supposed to reestablish communication is blocked on the locked HashMap. I don't
know the code base well enough to easily find out, but I can dig into the stack trace. If
that is the case we have a true deadlock and it would help to get rid of the global lock.
Second even if afterRollback blocks forever that will kill one thread and one transaction,
but without the global lock it may be possible for the other 169 threads to proceed with their
work for new transactions (possibly for other queues/topics if the queue/topic is broken).
They may also get blocked by other things, but it is possible that they can proceed. That
could make the difference between a panic and a planned restart after peak times.

By the way, I changed the code again so that it doesn't lock the list when it has been removed
from the map. Instead all additions to the list are within the map's lock. That way additions
are protected and nothing can modify the list when it has been removed, so it can be processed
safely without locks. That gets rid of the possible global lock in isInXATransaction, where
the previous version locked the map and then tried to get a lock on the list.

> Deadlock in client blocks all application server threads
> --------------------------------------------------------
>
>                 Key: AMQ-5961
>                 URL: https://issues.apache.org/jira/browse/AMQ-5961
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: JMS client, RAR
>    Affects Versions: 5.11.0, 5.11.1
>         Environment: Platform independent
>            Reporter: Erik Wramner
>
> When we run JBoss EAP 6.4 with high load using AMQ 6.2 (ActiveMQ 5.11) everything grinds
to a halt after a few hours. We have 170 threads blocked on the same lock:
> "default-threads - 1400" #406720 prio=5 os_prio=0 tid=0x00007f1b8402b800 nid=0xfe2d waiting
for monitor entry [0x00007f19ccdd5000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.activemq.TransactionContext.setXid(TransactionContext.java:729)
> 	- waiting to lock <0x000000063ac688b8> (a java.util.HashMap)
> 	at org.apache.activemq.TransactionContext.end(TransactionContext.java:418)
> 	at org.apache.activemq.ra.LocalAndXATransaction.end(LocalAndXATransaction.java:98)
> 	at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.end(XAResourceWrapperImpl.java:118)
> 	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelPrepare(XAResourceRecord.java:208)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2530)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2497)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.prepare(BasicAction.java:2074)
> 	- locked <0x000000067461c090> (a com.arjuna.ats.internal.jta.transaction.arjunacore.AtomicAction)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1485)
> 	- locked <0x000000067461c090> (a com.arjuna.ats.internal.jta.transaction.arjunacore.AtomicAction)
> 	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:98)
> 	at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
> This lock is owned by a thread that is waiting for another lock:
> "default-threads - 1381" #404073 prio=5 os_prio=0 tid=0x00007f1a6403c000 nid=0x7d49 waiting
on condition [0x00007f19d8a91000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000063ab2a260> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> 	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> 	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> 	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
> 	at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> 	at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1309)
> 	at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1303)
> 	at org.apache.activemq.ActiveMQSession.asyncSendPacket(ActiveMQSession.java:2008)
> 	at org.apache.activemq.ActiveMQSession$5.afterRollback(ActiveMQSession.java:973)
> 	at org.apache.activemq.TransactionContext.afterRollback(TransactionContext.java:162)
> 	at org.apache.activemq.TransactionContext.prepare(TransactionContext.java:482)
> 	- locked <0x000000063ac688b8> (a java.util.HashMap)
> 	at org.apache.activemq.ra.LocalAndXATransaction.prepare(LocalAndXATransaction.java:130)
> 	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelPrepare(XAResourceRecord.java:213)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2530)
> 	at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2497)
> The locked object is ENDED_XA_TRANSACTION_CONTEXTS in TransactionContext. The thread
that has the lock calls afterRollback without first releasing it and afterRollback does not
return promptly (not at all in fact, we need to restart the server to fix this). The same
can happen with afterCommit, which is also called from within a synchronized block causing
a potential deadlock.
> I think I have a fix for this and will create a pull request in github, or if you prefer
I can send a patch? It simply means moving the external calls out from the synchronized blocks
and would both remove the very real deadlock risks and improve performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message