activemq-dev mailing list archives

From "Tamas Cserveny (JIRA)" <>
Subject [jira] [Commented] (AMQ-2798) Occaional hangs on ensureConnectionInfoSent
Date Sat, 31 Jan 2015 20:04:35 GMT


Tamas Cserveny commented on AMQ-2798:

Actually, there is a more general problem behind the scenes:

The ResponseCorrelator component is written optimistically. It assumes that once a command
has been sent to the broker, a response is guaranteed to reach us.

This is not true in failover situations. It can happen that we managed to serialize
the last bit of the command just before the broker crashed or was killed. The same holds for the
failover transport.

This means ResponseCorrelator can hang forever in many different places.
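
To illustrate the difference between an untimed and a timed wait, here is a minimal Java sketch. ResponseSlot is a hypothetical stand-in built on a plain ArrayBlockingQueue, not ActiveMQ's FutureResponse; it only mirrors the ArrayBlockingQueue.take() call visible in the thread dump quoted below.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a single-response holder; not ActiveMQ code.
final class ResponseSlot {
    private final ArrayBlockingQueue<String> slot = new ArrayBlockingQueue<>(1);

    // Called by the transport thread when the broker's response arrives.
    void complete(String response) {
        slot.offer(response);
    }

    // Untimed wait: if the response was lost in a failover, this never returns.
    String awaitForever() throws InterruptedException {
        return slot.take();
    }

    // Timed wait: returns null if nothing arrived within timeoutMillis.
    String await(long timeoutMillis) {
        try {
            return slot.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

With the timed variant the caller gets control back and can decide whether to retry or fail, instead of parking the thread indefinitely.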

The solution would be to define a time-to-live (TTL) for commands. Some commands already have
a timeout, and the calling code takes care of those timeouts. (Callers without a timeout do not cope
with the situation well. In my case I interrupted the thread above, and JBoss then started
processing messages with two different processors at the same time.) When the TTL expires,
ResponseCorrelator should resend the command to the broker using the same command ID(?).
If the command ID is already known to the failover transport (ResponseMap), it might ignore the
resend, because most likely it is still trying to send it. The TTL could be fairly large,
say 30-40 seconds, with infinite as the default.
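
A rough Java sketch of the TTL-and-resend idea follows. All names here (TtlCorrelator, sender, onResponse, maxAttempts) are hypothetical and are not part of ActiveMQ. The sketch assumes commands are identified by an int ID and that a resend with the same ID is safe. It does not model the failover transport ignoring a duplicate resend, nor the infinite default TTL.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntConsumer;

// Hypothetical sketch of a correlator with a per-command TTL; not ActiveMQ code.
final class TtlCorrelator {
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final IntConsumer sender; // assumed to write the command with this ID to the transport
    private final long ttlMillis;
    private final int maxAttempts;

    TtlCorrelator(IntConsumer sender, long ttlMillis, int maxAttempts) {
        this.sender = sender;
        this.ttlMillis = ttlMillis;
        this.maxAttempts = maxAttempts;
    }

    // Called by the transport when a response for commandId arrives.
    void onResponse(int commandId, String response) {
        CompletableFuture<String> future = pending.get(commandId);
        if (future != null) {
            future.complete(response);
        }
    }

    // Sends the command and waits up to ttlMillis per attempt,
    // resending with the same command ID when the TTL expires.
    String request(int commandId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(commandId, future);
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                sender.accept(commandId);
                try {
                    return future.get(ttlMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // TTL expired without a response; fall through and resend.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            throw new IllegalStateException("no response for command " + commandId);
        } finally {
            pending.remove(commandId); // never leave a stale entry behind
        }
    }
}
```

The same future is reused across attempts, so a late response to the first send still completes the request. Bounding the number of attempts is a choice made for this sketch only.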

> Occaional hangs on ensureConnectionInfoSent
> -------------------------------------------
>                 Key: AMQ-2798
>                 URL:
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: JMS client
>    Affects Versions: 5.3.2
>            Reporter: Mark Chaimungkalanont
>            Assignee: Timothy Bish
>             Fix For: 5.5.0
>         Attachments: blocked-connection-patch3
> When connecting to the broker, the client occasionally starts to hang. A thread dump
> {noformat}
> "QuartzScheduler_Worker-7" prio=5 tid=0x0116f190 nid=0x1ce2400 waiting on condition [0xf1fae000..0xf1fafb30]
> 	at sun.misc.Unsafe.park(Native Method)
> 	at java.util.concurrent.locks.LockSupport.park(
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(
> 	at java.util.concurrent.ArrayBlockingQueue.take(
> 	at org.apache.activemq.transport.FutureResponse.getResult(
> 	at org.apache.activemq.transport.ResponseCorrelator.request(
> 	at org.apache.activemq.ActiveMQConnection.syncSendPacket(
> 	at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(
> 	- locked <0x10b9bdf8> (a java.lang.Object)
> 	at org.apache.activemq.ActiveMQConnection.createSession(
> 	at org.jencks.amqpool.SessionPool.createSession(
> 	at org.jencks.amqpool.SessionPool.makeObject(
> 	at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(
> 	at org.jencks.amqpool.SessionPool.borrowSession(
> 	at org.jencks.amqpool.ConnectionPool.createSession(
> 	at org.jencks.amqpool.XaConnectionPool.createSession(
> 	at org.jencks.amqpool.PooledConnection.createSession(
> 	at
> {noformat}
> Looking closer at the code of {{ensureConnectionInfoSent}} in {{ActiveMQConnection}}, it uses the method:
> {code}
> public Response syncSendPacket(Command command) throws JMSException {
> {code}
> which never times out, possibly causing everything to hang eternally. There does seem to be an identical method that allows for a timeout.
> {code}
>     public Response syncSendPacket(Command command, int timeout) throws JMSException
> {code}
> Should / can {{ensureConnectionInfoSent}} use the one with the timeout instead?
> We're using the failover transport:
> failover:(tcp://<someIP>:54663?wireFormat.maxInactivityDuration=300000)?maxReconnectAttempts=10&initialReconnectDelay=15000

This message was sent by Atlassian JIRA