activemq-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Green <james.mk.gr...@gmail.com>
Subject Re: Network bridge is stuck, repeatedly requires broker restart
Date Thu, 01 Nov 2012 14:57:02 GMT
A follow-up. Our hub and spoke were restarted. The connection dropped
within a couple of hours.

This time, subsequently, we are having trouble with Stomp transport
threads. Check this out:

Name: ActiveMQ Transport Server Thread Handler:
stomp://localhost:61612?trace=true
State: TIMED_WAITING on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@104273f
Total blocked: 0  Total waited: 6,613

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
org.apache.activemq.transport.tcp.TcpTransportServer$1.run(TcpTransportServer.java:351)
java.lang.Thread.run(Thread.java:679)

I am assuming the daemon thread that listens for stomp connections on port
61612 is the above, and it's waiting on... something?

Having to restart again! What's the best way forward here, anyone?

Thanks again,

James

On 1 November 2012 12:18, James Green <james.mk.green@gmail.com> wrote:

> Gary,
>
> Spokes reach out to the hub. The hub is in an office with a busy ADSL line
> attached. Connection drops are to be expected. We have observed some spokes
> drop and reconnect absolutely fine. Obviously in more recent times
> sometimes this connections are becoming hung. We wondered if this might be
> because of increased traffic load over the line as the network has been
> reliable for many months previously.
>
> The past week or more hanging drops have been occuring every few hours. We
> filtered the traffic to see if this reduced or eliminated the problem - so
> far one drop being experienced now in 24 hours.
>
> The thread remains blocked at the moment. We have evidence to suggest the
> connection to this spoke ("quarrel") dropped approximately two hours ago.
>
> Checking out the OS layer:
>
> jamesg@hub:~$ sudo netstat -natp | grep x.x.x.x
> tcp6  103343  78476 hub:61617         x.x.x.x:44493    ESTABLISHED
> 20345/java
>
> Spoke:
> jamesg@quarrel:~$ sudo netstat -natp | grep 61617
> tcp6  113111 146448 x.x.x.x:44493    hub:61617      ESTABLISHED 27968/java
>
> Clearly something is in fact present. These two are both Ubuntu boxes (all
> ours are). Should we be tweaking something at the OS level, or perhaps
> switching from ssl:// to ssl+nio://?
>
> Floating ideas now!
>
> Thanks,
>
> James
>
>
> On 1 November 2012 11:55, Gary Tully <gary.tully@gmail.com> wrote:
>
>> actually soWriteTimeout does not help with ssl, because the ssl stream
>> has an exclusive lock that blocks any call to close.
>>
>> you will have to deal with the blocked write at the OS level via the
>> tcp stack timeout configuration.
>>
>> But the real root cause is understanding why the connection drops in
>> the first place.
>>
>> Do you see the thread blocked on a write eventually timing out when
>> the OS detects the half closed socket? It should do once a remote
>> close has been initiated.
>>
>> Or is the problem that the hub/spoke does not really close the
>> connection when it sees a problem?
>>
>> It is odd that it is a 24hr thing,  what happens at that time.
>>
>> On 1 November 2012 11:27, James Green <james.mk.green@gmail.com> wrote:
>> > Having implemented queue and destination filtering between spokes and
>> the
>> > hub, the connections lasted just over 24 hours.
>> >
>> > It's just died. Same traces on the hub as before. On the spoke with
>> which
>> > comms have died:
>> >
>> > Name: ActiveMQ Transport: ssl://hub/x.x.x.x:61617
>> > State: WAITING on
>> > java.util.concurrent.locks.ReentrantLock$NonfairSync@4864ae35 owned by:
>> > ActiveMQ BrokerService[zorin] Task-3350
>> > Total blocked: 68  Total waited: 1,862
>> >
>> > Stack trace:
>> > sun.misc.Unsafe.park(Native Method)
>> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
>> >
>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
>> > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
>> >
>> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteCommand(DemandForwardingBridgeSupport.java:467)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:179)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
>> >
>> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
>> >
>> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
>> >
>> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:256)
>> >
>> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
>> >
>> org.apache.activemq.transport.tcp.SslTransport.doConsume(SslTransport.java:91)
>> >
>> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:222)
>> >
>> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204)
>> > java.lang.Thread.run(Thread.java:636)
>> >
>> > Name: ActiveMQ BrokerService[zorin] Task-3350
>> > State: RUNNABLE
>> > Total blocked: 0  Total waited: 668
>> >
>> > Stack trace:
>> > java.net.SocketOutputStream.socketWrite0(Native Method)
>> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>> > java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>> > sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:314)
>> > sun.security.ssl.OutputRecord.write(OutputRecord.java:303)
>> >
>> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:763)
>> > sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:751)
>> > sun.security.ssl.AppOutputStream.write(AppOutputStream.java:78)
>> >    - locked sun.security.ssl.AppOutputStream@3109e59b
>> >
>> org.apache.activemq.transport.tcp.TcpBufferedOutputStream.write(TcpBufferedOutputStream.java:96)
>> > java.io.DataOutputStream.write(DataOutputStream.java:107)
>> >    - locked java.io.DataOutputStream@c13c44
>> >
>> org.apache.activemq.openwire.v9.BaseDataStreamMarshaller.tightMarshalByteSequence2(BaseDataStreamMarshaller.java:431)
>> >
>> org.apache.activemq.openwire.v9.MessageMarshaller.tightMarshal2(MessageMarshaller.java:180)
>> >
>> org.apache.activemq.openwire.v9.ActiveMQMessageMarshaller.tightMarshal2(ActiveMQMessageMarshaller.java:89)
>> >
>> org.apache.activemq.openwire.v9.ActiveMQTextMessageMarshaller.tightMarshal2(ActiveMQTextMessageMarshaller.java:89)
>> >
>> org.apache.activemq.openwire.OpenWireFormat.marshal(OpenWireFormat.java:243)
>> >    - locked org.apache.activemq.openwire.OpenWireFormat@e5b94bf
>> >
>> org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:183)
>> >
>> org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:289)
>> >
>> org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271)
>> >
>> org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
>> >
>> org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
>> >
>> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:726)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport$1.onCommand(DemandForwardingBridgeSupport.java:167)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
>> >
>> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
>> >
>> org.apache.activemq.transport.vm.VMTransport.doDispatch(VMTransport.java:138)
>> >
>> org.apache.activemq.transport.vm.VMTransport.dispatch(VMTransport.java:127)
>> >    - locked java.util.concurrent.atomic.AtomicBoolean@1e828ef6
>> >
>> org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:104)
>> >
>> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>> >
>> org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312)
>> >
>> org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838)
>> >
>> org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873)
>> >
>> org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
>> >
>> org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > java.lang.Thread.run(Thread.java:636)
>> >
>> > Is this something that should be reported in JIRA? And should we be
>> > changing the writeTimeout you speak of? If so, to what?
>> >
>> > Thanks,
>> >
>> > James
>> >
>> > On 31 October 2012 16:57, Gary Tully <gary.tully@gmail.com> wrote:
>> >
>> >> from a quick look at one of those stack traces, it looks like a thread
>> >> is blocked on a socket write, it may be that the socket is half closed
>> >> and that the write needs to timeout at the tcp level. that can take
>> >> some time.
>> >>
>> >> There is a writeTimeout filter that can help at the activemq level.
>> >>
>> >> Have a peek at https://issues.apache.org/jira/browse/AMQ-1993
>> >>
>> >>
>> >> On 30 October 2012 21:31, James Green <james.mk.green@gmail.com>
>> wrote:
>> >> > This is happening repeatedly causing memory consumption to balloon
>> >> locally
>> >> > and message queues to build remotely.
>> >> >
>> >> > Googling around isn't revealing much. Can anyone take a look and
>> tell me
>> >> if
>> >> > this stuff is normal or not? Need to narrow down the investigations.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > James
>> >> >
>> >> > On 30 October 2012 11:32, James Green <james.mk.green@gmail.com>
>> wrote:
>> >> >
>> >> >> Hub with three spokes. The network between the hub and one of these
>> >> spokes
>> >> >> seems to be dying irregularly yet frequently.
>> >> >>
>> >> >> I managed to get a thread dump of the hub's ssl connector to this
>> spoke
>> >> >> along with the thread it was waiting on:
>> >> >>
>> >> >> http://pastebin.com/ZjeSxDK7
>> >> >>
>> >> >> I also captched a full thread dump on the hub:
>> >> >>
>> >> >> http://pastebin.com/Qu113JhQ
>> >> >>
>> >> >> On the spoke the thread dump button was disabled but I captured
the
>> >> >> equivalent individual threads:
>> >> >>
>> >> >> http://pastebin.com/1hwb1BDp
>> >> >>
>> >> >> FWIW we just restarted the spoke and the network connector appears
>> to be
>> >> >> stuck once again. Not sure what we can actually do now...
>> >> >>
>> >> >> Can anyone offer advice, perhaps a JIRA issue? This is with 5.7.0.
>> >> >>
>> >> >> James
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> http://redhat.com
>> >> http://blog.garytully.com
>> >>
>>
>>
>>
>> --
>> http://redhat.com
>> http://blog.garytully.com
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message