activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Scott Emmons (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-2730) Transport failed: java.io.EOFException
Date Fri, 08 Apr 2011 18:14:06 GMT

    [ https://issues.apache.org/jira/browse/AMQ-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017577#comment-13017577
] 

Scott Emmons commented on AMQ-2730:
-----------------------------------

Related to the ORIGINAL description of this bug, here's our experience in debugging EOFException
errors with AMQ 5.4.2 on linux, using a failover connector:

We just spent some time debugging a similar EOFException situation and believe the cause is
memory related on the ActiveMQ server, even though the server may not ever throw an OutOfMemory
exception. In our case, this cause was not the least bit obvious as the server would appear
to function fine, but at some point the inactivity timer would kick in and cause reconnects.
When this would happen, new connections (from our Java clients) to the server would be immediately
closed and the client would start hammering the server up until the max reconnect attempts
is hit (even though the documentation says the default is unlimited, it's actually 1000 in
TaskRunnerFactory - due to DefaultThreadPools having what we think is the right code commented
out, unless that means something different). Because of synchronization in this code, a single
thread will hit this loop, while all the other threads are stuck waiting on the mutex (which
is fine, because nobody can do any work anyway). Once the 1000 connection attempts are done,
the client JVM becomes effectively useless for any JMS activity, as the thread in the reconnection
loop will give up and end up waiting forever on an event that will never occur. All other
threads are stuck waiting on this mutex.

We have seen many cases where restarting the client appears fine (it reconnects to ActiveMQ
server just fine), giving a false sense that all is well. At some point, the client will once
again get into this state. Because of this, we 

While there is little the client code can do in this situation, there are things that can
be done to help with this situation. Here's my suggestions, if it helps:

1. In the TaskRunnerFactory, if we hit the max reconnect attempts, a high severity log entry
should be written saying that AMQ client is effectively dead and giving up. This message should
repeat at some reasonable interval so it doesn't get buried in logs.

2. Improve AMQ client logging when it is this reconnect loop. We found that we had to enable
TRACE level logging to really get enough detail to see what was going on.

3. Improve AMQ server logging, if possible, to indicate why it is closing sockets from the
client.

4. Validate if the current code in DefaultThreadPools is indeed correct (not the commented
out code).

5. I don't know what the default memoryUsage limit is in the example activemq.xml, but make
sure that it's a sane value considering the default max heap is 512MB.


> Transport failed: java.io.EOFException
> --------------------------------------
>
>                 Key: AMQ-2730
>                 URL: https://issues.apache.org/jira/browse/AMQ-2730
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.3.1
>         Environment: Red hat enterprise
>            Reporter: Dennis Klinkott
>             Fix For: 5.6.0
>
>         Attachments: TCP Dump.jpg
>
>
> We see many Exceptions of this type in the broker logs: 
> 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport
| ActiveMQ Transport: tcp:///10.231.233.55:50809
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
>         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
>         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
>         at java.lang.Thread.run(Thread.java:619)
> This exception occurs with a lot of clients that run on different machines.
> One case where we almost always see this exception is when shutting down a bridge. We
did a tcp dump on the broker and on the client machine: there a no packets lost.
> LOG ON BRIDGE:
> 2010-05-12 14:10:02,634 INFO  org.apache.activemq.broker.BrokerService - ActiveMQ Message
Broker (solr-2073bridge, ID:solr-2073-50804-
> 1273665033782-0:0) is shutting down
> 2010-05-12 14:10:02,799 INFO  org.apache.activemq.network.jms.JmsConnector - JMS Connector
Connector:0 Stopped
> 2010-05-12 14:10:02,819 INFO  org.apache.activemq.broker.BrokerService - ActiveMQ JMS
Message Broker (solr-2073bridge, ID:solr-2073-50
> 804-1273665033782-0:0) stopped
> LOG ON BROKER:
> 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport
| ActiveMQ Transport: tcp:///10.231.233.55:50809
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
>         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
>         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-05-12 14:10:03,492 | DEBUG | Stopping connection: /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection
| ActiveMQ Task
> 2010-05-12 14:10:03,492 | DEBUG | Stopping transport tcp:///10.231.233.55:50809 | org.apache.activemq.transport.tcp.TcpTransport
| ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | Stopped transport: /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection
| ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | Cleaning up connection resources: /10.231.233.55:50809
| org.apache.activemq.broker.TransportConnection | ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | remove connection id: ID:solr-2073-50804-1273665033782-2:0
| org.apache.activemq.broker.TransportConnection | ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | jms1 removing consumer: ID:solr-2073-50804-1273665033782-2:0:225:1
for destination: temp-queue://ID:solr-2073-50804-1273665033782-2:0:105 | org.apache.activemq.broker.region.AbstractRegion
| ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | remove sub: QueueSubscription: consumer=ID:solr-2073-50804-1273665033782-2:0:225:1,
destinations=1, dispatched=0, delivered=0, pending=0, lastDeliveredSeqId: 0, dequeues: 0,
dispatched: 0, inflight: 0 | org.apache.activemq.broker.region.Queue | ActiveMQ Task

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message