Return-Path: Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: (qmail 89742 invoked from network); 8 Apr 2011 18:14:47 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 8 Apr 2011 18:14:47 -0000 Received: (qmail 31395 invoked by uid 500); 8 Apr 2011 18:14:47 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 31356 invoked by uid 500); 8 Apr 2011 18:14:47 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 31348 invoked by uid 99); 8 Apr 2011 18:14:47 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 08 Apr 2011 18:14:47 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 08 Apr 2011 18:14:44 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 0353D9871F for ; Fri, 8 Apr 2011 18:14:06 +0000 (UTC) Date: Fri, 8 Apr 2011 18:14:06 +0000 (UTC) From: "Scott Emmons (JIRA)" To: dev@activemq.apache.org Message-ID: <1721228550.44550.1302286446010.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (AMQ-2730) Transport failed: java.io.EOFException MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/AMQ-2730?page=3Dcom.atlassian.j= ira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D130175= 77#comment-13017577 ]=20 Scott Emmons commented on AMQ-2730: ----------------------------------- Related to the ORIGINAL description of this bug, here's our experience in d= ebugging EOFException errors with AMQ 5.4.2 on linux, using a failover conn= ector: We just spent some time debugging a similar EOFException situation and beli= eve the cause is memory related on the ActiveMQ server, even though the ser= ver may not ever throw an OutOfMemory exception. In our case, this cause wa= s not the least bit obvious as the server would appear to function fine, bu= t at some point the inactivity timer would kick in and cause reconnects. Wh= en this would happen, new connections (from our Java clients) to the server= would be immediately closed and the client would start hammering the serve= r up until the max reconnect attempts is hit (even though the documentation= says the default is unlimited, it's actually 1000 in TaskRunnerFactory - d= ue to DefaultThreadPools having what we think is the right code commented o= ut, unless that means something different). Because of synchronization in t= his code, a single thread will hit this loop, while all the other threads a= re stuck waiting on the mutex (which is fine, because nobody can do any wor= k anyway). Once the 1000 connection attempts are done, the client JVM becom= es effectively useless for any JMS activity, as the thread in the reconnect= ion loop will give up and end up waiting forever on an event that will neve= r occur. All other threads are stuck waiting on this mutex. We have seen many cases where restarting the client appears fine (it reconn= ects to ActiveMQ server just fine), giving a false sense that all is well. = At some point, the client will once again get into this state. Because of t= his, we=20 While there is little the client code can do in this situation, there are t= hings that can be done to help with this situation. Here's my suggestions, = if it helps: 1. In the TaskRunnerFactory, if we hit the max reconnect attempts, a high s= everity log entry should be written saying that AMQ client is effectively d= ead and giving up. This message should repeat at some reasonable interval s= o it doesn't get buried in logs. 2. Improve AMQ client logging when it is this reconnect loop. We found that= we had to enable TRACE level logging to really get enough detail to see wh= at was going on. 3. Improve AMQ server logging, if possible, to indicate why it is closing s= ockets from the client. 4. Validate if the current code in DefaultThreadPools is indeed correct (no= t the commented out code). 5. I don't know what the default memoryUsage limit is in the example active= mq.xml, but make sure that it's a sane value considering the default max he= ap is 512MB. > Transport failed: java.io.EOFException > -------------------------------------- > > Key: AMQ-2730 > URL: https://issues.apache.org/jira/browse/AMQ-2730 > Project: ActiveMQ > Issue Type: Bug > Affects Versions: 5.3.1 > Environment: Red hat enterprise > Reporter: Dennis Klinkott > Fix For: 5.6.0 > > Attachments: TCP Dump.jpg > > > We see many Exceptions of this type in the broker logs:=20 > 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException = | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Trans= port: tcp:///10.231.233.55:50809 > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWire= Format.java:269) > at org.apache.activemq.transport.tcp.TcpTransport.readCommand(Tcp= Transport.java:211) > at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransp= ort.java:203) > at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTranspor= t.java:186) > at java.lang.Thread.run(Thread.java:619) > This exception occurs with a lot of clients that run on different machine= s. > One case where we almost always see this exception is when shutting down = a bridge. We did a tcp dump on the broker and on the client machine: there = a no packets lost. > LOG ON BRIDGE: > 2010-05-12 14:10:02,634 INFO org.apache.activemq.broker.BrokerService - = ActiveMQ Message Broker (solr-2073bridge, ID:solr-2073-50804- > 1273665033782-0:0) is shutting down > 2010-05-12 14:10:02,799 INFO org.apache.activemq.network.jms.JmsConnecto= r - JMS Connector Connector:0 Stopped > 2010-05-12 14:10:02,819 INFO org.apache.activemq.broker.BrokerService - = ActiveMQ JMS Message Broker (solr-2073bridge, ID:solr-2073-50 > 804-1273665033782-0:0) stopped > LOG ON BROKER: > 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException = | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Trans= port: tcp:///10.231.233.55:50809 > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWire= Format.java:269) > at org.apache.activemq.transport.tcp.TcpTransport.readCommand(Tcp= Transport.java:211) > at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransp= ort.java:203) > at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTranspor= t.java:186) > at java.lang.Thread.run(Thread.java:619) > 2010-05-12 14:10:03,492 | DEBUG | Stopping connection: /10.231.233.55:508= 09 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task > 2010-05-12 14:10:03,492 | DEBUG | Stopping transport tcp:///10.231.233.55= :50809 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | Stopped transport: /10.231.233.55:50809= | org.apache.activemq.broker.TransportConnection | ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | Cleaning up connection resources: /10.2= 31.233.55:50809 | org.apache.activemq.broker.TransportConnection | ActiveMQ= Task > 2010-05-12 14:10:03,493 | DEBUG | remove connection id: ID:solr-2073-5080= 4-1273665033782-2:0 | org.apache.activemq.broker.TransportConnection | Acti= veMQ Task > 2010-05-12 14:10:03,493 | DEBUG | jms1 removing consumer: ID:solr-2073-50= 804-1273665033782-2:0:225:1 for destination: temp-queue://ID:solr-2073-5080= 4-1273665033782-2:0:105 | org.apache.activemq.broker.region.AbstractRegion = | ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | remove sub: QueueSubscription: consumer= =3DID:solr-2073-50804-1273665033782-2:0:225:1, destinations=3D1, dispatched= =3D0, delivered=3D0, pending=3D0, lastDeliveredSeqId: 0, dequeues: 0, dispa= tched: 0, inflight: 0 | org.apache.activemq.broker.region.Queue | ActiveMQ = Task -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira