hc-dev mailing list archives

From "Hubert, Eric" <eric.hub...@jamba.net>
Subject Possible Causes for "Connection reset by peer" when using NIO
Date Wed, 25 Jun 2008 16:13:45 GMT
Hi devs!

First of all I'd like to apologize for posting a "user problem" to two dev lists. I only did
this because I don't have much background knowledge of the NIO implementation and think a solid
understanding of NIO is necessary to help tackle our problem.

We are using the WSO2 ESB which is based on Apache Synapse, Apache Axis2 and the HTTP Core
NIO module. As the stacktrace only contains http-nio details, I cc'ed the http components
dev list. Hopefully someone can help out.

When sending about 3000 Hessian requests per hour from clients (Tomcat) over the ESB (Synapse
1.2 running on JDK 1.5.15, Linux) to a Bea Weblogic 8.1, we see about 1 to
10 exceptions of type "java.io.IOException: Connection reset by peer" in the ESB log.

If I understand it right the ESB then executes a failover to the next service node as we are
using a load balancing group. So the client is not affected, but the endpoint with the failure
will be marked as inactive.

The problem is I don't understand the cause of this exception. It occurs during the read on
a Socket-Channel. So I think the server might close the connection while the ESB is reading.
But maybe internally some kind of pool is used and a connection can change to some abnormal

We have seen such exceptions before when we were using HTTP 1.1 in combination with the Bea
Weblogic server, very likely an issue with HTTP keep-alive (persistent connections). So for
any connection to a Bea service we use the property mediator of Synapse to force the connection
ESB <-> Bea to use HTTP 1.0:
<syn:property name="FORCE_HTTP_1.0" value="true" scope="axis2" />

Since then we had not seen this exception again. But now, after switching to another environment,
we see it again, but only for Hessian services.
I have no clue what else could cause this exception. How can we detect the cause? How can we narrow
down the possible causes, if there are several? I don't expect network outages
to be the reason, as other (SOAP-based) services are working pretty well.
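
For reference, a minimal sketch of how this exception differs from an orderly close at the
socket level. It is not Synapse or HttpCore code; the class and method names are made up for
illustration, and it uses Java 7+ syntax rather than the JDK 1.5 we run. It assumes the peer
aborts the connection (here forced with SO_LINGER set to 0) rather than closing it normally:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not part of Synapse, Axis2 or HttpCore.
class ResetVersusEof {

    // Opens a loopback connection, lets the "server" side close it either
    // gracefully (FIN) or abortively (RST), and reports what read() observes.
    static String observeClose(boolean abortive) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             SocketChannel client = SocketChannel.open(
                     new InetSocketAddress(loopback, server.getLocalPort()))) {
            Socket peer = server.accept();
            if (abortive) {
                // SO_LINGER enabled with timeout 0: close() resets the connection
                // instead of performing the normal TCP shutdown.
                peer.setSoLinger(true, 0);
            }
            peer.close();
            try {
                int n = client.read(ByteBuffer.allocate(64)); // blocking read
                return n == -1 ? "EOF" : "DATA";
            } catch (IOException e) {
                // Any read failure is reported as a reset in this sketch.
                return "RESET";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("graceful close -> " + observeClose(false));
        System.out.println("abortive close -> " + observeClose(true));
    }
}
```

If this reasoning is right, an orderly close by the server should show up as end-of-stream
(read returning -1), while the IOException in our log would mean the connection was reset
by the peer or by something in between.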

The exact exception we are getting is:

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:85)
        at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:97)
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:113)
        at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:99)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:98)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:195)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:180)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:142)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:70)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:318) 

This exception occurs consistently a few times per hour on every possible combination of client
node, ESB node and service endpoint node.

Any pointer or idea is greatly appreciated. Thanks a lot in advance!

