Mailing-List: contact dev-help@hc.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "HttpComponents Project" <dev@hc.apache.org>
Subject: Possible Causes for "Connection reset by peer" when using NIO
Date: Wed, 25 Jun 2008 18:13:45 +0200
Message-ID: 
 <A9856B86EDA25144A734574D0338C135EF02E3@berwnexmb01.jcorp.ad.jamba.net>
From: "Hubert, Eric" <eric.hubert@jamba.net>
To: <dev@synapse.apache.org>
Cc: <dev@hc.apache.org>

Hi devs!

First of all, I'd like to apologize for posting a "user problem" to two
dev lists. I only did this because I don't have much background
knowledge of the NIO implementation, and I think a solid understanding
of NIO is necessary to help tackle our problem.

We are using the WSO2 ESB, which is based on Apache Synapse, Apache
Axis2 and the HttpCore NIO module. As the stack trace only contains
http-nio details, I cc'ed the HttpComponents dev list. Hopefully someone
can help out.

When sending about 3000 Hessian requests per hour from clients (Tomcat)
over the ESB (Synapse 1.2 running on JDK 1.5.15, Linux
2.6.23.1-amd64-75) to a BEA WebLogic 8.1 server, we see about 1 to 10
exceptions of type "java.io.IOException: Connection reset by peer" in
the ESB log.

If I understand it correctly, the ESB then fails over to the next
service node, as we are using a load-balancing group. So the client is
not affected, but the endpoint with the failure is marked as inactive.

The problem is that I don't understand the cause of this exception. It
occurs during a read on a SocketChannel, so I think the server might be
closing the connection while the ESB is reading. Or is some kind of pool
used internally, so that a connection can end up in an abnormal state?
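
To illustrate the first suspicion, here is a minimal Java sketch of a
peer aborting a connection that the other side then reads from. It is
not taken from Synapse or HttpCore; the class ResetOnReadDemo and its
method are hypothetical, and the "server" is a throwaway local socket
that closes with SO_LINGER set to 0 so that the TCP stack sends an RST
rather than a normal FIN. This is only one way to provoke a reset and
says nothing about what WebLogic actually does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, for illustration only.
class ResetOnReadDemo {

    /**
     * Connects to a local server that aborts the connection, then reads
     * from the client channel. Returns the IOException raised by the
     * read, or null if the read completed without an exception.
     */
    static IOException readFromAbortedConnection() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick any free port.
            server.socket().bind(new InetSocketAddress(loopback, 0));
            int port = server.socket().getLocalPort();

            try (SocketChannel client =
                     SocketChannel.open(new InetSocketAddress(loopback, port))) {
                // Simulated peer: accept, then close with SO_LINGER 0.
                // Closing this way discards the connection with an RST.
                SocketChannel accepted = server.accept();
                accepted.socket().setSoLinger(true, 0);
                accepted.close();

                try {
                    client.read(ByteBuffer.allocate(1024)); // blocking read
                    return null;
                } catch (IOException resetDuringRead) {
                    return resetDuringRead;
                }
            }
        } catch (IOException setupFailure) {
            // Anything other than the read itself failing is a setup problem.
            throw new UncheckedIOException(setupFailure);
        }
    }
}
```

Calling ResetOnReadDemo.readFromAbortedConnection() and printing the
returned exception shows the failure coming out of the channel read. The
exact message text depends on the JDK version and the operating system.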

We have seen such exceptions before when we were using HTTP 1.1 in
combination with the BEA WebLogic server, very likely an issue with HTTP
keep-alive (persistent connections). So for any connection to a BEA
service we use the Synapse property mediator to make the ESB <-> BEA
connection use HTTP 1.0:
<syn:property name="FORCE_HTTP_1.0" value="true" scope="axis2" />
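
To make the keep-alive suspicion concrete: a persistent connection that
sits idle can be closed by the server once its keep-alive timeout
expires, and a client that reuses it afterwards may see a reset. The
following is a minimal Java sketch of a client-side rule that avoids
reusing connections that have been idle too long. The class
IdleConnectionPolicy, its method names and the timeout values in the
usage note are all hypothetical; none of this is Synapse, HttpCore or
WebLogic code or configuration.

```java
// Hypothetical helper, for illustration only.
class IdleConnectionPolicy {

    private final long maxIdleMillis;

    /**
     * @param serverKeepAliveMillis assumed idle timeout after which the
     *                              server closes a persistent connection
     * @param safetyMarginMillis    margin subtracted to cover clock and
     *                              network delays
     */
    IdleConnectionPolicy(long serverKeepAliveMillis, long safetyMarginMillis) {
        // Never allow a negative idle limit.
        this.maxIdleMillis = Math.max(0L, serverKeepAliveMillis - safetyMarginMillis);
    }

    /** True if the connection has been idle for less than the limit. */
    boolean isSafeToReuse(long lastUsedMillis, long nowMillis) {
        return nowMillis - lastUsedMillis < maxIdleMillis;
    }
}
```

With an assumed server timeout of 30 seconds and a 5 second margin, a
connection idle for 10 seconds would be reused, while one idle for 26
seconds would be discarded and replaced by a fresh connection. Forcing
HTTP 1.0 sidesteps the question by not keeping connections open at all.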

Since then we had not seen this exception again. But now, after
switching to another environment, we see it again, but only for Hessian
services.

I have no clue what else could cause this exception. How can we detect
the cause? How can we narrow down the possible causes, if there are
several? I don't expect network outages to be the reason, as other
(SOAP-based) services are working pretty well.

The exact exception we are getting is:

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:85)
        at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:97)
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:113)
        at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:99)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:98)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:195)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:180)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:142)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:70)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:318)


This exception occurs consistently a few times per hour on every
possible combination of client node, ESB node and service endpoint node.

Any pointer or idea is greatly appreciated. Thanks a lot in advance!


Regards,
   Eric
