hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10718) "IOException: An existing connection was forcibly closed by the remote host" frequently happens on Windows
Date Wed, 18 Jun 2014 18:02:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036051#comment-14036051 ]

Daryn Sharp commented on HADOOP-10718:
--------------------------------------

I've seen a JIRA with similar odd Windows TCP connection issues. The "forcibly closed" error
means that the normal graceful TCP shutdown (FIN) did not occur; instead there was a hard
connection abort (RESET, i.e. a TCP RST). I tried to read up on the Windows TCP stack and found
that close() immediately frees all resources. If there is data remaining to be sent, it is
discarded and a RESET is sent. The shutdown() call is supposed to initiate the graceful FIN shutdown.
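To illustrate how the FIN-versus-RESET distinction looks from the application side, here is a minimal Java sketch (not Hadoop code). Java has no portable way to reproduce the Windows close() behaviour described above, so the sketch assumes the standard trick of setting SO_LINGER to zero, which makes close() abort the connection with a RST; the graceful path uses shutdownOutput() to send FIN. The names PeerCloseProbe, CloseKind, observe and probe are hypothetical. A reader sees a FIN as read() returning -1, and a RESET as an IOException, which is how the Windows "forcibly closed" message surfaces in the stack traces of this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PeerCloseProbe {

    /** How the reading side saw the connection end (hypothetical classification). */
    enum CloseKind { FIN, RESET }

    /**
     * Reads until the stream ends. read() == -1 means the peer sent FIN; an IOException
     * (e.g. "Connection reset", or on Windows "An existing connection was forcibly
     * closed by the remote host") is treated as an abort. Simplification: any
     * IOException, including a timeout, is counted as RESET here.
     */
    static CloseKind observe(InputStream in) {
        byte[] buf = new byte[4096];
        try {
            while (in.read(buf) != -1) {
                // discard the payload; only the way the stream ends matters
            }
            return CloseKind.FIN;
        } catch (IOException e) {
            return CloseKind.RESET;
        }
    }

    /** Loopback exchange: the server sends a reply, then closes gracefully or abortively. */
    static CloseKind probe(boolean abortive) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo)) {
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept()) {
                    OutputStream out = conn.getOutputStream();
                    out.write("reply".getBytes(StandardCharsets.UTF_8));
                    out.flush();
                    if (abortive) {
                        conn.setSoLinger(true, 0);      // the implicit close() below sends RST
                    } else {
                        conn.shutdownOutput();          // reply is sent, followed by FIN
                        conn.setSoTimeout(2000);
                        observe(conn.getInputStream()); // wait for the client's FIN
                    }
                } catch (IOException e) {
                    // demo only: ignore server-side errors
                }
            });
            serverThread.start();
            CloseKind seen;
            try (Socket client = new Socket(lo, server.getLocalPort())) {
                seen = observe(client.getInputStream());
            }
            serverThread.join();
            return seen;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("graceful close seen as: " + probe(false));
        System.out.println("abortive close seen as: " + probe(true));
    }
}
```

With abortive=true the reader typically reports RESET, but whether the reply bytes are delivered before the reset is surfaced depends on the TCP stack, which is the kind of platform difference under discussion here.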

The ipc layer is doing shutdown + close, but apparently Windows isn't behaving correctly.
I suspect the reason the errors are non-deterministic is that the server thread is not being
context-switched between the write ... close, so the client thread never got a chance to read the response.
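If that race is the cause, one common mitigation is for the closing side to send FIN with shutdownOutput(), wait for the peer to close its side, and only then call close(), so the socket is never released while the client may still be reading the response. This is a minimal sketch under that assumption; it is not confirmed to be what Hadoop's ipc Server does, and the helper name gracefulClose, the GracefulCloseSketch class and the 2-second drain timeout are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class GracefulCloseSketch {

    /**
     * Hypothetical helper: send FIN with shutdownOutput(), keep reading until the
     * peer closes its side (read() returns -1) or the timeout expires, then close().
     */
    static void gracefulClose(Socket s, int drainTimeoutMillis) throws IOException {
        try {
            s.shutdownOutput();                 // previously written data is sent, then FIN
            s.setSoTimeout(drainTimeoutMillis); // bound the wait for the peer's FIN
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard anything the peer still sends
            }
        } catch (SocketTimeoutException e) {
            // peer did not close in time; close anyway
        } finally {
            s.close();
        }
    }

    /** Loopback demo: the server writes a reply and closes gracefully; the client reads to EOF. */
    static String roundTrip(String reply) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo)) {
            Thread serverThread = new Thread(() -> {
                try {
                    Socket conn = server.accept();
                    OutputStream out = conn.getOutputStream();
                    out.write(reply.getBytes(StandardCharsets.UTF_8));
                    out.flush();
                    gracefulClose(conn, 2000);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();
            String result;
            try (Socket client = new Socket(lo, server.getLocalPort())) {
                ByteArrayOutputStream received = new ByteArrayOutputStream();
                InputStream in = client.getInputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) { // -1: the server's FIN arrived
                    received.write(buf, 0, n);
                }
                result = received.toString(StandardCharsets.UTF_8.name());
            } // closing the client sends its FIN, which ends the server's drain loop
            serverThread.join();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("pong"));
    }
}
```

The bounded drain timeout is the main design trade-off: without it, a client that never closes its side could hold a server thread indefinitely.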

It'd be interesting to know whether Windows sent both the FIN and the RESET. Someone with Windows
should get a packet trace.

> "IOException: An existing connection was forcibly closed by the remote host" frequently happens on Windows
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10718
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10718
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: Zhijie Shen
>
> After HADOOP-317, we still observed a number of "IOException: An existing connection was forcibly closed by the remote host" errors on the Windows platform when running an MR job. For example,
> {code}
> 2014-06-09 09:11:40,675 INFO [Socket Reader #3 for port 59622] org.apache.hadoop.ipc.Server: Socket Reader #3 for port 59622: readAndProcess from client 10.215.30.53 threw exception [java.io.IOException: An existing connection was forcibly closed by the remote host]
> java.io.IOException: An existing connection was forcibly closed by the remote host
> 	at sun.nio.ch.SocketDispatcher.read0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
> 	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
> 	at sun.nio.ch.IOUtil.read(IOUtil.java:198)
> 	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
> 	at org.apache.hadoop.ipc.Server.channelRead(Server.java:2558)
> 	at org.apache.hadoop.ipc.Server.access$2800(Server.java:130)
> 	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1459)
> 	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
> 	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
> 	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> {code}
> {code}
> 2014-06-09 09:15:38,539 WARN [main] org.apache.hadoop.mapred.Task: Failure sending commit pending: java.io.IOException: Failed on local exception: java.io.IOException: An existing connection was forcibly closed by the remote host; Host Details : local host is: "sdevin-clster53/10.215.16.72"; destination host is: "sdevin-clster54":63415; 
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
> 	at com.sun.proxy.$Proxy9.commitPending(Unknown Source)
> 	at org.apache.hadoop.mapred.Task.done(Task.java:1006)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
> 	at sun.nio.ch.SocketDispatcher.read0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
> 	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
> 	at sun.nio.ch.IOUtil.read(IOUtil.java:198)
> 	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
> 	at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:133)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:133)
> 	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:510)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:387)
> 	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1054)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:949)
> {code}
> The latter results in MAPREDUCE-5924.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
