Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 20 Jan 2016 08:59:39 +0000 (UTC)
From: "Elliott Clark (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12932644.1453276124000.156339.1453280379850@Atlassian.JIRA>
In-Reply-To: <JIRA.12932644.1453276124000@Atlassian.JIRA>
References: <JIRA.12932644.1453276124000@Atlassian.JIRA>
 <JIRA.12932644.1453276124029@arcas>
Subject: [jira] [Updated] (HDFS-9669) TcpPeerServer should respect
 ipc.server.listen.queue.size
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.0.patch

Straightforward patch to make sure that every place that binds uses the listen backlog setting.

> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch
>
>
> During periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
> java.io.IOException: Connection reset by peer
> 	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> 	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> 	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
> 	at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time this happens there are far fewer xceivers than configured.
> Because the backlog setting is not applied, most JDKs fall back to the default ServerSocket backlog of 50 as the total backlog at any time. This effectively means that any GC pause combined with a busy period will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370
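
For illustration only, here is a minimal sketch of binding a listening socket with an explicit backlog instead of relying on the JDK default. It is not the contents of HDFS-9669.0.patch. The class name, the helper method, and the value 128 are assumptions made up for this example; in Hadoop the value would come from ipc.server.listen.queue.size.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical example class, not part of Hadoop.
class BacklogBindSketch {
    // Assumed stand-in for the configured ipc.server.listen.queue.size value.
    static final int LISTEN_BACKLOG = 128;

    // Create an unbound ServerSocket and bind it with an explicit backlog.
    // ServerSocket.bind(SocketAddress) without a backlog uses a default of 50.
    static ServerSocket bindWithBacklog(InetSocketAddress addr, int backlog)
            throws IOException {
        ServerSocket ss = new ServerSocket();
        ss.bind(addr, backlog);
        return ss;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, so the sketch runs anywhere.
        InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 0);
        try (ServerSocket ss = bindWithBacklog(addr, LISTEN_BACKLOG)) {
            System.out.println("bound=" + ss.isBound() + " port=" + ss.getLocalPort());
        }
    }
}
```

The backlog passed to bind is a request; the operating system may apply its own upper limit (on Linux, net.core.somaxconn), so the kernel setting may need to be raised as well for a larger value to take effect.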

