Date: Wed, 20 Jan 2016 08:59:39 +0000 (UTC)
From: "Elliott Clark (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

[ https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HDFS-9669:
--------------------------------
    Attachment: HDFS-9669.0.patch

Straightforward patch to make sure that every place that binds a listening socket uses the listen backlog setting (ipc.server.listen.queue.size).
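A minimal sketch, in plain Java, of what "use the listen backlog setting" means at a bind site. This is not the code in HDFS-9669.0.patch: the `ListenBacklog` class, the `resolveBacklog` and `bindWithBacklog` helpers, and the `Map` standing in for Hadoop's `Configuration` are all hypothetical. It relies only on JDK behavior: `ServerSocket.bind(SocketAddress, int)` passes the backlog through, whereas `bind(SocketAddress)` or any backlog below 1 falls back to the JDK default of 50.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.Collections;
import java.util.Map;

public class ListenBacklog {
    // Configuration key named in HDFS-9669.
    public static final String LISTEN_QUEUE_SIZE_KEY = "ipc.server.listen.queue.size";

    // Hypothetical helper: reads the backlog from a plain key/value map
    // (a stand-in for Hadoop's Configuration), falling back to defaultBacklog
    // when the key is absent.
    public static int resolveBacklog(Map<String, String> conf, int defaultBacklog) {
        String raw = conf.get(LISTEN_QUEUE_SIZE_KEY);
        return raw == null ? defaultBacklog : Integer.parseInt(raw.trim());
    }

    // Hypothetical helper: opens a channel and binds it with an explicit backlog.
    // Calling bind(addr) alone, or passing a backlog < 1, makes the JDK use 50.
    public static ServerSocketChannel bindWithBacklog(InetSocketAddress addr, int backlog)
            throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        try {
            channel.socket().bind(addr, backlog);
        } catch (IOException e) {
            channel.close(); // do not leak the channel if bind fails
            throw e;
        }
        return channel;
    }

    public static void main(String[] args) throws IOException {
        // 128 is only an illustrative fallback for this sketch.
        Map<String, String> conf = Collections.singletonMap(LISTEN_QUEUE_SIZE_KEY, "256");
        int backlog = resolveBacklog(conf, 128);
        // Port 0 asks the OS for any free ephemeral port.
        try (ServerSocketChannel ch =
                 bindWithBacklog(new InetSocketAddress("127.0.0.1", 0), backlog)) {
            System.out.println("bound " + ch.socket().getLocalSocketAddress()
                + " with backlog " + backlog);
        }
    }
}
```

The kernel may still cap the effective queue length (on Linux, at `net.core.somaxconn`), so raising `ipc.server.listen.queue.size` alone may not be enough on every host.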
> TcpPeerServer should respect ipc.server.listen.queue.size
> ---------------------------------------------------------
>
>                 Key: HDFS-9669
>                 URL: https://issues.apache.org/jira/browse/HDFS-9669
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HDFS-9669.0.patch
>
>
> During periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> When this happens, far fewer xceivers are running than are configured.
> Because TcpPeerServer does not pass the configured backlog when it binds, on most JDKs the listen backlog falls back to 50 at any time. This effectively means that any GC pause combined with a busy period will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)