Date: Wed, 20 Jan 2016 07:49:39 +0000 (UTC)
From: "Elliott Clark (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Created] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

Elliott Clark created HDFS-9669:
-----------------------------------

    Summary: TcpPeerServer should respect ipc.server.listen.queue.size
        Key: HDFS-9669
        URL: https://issues.apache.org/jira/browse/HDFS-9669
    Project: Hadoop HDFS
 Issue Type: Bug
   Reporter: Elliott Clark

During periods of high traffic we are seeing:
{code}
16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
{code}

When this happens, there are far fewer xceivers in use than configured. TcpPeerServer does not pass a listen backlog derived from ipc.server.listen.queue.size when it binds its server socket, so on most JDKs the accept queue falls back to the default of 50 pending connections:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370

In practice, a GC pause combined with a busy period is enough to fill that queue, and further connection attempts then fail with TCP resets.
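A minimal sketch of what the title asks for, assuming the backlog is read from configuration and passed straight through to bind(). The class ListenBacklogSketch, the methods resolveBacklog and bindWithBacklog, and the fallback value of 128 are hypothetical illustrations, not the actual TcpPeerServer code or the HDFS-9669 patch; a plain java.util.Properties stands in for Hadoop's Configuration object.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;
import java.util.Properties;

/**
 * Hypothetical sketch: bind a server socket with an explicit listen backlog
 * taken from configuration instead of relying on the JDK default of 50.
 */
final class ListenBacklogSketch {
    // Configuration key named in the issue title.
    static final String LISTEN_QUEUE_SIZE_KEY = "ipc.server.listen.queue.size";
    // Illustrative fallback only (assumption); check core-default.xml for the real default.
    static final int ASSUMED_DEFAULT_BACKLOG = 128;

    /** Reads the backlog from a Properties object standing in for Hadoop's Configuration. */
    static int resolveBacklog(Properties conf) {
        String raw = conf.getProperty(LISTEN_QUEUE_SIZE_KEY);
        if (raw == null) {
            return ASSUMED_DEFAULT_BACKLOG;
        }
        int value = Integer.parseInt(raw.trim());
        // The JDK silently replaces a backlog < 1 with 50, so reject such values explicitly.
        if (value < 1) {
            throw new IllegalArgumentException(
                LISTEN_QUEUE_SIZE_KEY + " must be >= 1, got " + value);
        }
        return value;
    }

    /** Opens a channel-backed server socket and binds it with the given backlog. */
    static ServerSocket bindWithBacklog(InetSocketAddress addr, int backlog) throws IOException {
        ServerSocket socket = ServerSocketChannel.open().socket();
        try {
            // In OpenJDK 8, bind(addr) without a backlog is equivalent to bind(addr, 50).
            socket.bind(addr, backlog);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty(LISTEN_QUEUE_SIZE_KEY, "1024");
        int backlog = resolveBacklog(conf);
        // Port 0 asks the OS for any free ephemeral port on the loopback interface.
        InetSocketAddress addr = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        try (ServerSocket server = bindWithBacklog(addr, backlog)) {
            System.out.println("listening on port " + server.getLocalPort()
                + " with requested backlog " + backlog);
        }
    }
}
```

Passing the configured value to bind(addr, backlog) is what lifts the queue above 50. The operating system may still clamp the requested backlog (Linux, for example, caps it at net.core.somaxconn), so that kernel setting may need raising as well.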