Date: Fri, 30 Dec 2016 22:38:58 +0000 (UTC)
From: "Zheng Shao (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11280) Allow WebHDFSClient to reuse HTTP connections (HTTP Keep-Alive)

     [ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HDFS-11280:
------------------------------
    Attachment: HDFS-11280.for.2.8.and.beyond.4.patch

Replaced the other instance of conn.disconnect().
Also added a comment explaining why the 3rd instance doesn't need to be replaced.

> Allow WebHDFSClient to reuse HTTP connections (HTTP Keep-Alive)
> ---------------------------------------------------------------
>
>                 Key: HDFS-11280
>                 URL: https://issues.apache.org/jira/browse/HDFS-11280
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which closes the connection to the NameNode. When webhdfs is used as the source in distcp, this uses up all ephemeral ports on the client side, because every closed connection keeps occupying its port in TIME_WAIT state for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call conn.getInputStream().close() instead, so that the connection is kept alive. This gets rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build the original hadoop jar.
> 2. Run distcp with webhdfs as the source; "netstat -n | grep TIME_WAIT | grep -c 50070" on the local machine shows a large number (in the hundreds).
> 3. Build the hadoop jar with this diff.
> 4. Run distcp with webhdfs as the source again; "netstat -n | grep TIME_WAIT | grep -c 50070" on the local machine shows 0.
> 5. The explanation: distcp's client side does a lot of directory scanning, which creates and closes a lot of connections to the namenode HTTP port.
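
The difference between the two calls can be tried out in isolation. The following is only a minimal sketch, not code from WebHdfsFileSystem or from the attached patches: it assumes a throwaway local server from the JDK's com.sun.net.httpserver package standing in for the NameNode HTTP port, and the names KeepAliveSketch and countClientPorts are hypothetical. It sends a few GET requests and counts how many distinct client ports the server sees, once closing only the input stream and once calling conn.disconnect().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class KeepAliveSketch {

    /**
     * Hypothetical helper: sends `requests` GETs to a local test server and
     * returns the number of distinct client ports the server observed.
     * Each new TCP connection from the client shows up as a new port.
     */
    static int countClientPorts(boolean useDisconnect, int requests) {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port for the stand-in server.
            server = HttpServer.create(
                    new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
            server.createContext("/", exchange -> {
                clientPorts.add(exchange.getRemoteAddress().getPort());
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();

            URL url = URI.create(
                    "http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                InputStream in = conn.getInputStream();
                while (in.read() != -1) {
                    // Drain the body; an unread body prevents connection reuse.
                }
                if (useDisconnect) {
                    conn.disconnect(); // closes the underlying socket
                } else {
                    in.close();        // hands the socket back for reuse
                }
            }
            return clientPorts.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("ports with in.close():        " + countClientPorts(false, 5));
        System.out.println("ports with conn.disconnect(): " + countClientPorts(true, 5));
    }
}
```

With the JVM's default http.keepAlive setting, the run that only closes the stream should see fewer client ports than the run that disconnects, which is the same effect the netstat check above measures through TIME_WAIT entries.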
> Reference:
> 2.7 and below: https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)