Date: Tue, 3 Jan 2017 06:46:58 +0000 (UTC)
From: "Zheng Shao (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11280) Allow WebHDFS to reuse HTTP connections to NN

    [ https://issues.apache.org/jira/browse/HDFS-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794319#comment-15794319 ]

Zheng Shao commented on HDFS-11280:
-----------------------------------

Sorry guys, I think we need to revert this patch.
My initial understanding is that HTTP Keep-Alive is breaking some assumptions in the WebHDFS code.
[~wheat9] can you help revert this? I will take a second look at the failing tests.

> Allow WebHDFS to reuse HTTP connections to NN
> ---------------------------------------------
>
>                 Key: HDFS-11280
>                 URL: https://issues.apache.org/jira/browse/HDFS-11280
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 2.8.0, 2.9.0, 2.7.4, 3.0.0-alpha2
>
>         Attachments: HDFS-11280.for.2.7.and.below.patch, HDFS-11280.for.2.8.and.beyond.2.patch, HDFS-11280.for.2.8.and.beyond.3.patch, HDFS-11280.for.2.8.and.beyond.4.patch, HDFS-11280.for.2.8.and.beyond.patch
>
>
> WebHDFSClient calls "conn.disconnect()", which disconnects from the NameNode. When we use webhdfs as the source in distcp, this uses up all ephemeral ports on the client side, since every closed connection continues to occupy its port in TIME_WAIT status for some time.
> According to http://tinyurl.com/java7-http-keepalive, we should call conn.getInputStream().close() instead to make sure the connection is kept alive. This gets rid of the ephemeral port problem.
> Manual steps used to verify the bug fix:
> 1. Build the original hadoop jar.
> 2. Try out distcp with webhdfs as the source; "netstat -n | grep TIME_WAIT | grep -c 50070" on the local machine shows a big number (in the hundreds).
> 3. Build the hadoop jar with this diff.
> 4. Try out distcp with webhdfs as the source; "netstat -n | grep TIME_WAIT | grep -c 50070" on the local machine shows 0.
> 5. The explanation: distcp's client side does a lot of directory scanning, which creates and closes a lot of connections to the namenode HTTP port.
> Reference:
> 2.7 and below: https://github.com/apache/hadoop/blob/branch-2.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L743
> 2.8 and above: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L898
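
For readers who want to see the behaviour the quoted description relies on, here is a minimal sketch in plain Java. It is not the HDFS-11280 patch and does not touch WebHdfsFileSystem. It assumes a throwaway loopback server built with the JDK's com.sun.net.httpserver package, and the names KeepAliveSketch, fetch, and countClientPorts are hypothetical, introduced only for illustration. The sketch issues several sequential GET requests and counts how many distinct client-side ports the server sees when the client either closes only the response stream or calls conn.disconnect().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeepAliveSketch {

    /** Reads the whole response, then releases the connection in one of two ways. */
    static void fetch(URL url, boolean closeStreamOnly) throws IOException {
        // NO_PROXY keeps the request on the loopback interface (assumption: no proxy wanted).
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        InputStream in = conn.getInputStream();
        try {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // Drain the body so the JDK can consider the socket for reuse.
            }
        } finally {
            if (closeStreamOnly) {
                in.close();        // lets the JDK keep the socket in its keep-alive cache
            } else {
                conn.disconnect(); // asks the JDK to close the underlying socket
            }
        }
    }

    /** Returns how many distinct client ports a local test server observed. */
    static int countClientPorts(int requests, boolean closeStreamOnly) throws IOException {
        Set<Integer> ports = ConcurrentHashMap.newKeySet();
        // Port 0 lets the OS pick a free port; this server only exists for the demo.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            ports.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            for (int i = 0; i < requests; i++) {
                fetch(url, closeStreamOnly);
            }
        } finally {
            server.stop(0);
        }
        return ports.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("close stream only: " + countClientPorts(5, true) + " client port(s)");
        System.out.println("disconnect():      " + countClientPorts(5, false) + " client port(s)");
    }
}
```

With the JDK's default keep-alive settings, the stream-only variant would be expected to reuse a single connection, while the disconnect() variant typically opens a new socket per request, which is what leaves ports in TIME_WAIT. As the comment above notes, reusing connections appeared to break assumptions elsewhere in the WebHDFS code, so this sketch illustrates the mechanism only, not a recommended change.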