Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 25 Jul 2014 18:51:42 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-6755) There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6755:
---------------------------------------
    Description: DFSOutputStream#close has a loop where it tries to contact the NameNode, to call {{complete}} on the file which is open-for-write. This loop includes a sleep which increases exponentially (exponential backoff).
It makes sense to sleep before re-contacting the NameNode, but the code also sleeps even in the case where it has already decided to give up and throw an exception back to the user. It should not sleep after it has already decided to give up, since there's no point.

  (was: The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time doubles on every iteration, which can have a significant effect if there is more than one iteration, and the code would sleep just to throw an exception.
We need to move the sleep down after decrementing retries.)

> There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6755
>                 URL: https://issues.apache.org/jira/browse/HDFS-6755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>         Attachments: HDFS-6755.patch
>
>
> DFSOutputStream#close has a loop where it tries to contact the NameNode, to call {{complete}} on the file which is open-for-write. This loop includes a sleep which increases exponentially (exponential backoff).
> It makes sense to sleep before re-contacting the NameNode, but the code also sleeps even in the case where it has already decided to give up and throw an exception back to the user. It should not sleep after it has already decided to give up, since there's no point.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
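
For illustration, the ordering this issue asks for (check whether to give up first, sleep only if another attempt will follow) might look roughly like the sketch below. This is not the attached patch, whose contents are not shown in this message. The class `CompleteRetrySketch`, the `Sleeper` interface, the `completeWithBackoff` method, and the 400 ms / 3-retry values are all hypothetical names and numbers chosen for the example; the `Sleeper` hook exists only so the sleeps can be observed without actually waiting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class CompleteRetrySketch {

    /** Hypothetical hook standing in for Thread.sleep so sleeps can be recorded. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Retries tryComplete with exponential backoff. The give-up check comes
     * BEFORE the sleep, so no time is spent sleeping once the retries are
     * exhausted and an exception is about to be thrown.
     */
    public static void completeWithBackoff(BooleanSupplier tryComplete, int retries,
                                           long initialTimeoutMs, Sleeper sleeper)
            throws IOException {
        long timeout = initialTimeoutMs;
        while (!tryComplete.getAsBoolean()) {
            if (retries == 0) {
                // Already decided to give up: throw immediately, no sleep.
                throw new IOException("Unable to close file because the last block"
                        + " does not have enough number of replicas.");
            }
            retries--;
            try {
                sleeper.sleep(timeout); // back off only when another attempt follows
            } catch (InterruptedException ie) {
                // Assumption for this sketch: restore the flag and abort the close.
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting to retry", ie);
            }
            timeout *= 2; // exponential backoff
        }
    }

    public static void main(String[] args) {
        List<Long> sleeps = new ArrayList<>();
        try {
            // A "NameNode" that never completes the file; sleeps are only recorded.
            completeWithBackoff(() -> false, 3, 400, sleeps::add);
        } catch (IOException e) {
            System.out.println("gave up after sleeps " + sleeps);
            // prints: gave up after sleeps [400, 800, 1600]
        }
    }
}
```

With the sleep-first ordering quoted in the old description, the same run would also sleep 3200 ms immediately before throwing, which is the wasted wait this issue describes.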