Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 25 Jul 2014 18:51:42 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12729763.1406305913625.46669.1406314302637@arcas>
In-Reply-To: <JIRA.12729763.1406305913625@arcas>
References: <JIRA.12729763.1406305913625@arcas>
Subject: [jira] [Updated] (HDFS-6755) There is an unnecessary sleep in the
 code path where DFSOutputStream#close gives up its attempt to contact the
 namenode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6755:
---------------------------------------

    Description: DFSOutputStream#close has a loop where it tries to contact the NameNode, to call {{complete}} on the file which is open-for-write.  This loop includes a sleep which increases exponentially (exponential backoff).  It makes sense to sleep before re-contacting the NameNode, but the code also sleeps even in the case where it has already decided to give up and throw an exception back to the user.  It should not sleep after it has already decided to give up, since there's no point.  (was: The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing the exception, which should not happen.
The sleep time doubles on every iteration, so after more than one iteration the final sleep, which happens only so that the exception can be thrown, can add a significant delay. We need to move the sleep down after decrementing retries.)
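
For illustration only, the reordering described above could look roughly like the sketch below. This is not the attached HDFS-6755.patch and not the actual DFSOutputStream code. The class CloseRetrySketch, the method completeWithBackoff, the sleeper callback and the 400 ms starting timeout are all hypothetical names and values chosen so the example can run on its own. The sleeper stands in for Thread.sleep, so InterruptedException handling is left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical stand-alone sketch; not the real DFSOutputStream logic.
class CloseRetrySketch {

  /**
   * Calls tryComplete until it returns true, sleeping with exponential
   * backoff between attempts. Once retries are exhausted it throws
   * immediately, without sleeping first.
   */
  public static void completeWithBackoff(BooleanSupplier tryComplete,
      int retries, long initialTimeoutMs, LongConsumer sleeper)
      throws IOException {
    long localTimeout = initialTimeoutMs;
    while (!tryComplete.getAsBoolean()) {
      // Decide whether to give up BEFORE sleeping.
      if (retries == 0) {
        throw new IOException("Unable to close file because the last block"
            + " does not have enough number of replicas.");
      }
      retries--;
      // Sleep only when another attempt will actually follow.
      sleeper.accept(localTimeout);
      localTimeout *= 2;
    }
  }

  public static void main(String[] args) {
    // Record requested sleeps instead of really sleeping.
    List<Long> sleeps = new ArrayList<>();
    try {
      // A "NameNode" that never completes the file, with 2 retries allowed.
      completeWithBackoff(() -> false, 2, 400, sleeps::add);
    } catch (IOException e) {
      System.out.println("gave up after sleeps: " + sleeps);
    }
  }
}
```

With two retries and a 400 ms starting timeout, the sketch sleeps for 400 ms and 800 ms and then throws. The ordering in the snippet quoted above would sleep a further 1600 ms before throwing.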

> There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6755
>                 URL: https://issues.apache.org/jira/browse/HDFS-6755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>         Attachments: HDFS-6755.patch
>
>
> DFSOutputStream#close has a loop where it tries to contact the NameNode, to call {{complete}} on the file which is open-for-write.  This loop includes a sleep which increases exponentially (exponential backoff).  It makes sense to sleep before re-contacting the NameNode, but the code also sleeps even in the case where it has already decided to give up and throw an exception back to the user.  It should not sleep after it has already decided to give up, since there's no point.

