Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 24 Apr 2013 02:17:16 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4721) Speed up lease/block recovery when DN fails and a block goes into recovery
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HDFS-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639993#comment-13639993 ]

Hadoop QA commented on HDFS-4721:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12580215/4721-trunk.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new or modified test files.

    {color:red}-1 javac{color}.  The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4303//console

This message is automatically generated.

> Speed up lease/block recovery when DN fails and a block goes into recovery
> --------------------------------------------------------------------------
>
>                 Key: HDFS-4721
>                 URL: https://issues.apache.org/jira/browse/HDFS-4721
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.3-alpha
>            Reporter: Varun Sharma
>             Fix For: 2.0.4-alpha
>
>         Attachments: 4721-hadoop2.patch, 4721-trunk.patch, 4721-v2.patch, 4721-v3.patch, 4721-v4.patch, 4721-v5.patch, 4721-v6.patch, 4721-v7.patch, 4721-v8.patch
>
>
> This was observed while doing HBase WAL recovery. HBase uses append to write to its write-ahead log (WAL). So initially the pipeline is set up as
> DN1 --> DN2 --> DN3
> This WAL needs to be read when DN1 fails, since DN1 houses the HBase regionserver for the WAL.
> HBase first recovers the lease on the WAL file. During recovery, we choose DN1 as the primary DN to do the recovery even though DN1 has failed and is not heartbeating any more.
> Avoiding the stale DN1 would speed up recovery and reduce HBase MTTR. There are two options.
> a) Ride on HDFS-3703: if stale node detection is turned on, we do not choose stale datanodes (typically those that have not sent a heartbeat for 20-30 seconds) as primary DN(s).
> b) We sort the replicas in order of last heartbeat and always pick the ones that sent the most recent heartbeat.
> Going to the dead datanode increases lease and block recovery time, since the block goes into the UNDER_RECOVERY state even though no one is actively recovering it. Please let me know if this makes sense. If so, should we move forward with a) or b)?
> Thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
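
For readers following the issue, options a) and b) from the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in any of the attached patches: `Replica`, `PrimaryChooser`, `firstNonStale`, and `byMostRecentHeartbeat` are hypothetical names, heartbeat times are plain millisecond timestamps, and the fallback to the first replica when every replica is stale is an assumption the issue does not specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a replica location; not an HDFS class.
final class Replica {
    final String name;
    final long lastHeartbeatMillis;

    Replica(String name, long lastHeartbeatMillis) {
        this.name = name;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

// Hypothetical helper illustrating the two selection strategies.
final class PrimaryChooser {

    // Option a): return the first replica whose last heartbeat is within
    // staleIntervalMillis of nowMillis. If all replicas are stale, fall back
    // to the first one (an assumption made for this sketch only).
    static Replica firstNonStale(List<Replica> replicas, long nowMillis,
                                 long staleIntervalMillis) {
        for (Replica r : replicas) {
            if (nowMillis - r.lastHeartbeatMillis <= staleIntervalMillis) {
                return r;
            }
        }
        return replicas.isEmpty() ? null : replicas.get(0);
    }

    // Option b): return a copy of the replicas ordered by most recent
    // heartbeat first, so the head of the list is the preferred primary.
    static List<Replica> byMostRecentHeartbeat(List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator
                .comparingLong((Replica r) -> r.lastHeartbeatMillis)
                .reversed());
        return sorted;
    }
}
```

With the pipeline DN1 --> DN2 --> DN3 and DN1 no longer heartbeating, both strategies avoid DN1: option a) skips it once it exceeds the stale interval, while option b) ranks it last regardless of any threshold.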