Date: Thu, 13 Sep 2012 17:41:10 +1100 (NCT)
From: "Hudson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1983284387.73672.1347518470133.JavaMail.jiratomcat@arcas>
In-Reply-To: <541368796.90543.1343041414929.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HDFS-3703) Decrease the datanode failure detection time

    [ https://issues.apache.org/jira/browse/HDFS-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454687#comment-13454687 ]

Hudson commented on HDFS-3703:
------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2752 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2752/])
    HDFS-3703. Datanodes are marked stale if heartbeat is not received in configured timeout and are selected as the last location to read from. Contributed by Jing Zhao.
(Revision 1384209)

     Result = FAILURE
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384209
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java


> Decrease the datanode failure detection time
> --------------------------------------------
>
>                 Key: HDFS-3703
>                 URL: https://issues.apache.org/jira/browse/HDFS-3703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, name-node
>   Affects Versions: 1.0.3, 2.0.0-alpha, 3.0.0
>            Reporter: nkeywal
>            Assignee: Jing Zhao
>             Fix For: 3.0.0, 2.0.3-alpha
>
>         Attachments: 3703-hadoop-1.0.txt, HDFS-3703-branch2.patch, HDFS-3703.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-with-write.patch
>
>
> By default, if a box dies, the datanode will be marked as dead by the namenode after 10 minutes 30 seconds. In the meantime, this datanode will still be proposed by the namenode to write blocks or to read replicas.
> The same happens if the datanode crashes: there are no shutdown hooks to tell the namenode we're not there anymore.
> It is especially an issue with HBase. The HBase regionserver timeout in production is often 30s. So with these configs, when a box dies HBase starts to recover after 30s and, for the next 10 minutes, the namenode will still consider the blocks on that box as available. Beyond the write errors, this will trigger a lot of missed reads:
> - during the recovery, HBase needs to read the blocks used on the dead box (the ones in the 'HBase Write-Ahead-Log')
> - after the recovery, reading these data blocks (the 'HBase region') will fail 33% of the time with the default number of replicas, slowing down data access, especially when the errors are socket timeouts (i.e. around 60s most of the time).
> Overall, it would be ideal if the HDFS timeout settings could be lower than the HBase ones.
> As a side note, HBase relies on ZooKeeper to detect regionserver issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
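
For readers who want the numbers spelled out, here is a rough sketch of the arithmetic behind the 10 minutes 30 seconds figure and of the "stale replicas last" read ordering described in the commit message. It is not the patch itself. The formula (twice the recheck interval plus ten heartbeat intervals) and the default values of 5 minutes and 3 seconds are assumptions about the namenode defaults, not something stated in this message. The `Replica` class, the function names, and the 30-second stale threshold (picked to match the HBase timeout mentioned above) are hypothetical and used only for illustration.

```python
from dataclasses import dataclass

# Assumed HDFS defaults; they are not stated in the message above.
HEARTBEAT_INTERVAL_S = 3        # assumed datanode heartbeat interval
RECHECK_INTERVAL_S = 5 * 60     # assumed namenode heartbeat recheck interval


def dead_interval_s(heartbeat_s=HEARTBEAT_INTERVAL_S,
                    recheck_s=RECHECK_INTERVAL_S):
    """Seconds without a heartbeat before a datanode is declared dead
    (assumed formula: 2 * recheck interval + 10 * heartbeat interval)."""
    return 2 * recheck_s + 10 * heartbeat_s


@dataclass
class Replica:
    """Hypothetical stand-in for one replica location of a block."""
    host: str
    seconds_since_heartbeat: float


def order_for_read(replicas, stale_after_s):
    """Return replicas with stale ones moved to the end.

    sorted() is stable, so non-stale replicas keep their original
    relative order, and so do the stale ones.
    """
    return sorted(replicas,
                  key=lambda r: r.seconds_since_heartbeat > stale_after_s)


if __name__ == "__main__":
    total = dead_interval_s()
    print(f"dead after {total // 60} min {total % 60} s")

    locations = [Replica("dn1", 45.0), Replica("dn2", 1.0), Replica("dn3", 2.0)]
    # 30 s is a hypothetical stale threshold, chosen to match the HBase timeout.
    for r in order_for_read(locations, stale_after_s=30):
        print(r.host)
```

With a stale threshold at or below the HBase regionserver timeout, a reader would try the two live replicas first and only fall back to the unresponsive datanode last, instead of hitting it on roughly one read in three.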