Date: Sat, 5 Mar 2011 05:13:46 +0000 (UTC)
From: "dhruba borthakur (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <2023304884.650.1299302026253.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <14977504.151521292473861668.JavaMail.jira@thor>
Subject: [jira] Commented: (HDFS-1541) Not marking datanodes dead When namenode in safemode

[ 
https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002929#comment-13002929 ]

dhruba borthakur commented on HDFS-1541:
----------------------------------------

I think the namenode should not mark datanodes as dead while it is in safemode, irrespective of whether it is in startup-safemode or in manual-safemode. My reasoning is as follows:

A couple of times, we have had failures of a few sets of racks. When this happened, we put the namenode in safemode to prevent a replication storm. When the namenode loses a large chunk of datanodes, it has to spend a lot of CPU processing block reports as the partitioned datanodes rejoin the cluster. During this period it is better to prevent the datanodes from timing out; otherwise the storm of block reports causes other datanodes to time out, resulting in a never-ending cycle.

> Not marking datanodes dead When namenode in safemode
> ----------------------------------------------------
>
>                 Key: HDFS-1541
>                 URL: https://issues.apache.org/jira/browse/HDFS-1541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.23.0
>
>         Attachments: deadnodescheck.patch
>
>
> In a big cluster, when the namenode starts up, it takes a long time for the namenode to process block reports from all datanodes. Because heartbeat processing gets delayed, some datanodes are erroneously marked as dead; later on they have to register again, thus wasting time.
> It would speed up startup if the checking of dead nodes is disabled when the namenode is in safemode.
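
For illustration only, the behaviour proposed in the comment can be sketched as a toy model in Python. This is not the namenode code and not the attached patch: the class ToyNamenode, its methods, and the expiry value are all hypothetical names and numbers chosen for the sketch. It only shows the idea that the dead-node check becomes a no-op while safemode is on, whether safemode was entered at startup or manually.

```python
from dataclasses import dataclass, field

# Illustrative expiry window in seconds; an assumption for this sketch,
# not a value taken from the issue or the patch.
HEARTBEAT_EXPIRY_SECS = 630.0


@dataclass
class ToyNamenode:
    """Hypothetical stand-in for the namenode's heartbeat bookkeeping."""
    in_safemode: bool = False
    last_heartbeat: dict = field(default_factory=dict)  # node id -> timestamp
    dead: set = field(default_factory=set)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the heartbeat; a node that reports again is no longer dead.
        self.last_heartbeat[node] = now
        self.dead.discard(node)

    def check_dead_nodes(self, now: float) -> list:
        # Proposed behaviour: skip the check entirely while in safemode,
        # so slow heartbeat processing cannot expire healthy nodes.
        if self.in_safemode:
            return []
        newly_dead = [
            node
            for node, seen in self.last_heartbeat.items()
            if node not in self.dead and now - seen > HEARTBEAT_EXPIRY_SECS
        ]
        self.dead.update(newly_dead)
        return newly_dead
```

In this toy, a node that stays silent through the whole safemode period is marked dead at the first check after safemode is switched off, unless it sends a heartbeat first.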