hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1541) Not marking datanodes dead When namenode in safemode
Date Sat, 05 Mar 2011 05:13:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002929#comment-13002929 ]

dhruba borthakur commented on HDFS-1541:
----------------------------------------

I am thinking that the namenode should not mark datanodes as dead if the namenode is in safemode,
irrespective of whether it is in startup-safemode or in manual-safemode. My reasoning is as
follows:

A couple of times, we have had failures of a few racks. When this happened, we put
the namenode in safemode to prevent a replication storm. When the namenode loses a large chunk
of datanodes, it has to spend a lot of CPU processing block reports when the partitioned
datanodes start rejoining the cluster. At this time it is better if we can prevent the datanodes
from timing out, or else the storm of block reports causes other datanodes to time out, resulting
in a never-ending cycle.
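
To make the proposal concrete, here is a rough sketch of the idea in Java. It is not the
attached patch and does not use any real Hadoop class: HeartbeatMonitorSketch, its methods,
and the expiryMillis parameter are all hypothetical names made up for illustration. The only
point is that the dead-node check returns early whenever the safemode flag is set, whether
safemode was entered at startup or manually.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the HDFS-1541 patch and not a real Hadoop class. */
final class HeartbeatMonitorSketch {
    // Assumed heartbeat timeout after which a datanode would be declared dead.
    private final long expiryMillis;
    // Last heartbeat time per datanode id.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // True for both startup safemode and manual safemode in this sketch.
    private boolean inSafeMode;

    HeartbeatMonitorSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void setSafeMode(boolean on) {
        inSafeMode = on;
    }

    void recordHeartbeat(String datanodeId, long nowMillis) {
        lastHeartbeat.put(datanodeId, nowMillis);
    }

    /** Returns the datanodes expired by this pass; expires nothing while in safemode. */
    List<String> checkDeadNodes(long nowMillis) {
        List<String> dead = new ArrayList<>();
        if (inSafeMode) {
            // Proposed behavior: skip the dead-node check entirely in safemode.
            return dead;
        }
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > expiryMillis) {
                dead.add(e.getKey());
            }
        }
        // Forget expired nodes so they must register again.
        lastHeartbeat.keySet().removeAll(dead);
        return dead;
    }
}
```

In this sketch a datanode that stayed silent throughout safemode is expired on the first check
after safemode is turned off. Whether the timestamps should be refreshed at that point is a
separate design question that the sketch leaves open.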

> Not marking datanodes dead When namenode in safemode
> ----------------------------------------------------
>
>                 Key: HDFS-1541
>                 URL: https://issues.apache.org/jira/browse/HDFS-1541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.23.0
>
>         Attachments: deadnodescheck.patch
>
>
> In a big cluster, when the namenode starts up, it takes a long time for the namenode to process
> block reports from all datanodes. Because heartbeat processing gets delayed, some datanodes
> are erroneously marked as dead, and later they have to register again, thus wasting time.
> It would speed up startup if the checking of dead nodes were disabled while the namenode
> is in safemode.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
