hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2159) Namenode stuck in safemode
Date Wed, 21 May 2008 22:33:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598833#action_12598833 ]

Hairong Kuang commented on HADOOP-2159:
---------------------------------------

The name node keeps track of the total number of valid blocks it has received while in safe mode.
A valid block is a block that belongs to a file. The counter is called blockSafe. The name node
does not leave safe mode automatically while the ratio of blockSafe to the total number of valid
blocks is less than the threshold.
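A minimal sketch of the exit condition described above, assuming a single counter and a single threshold. The class SafeModeState and the names blockTotal and canLeaveSafeMode are hypothetical and do not come from the Hadoop source; only blockSafe and the ratio-versus-threshold rule are taken from the comment.

```java
// Hypothetical model of the safe-mode bookkeeping described above.
// SafeModeState, blockTotal and canLeaveSafeMode are illustrative names,
// not the actual Hadoop classes or methods.
class SafeModeState {
    private final double threshold;  // e.g. 1.0 means every valid block must be reported
    private final long blockTotal;   // total number of valid blocks (blocks that belong to a file)
    private long blockSafe;          // valid blocks reported by data nodes so far

    SafeModeState(long blockTotal, double threshold) {
        this.blockTotal = blockTotal;
        this.threshold = threshold;
    }

    // Called when a data node reports a block that belongs to a file.
    void reportValidBlock() {
        blockSafe++;
    }

    long getBlockSafe() {
        return blockSafe;
    }

    // The name node stays in safe mode while blockSafe / blockTotal < threshold.
    boolean canLeaveSafeMode() {
        if (blockTotal == 0) {
            return true; // assumption for this sketch: nothing to wait for
        }
        return (double) blockSafe / blockTotal >= threshold;
    }
}
```

With a threshold of 1.0 and four valid blocks, three reports leave the ratio at 0.75 and the name node stays in safe mode. Only the fourth report lets it leave.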

I see a bug in maintaining this counter. Before the counter is incremented, the name node
checks whether the block is valid. However, it does not do this check before the counter is decremented.
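The asymmetry can be sketched as follows, assuming validity is simply membership in a set of block IDs that belong to files. BlockSafeCounter and its methods are hypothetical names. The "fixed" variant only illustrates the idea of mirroring the increment-side check on the decrement side. It is not the patch that was committed for HADOOP-2159.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the counter asymmetry described above.
// Class and method names are illustrative, not the actual name node code.
class BlockSafeCounter {
    private final Set<Long> validBlocks; // assumption: block IDs that belong to some file
    private long blockSafe;

    BlockSafeCounter(Set<Long> validBlocks) {
        this.validBlocks = new HashSet<>(validBlocks);
    }

    // Increment path: validity is checked first.
    void blockReported(long blockId) {
        if (validBlocks.contains(blockId)) {
            blockSafe++;
        }
    }

    // Buggy decrement path: no validity check, so removing a stale
    // (invalid) block still lowers the counter.
    void blockRemovedBuggy(long blockId) {
        blockSafe--;
    }

    // Illustrative fix: apply the same validity check before decrementing.
    void blockRemovedFixed(long blockId) {
        if (validBlocks.contains(blockId)) {
            blockSafe--;
        }
    }

    long getBlockSafe() {
        return blockSafe;
    }
}
```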

When a dfs cluster is started and an early-started data node has stale blocks, the name node
asks the data node to delete those stale blocks in its reply to the data node's first block report.
If the second block report arrives while the name node is still in safe mode, those blocks
are removed from the blocks map, and the blockSafe counter is also decremented even
though those blocks are invalid. The cluster therefore ends up with a blockSafe counter that is
smaller than the number of valid blocks in the name node. If the threshold is set to 1, the
cluster will never be able to leave safe mode.
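The start-up sequence above can be simulated with a small model. This is a sketch under simplifying assumptions: replication is ignored, each block lives on one data node, the blocks map is a plain set of block IDs, and the block IDs (1 to 4 valid, 100 and 101 stale) are made up. SafeModeScenario, addBlock, removeBlock and run are hypothetical names, not Hadoop APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of the start-up race described above.
// Replication is ignored and block IDs are invented for illustration.
class SafeModeScenario {
    private final Set<Long> validBlocks;                 // blocks that belong to files
    private final Set<Long> blocksMap = new HashSet<>(); // blocks currently known to the name node
    private final boolean checkOnDecrement;              // false reproduces the bug, true applies the check
    private long blockSafe;

    SafeModeScenario(Set<Long> validBlocks, boolean checkOnDecrement) {
        this.validBlocks = validBlocks;
        this.checkOnDecrement = checkOnDecrement;
    }

    void addBlock(long b) {
        // Increment path always checks validity.
        if (blocksMap.add(b) && validBlocks.contains(b)) {
            blockSafe++;
        }
    }

    void removeBlock(long b) {
        // Without the check, removing a stale block still decrements blockSafe.
        if (blocksMap.remove(b) && (!checkOnDecrement || validBlocks.contains(b))) {
            blockSafe--;
        }
    }

    boolean canLeaveSafeMode(double threshold) {
        return (double) blockSafe / validBlocks.size() >= threshold;
    }

    long getBlockSafe() {
        return blockSafe;
    }

    static SafeModeScenario run(boolean checkOnDecrement) {
        SafeModeScenario nn = new SafeModeScenario(Set.of(1L, 2L, 3L, 4L), checkOnDecrement);
        // First block report from early data node A: valid 1, 2 plus stale 100, 101.
        for (long b : new long[] {1L, 2L, 100L, 101L}) nn.addBlock(b);
        // A deletes 100 and 101 as instructed; its second report (still in
        // safe mode) no longer lists them, so they leave the blocks map.
        nn.removeBlock(100L);
        nn.removeBlock(101L);
        // A later data node B reports the remaining valid blocks.
        for (long b : new long[] {3L, 4L}) nn.addBlock(b);
        return nn;
    }

    public static void main(String[] args) {
        SafeModeScenario buggy = run(false);
        SafeModeScenario fixed = run(true);
        System.out.println("buggy: blockSafe=" + buggy.getBlockSafe()
                + " canLeave=" + buggy.canLeaveSafeMode(1.0));
        System.out.println("fixed: blockSafe=" + fixed.getBlockSafe()
                + " canLeave=" + fixed.canLeaveSafeMode(1.0));
    }
}
```

In the buggy run, blockSafe ends at 2 of 4 valid blocks, so a threshold of 1.0 is never reached. With the check on the decrement path, all 4 valid blocks are counted and the name node can leave safe mode.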

> Namenode stuck in safemode
> --------------------------
>
>                 Key: HADOOP-2159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2159
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>
> Occasionally (not easy to reproduce) the namenode does not turn off safemode automatically,
> although fsck does not report any missing or under-replicated blocks (safemode threshold set
> to 1.0).
> At this moment I do not have any additional information which could help analyze the
> issue.

-- 
This message is automatically generated by JIRA.

