hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode
Date Tue, 16 Nov 2010 20:08:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932632#action_12932632 ]

dhruba borthakur commented on HDFS-1476:

Thinking more about this one, we can exit safemode faster if we can compute misReplicatedBlocks
even before we have one replica of all blocks. Currently the restart sequence has two steps:

Step 1: The namenode waits to ensure that there is at least one replica of all known blocks.
Step 2: Then it invokes processMisReplicatedBlocks to update neededReplication.

When the cluster restarts, the namenode starts in Step 1 and begins processing a storm of
block reports from all datanodes. A few datanodes are slow, though, and the block reports
from these stragglers delay the transition from Step 1 to Step 2. The CPU usage on
the NN decreases exponentially as Step 1 progresses and becomes almost negligible when Step
1 is about to end.

This jira could change the code so that processMisReplicatedBlocks is invoked before Step
1 finishes completely. This would make the NN exit safemode earlier.

> listCorruptFileBlocks should be functional while the name node is still in safe mode
> ------------------------------------------------------------------------------------
>                 Key: HDFS-1476
>                 URL: https://issues.apache.org/jira/browse/HDFS-1476
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Patrick Kling
> This would allow us to detect whether missing blocks can be fixed using Raid and, if that
> is the case, exit safe mode earlier.
> One way to make listCorruptFileBlocks available before the name node has exited from
> safe mode would be to perform a scan of the blocks map on each call to listCorruptFileBlocks
> to determine if there are any blocks with no replicas. This scan could be parallelized by
> dividing the space of block IDs into multiple intervals that can be scanned independently.
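
The interval-based scan described in the issue could look roughly like the Java sketch below. It is an illustration under stated assumptions, not the HDFS implementation: the blocks map is modeled as a NavigableMap from block ID to replica count, the class ZeroReplicaScan and its method are hypothetical, and the map is assumed not to be modified while the scan runs.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a scan over the namenode's blocks map. */
final class ZeroReplicaScan {

    /**
     * Splits the block ID range [firstKey, lastKey] into numIntervals
     * contiguous intervals, scans them in parallel, and returns the IDs of
     * blocks with zero replicas in ascending order.
     */
    static List<Long> findBlocksWithNoReplicas(
            NavigableMap<Long, Integer> replicaCounts, int numIntervals) {
        if (numIntervals < 1) {
            throw new IllegalArgumentException("numIntervals must be >= 1");
        }
        if (replicaCounts.isEmpty()) {
            return new ArrayList<>();
        }
        // BigInteger avoids overflow: block IDs may span the whole long range.
        BigInteger lo = BigInteger.valueOf(replicaCounts.firstKey());
        BigInteger span = BigInteger.valueOf(replicaCounts.lastKey())
                .subtract(lo).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(numIntervals);

        List<NavigableMap<Long, Integer>> slices = new ArrayList<>();
        for (int i = 0; i < numIntervals; i++) {
            BigInteger from = lo.add(span.multiply(BigInteger.valueOf(i)).divide(n));
            BigInteger to = lo.add(span.multiply(BigInteger.valueOf(i + 1L)).divide(n))
                    .subtract(BigInteger.ONE);
            if (to.compareTo(from) < 0) {
                continue; // empty interval: more intervals than distinct IDs
            }
            slices.add(replicaCounts.subMap(
                    from.longValueExact(), true, to.longValueExact(), true));
        }

        // Each slice is an independent read-only view, so slices can be
        // scanned concurrently; the ordered stream keeps results sorted.
        return slices.parallelStream()
                .flatMap(slice -> slice.entrySet().stream()
                        .filter(e -> e.getValue() == 0)
                        .map(Map.Entry::getKey))
                .collect(Collectors.toList());
    }
}
```

Splitting by ID range rather than by entry count keeps the intervals independent of the map's contents, at the cost of uneven work when block IDs are clustered.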

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
