hadoop-hdfs-issues mailing list archives

From "Patrick Kling (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode
Date Wed, 17 Nov 2010 00:39:15 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Kling updated HDFS-1476:

    Attachment: HDFS-1476.patch

This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that
determines the fraction of blocks for which block reports have to be received before the NameNode
will start initializing the needed replication queues. Once a sufficient number of block reports
have been received, the queues are initialized while the NameNode is still in safe mode. After
the queues are initialized, subsequent block reports are handled by updating the queues incrementally.
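
To illustrate the control flow described above, here is a minimal sketch. It is not taken from
the attached patch: the class name, fields, and counters are hypothetical, and only the
configuration key dfs.namenode.replqueue.threshold-pct comes from this message. The threshold is
passed in as a plain fraction so that the example has no Hadoop dependencies.

```java
/**
 * Hypothetical model of threshold-triggered replication queue initialization.
 * Not the patch's code; names are illustrative only.
 */
public class ReplQueueThresholdSketch {
    private final long totalBlocks;
    // Stands in for the value of dfs.namenode.replqueue.threshold-pct (a fraction in [0, 1]).
    private final double replQueueThresholdPct;

    private long reportedBlocks = 0;
    private boolean queuesInitialized = false;
    private int fullTraversals = 0;      // full scans of the (modeled) blocks map
    private int incrementalUpdates = 0;  // per-report queue updates after initialization

    public ReplQueueThresholdSketch(long totalBlocks, double replQueueThresholdPct) {
        this.totalBlocks = totalBlocks;
        this.replQueueThresholdPct = replQueueThresholdPct;
    }

    /** Called once per block covered by an incoming block report. */
    public void blockReported() {
        reportedBlocks++;
        if (queuesInitialized) {
            // Queues already exist: update them incrementally.
            incrementalUpdates++;
        } else if (reportedBlocks >= replQueueThresholdPct * totalBlocks) {
            // Enough blocks reported: build the queues once, still in safe mode.
            fullTraversals++;
            queuesInitialized = true;
        }
    }

    public boolean queuesInitialized() { return queuesInitialized; }
    public int fullTraversals() { return fullTraversals; }
    public int incrementalUpdates() { return incrementalUpdates; }
}
```

In this model the single full traversal happens when the threshold is crossed, and every later
report costs only an incremental update.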

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the last few block
reports (when the NameNode is mostly idle). Once these block reports have been received, we
can then immediately leave safe mode without having to wait for the computation of the needed
replication queues (which requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have been reported.
Using this change, we could monitor whether all of the missing blocks can be recreated using
parity information and, if so, leave safe mode early. For this monitoring to work, we need
access to the needed replication queues while the NameNode is still in safe mode.
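
The monitoring idea in the second point could look roughly like the sketch below. This is an
assumption-laden illustration, not Raid's actual API: the stripe layout (lists of block IDs,
data and parity together), the rule that each block belongs to at most one stripe, and the
per-stripe loss tolerance (for example 1 for a simple XOR parity) are all hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical check: can every missing block be rebuilt from parity? */
public class ParityRecoverabilitySketch {

    /**
     * @param stripes            block IDs (data and parity) of each stripe; each block
     *                           is assumed to appear in at most one stripe
     * @param missing            IDs of blocks that currently have no replicas
     * @param maxLossesPerStripe how many lost blocks one stripe can tolerate
     */
    static boolean allMissingRecoverable(List<List<Long>> stripes,
                                         Set<Long> missing,
                                         int maxLossesPerStripe) {
        int covered = 0;
        for (List<Long> stripe : stripes) {
            int lost = 0;
            for (long id : stripe) {
                if (missing.contains(id)) {
                    lost++;
                }
            }
            if (lost > maxLossesPerStripe) {
                return false; // too many losses in this stripe to reconstruct
            }
            covered += lost;
        }
        // A missing block outside every stripe has no parity to rebuild it from.
        return covered == missing.size();
    }
}
```

If such a check returned true, the NameNode could in principle leave safe mode before the
remaining block reports arrive.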

The review board entry for this patch can be found at https://reviews.apache.org/r/105/ .

> listCorruptFileBlocks should be functional while the name node is still in safe mode
> ------------------------------------------------------------------------------------
>                 Key: HDFS-1476
>                 URL: https://issues.apache.org/jira/browse/HDFS-1476
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Patrick Kling
>         Attachments: HDFS-1476.patch
> This would allow us to detect whether missing blocks can be fixed using Raid and, if that
> is the case, exit safe mode earlier.
> One way to make listCorruptFileBlocks available before the name node has exited from
> safe mode would be to perform a scan of the blocks map on each call to listCorruptFileBlocks
> to determine if there are any blocks with no replicas. This scan could be parallelized by
> dividing the space of block IDs into multiple intervals that can be scanned independently.
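
The interval-based scan suggested in the issue description might be sketched as follows. This is
a hypothetical model only: the blocks map is represented as a sorted map from block ID to replica
count rather than the NameNode's real data structure, and the class and method names are made up
for illustration. It also assumes the map is not modified while the scan runs and that the
distance between the smallest and largest block ID fits in a long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical parallel scan for blocks with no replicas. */
public class ParallelBlockScanSketch {

    static List<Long> findBlocksWithNoReplicas(
            final NavigableMap<Long, Integer> blocksMap, int intervals) throws Exception {
        List<Long> missing = new ArrayList<>();
        if (blocksMap.isEmpty()) {
            return missing;
        }
        final long lo = blocksMap.firstKey();
        final long hi = blocksMap.lastKey();
        // Width of each ID interval; assumes hi - lo does not overflow.
        final long width = (hi - lo) / intervals + 1;

        ExecutorService pool = Executors.newFixedThreadPool(intervals);
        try {
            List<Future<List<Long>>> parts = new ArrayList<>();
            long from = lo;
            while (true) {
                final long start = from;
                final long end = (hi - start < width) ? hi : start + width - 1;
                // Each task scans one inclusive ID interval independently.
                Callable<List<Long>> task = () -> {
                    List<Long> found = new ArrayList<>();
                    for (Map.Entry<Long, Integer> e
                            : blocksMap.subMap(start, true, end, true).entrySet()) {
                        if (e.getValue() == 0) {
                            found.add(e.getKey());
                        }
                    }
                    return found;
                };
                parts.add(pool.submit(task));
                if (end == hi) {
                    break;
                }
                from = end + 1;
            }
            // Collect results in interval order, so IDs come back ascending.
            for (Future<List<Long>> part : parts) {
                missing.addAll(part.get());
            }
        } finally {
            pool.shutdown();
        }
        return missing;
    }
}
```

Because every call rescans the whole map, this approach trades repeated work for not needing the
replication queues at all.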

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
