hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10887) Provide admin/debug tool to dump block map
Date Thu, 22 Sep 2016 17:14:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513845#comment-15513845 ]

Yongjun Zhang commented on HDFS-10887:

Hi [~kihwal],

Thanks a lot for your input; the info is very helpful!

Some questions:
> 1) If include/exclude file is being used, we can tell whether all nodes have registered and
> heartbeated. This gives list of dead nodes, which should have been in service.

I agree this is useful data. However, the NN waits 10.5 minutes before declaring a DN dead.
Before then, if we want to know which DNs are lagging, my idea was this: once we know which
blocks have fewer than minRepl replicas, we can search every DN's block files for those blocks
to see which DNs hold them and whether anything abnormal is going on there.
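
For reference, the 10.5-minute figure matches the usual NameNode expiry rule of 2 x recheck interval + 10 x heartbeat interval, assuming the default settings `dfs.namenode.heartbeat.recheck-interval` = 300000 ms and `dfs.heartbeat.interval` = 3 s. A minimal sketch of that arithmetic (the function name is illustrative only, not a Hadoop API):

```python
def heartbeat_expire_interval_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Time after the last heartbeat at which the NN marks a DN dead.

    Assumes the rule 2 * recheck interval + 10 * heartbeat interval,
    with the default values of the two settings as parameter defaults.
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


expire_ms = heartbeat_expire_interval_ms()
print(expire_ms / 60_000)  # minutes -> 10.5
```

A cluster that overrides either setting will have a different window, so the value is worth recomputing from the actual configuration before reading anything into a "not yet dead" DN.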

> 2) For the nodes that have heartbeated, we will be able to tell whether a block report was
> received for all storage volumes.

How do you usually check whether a full block report has been received from a DN, and
whether an incremental report has been received?

> I would force the namenode out of safe mode. That causes replication queue initialization
> and will show missing blocks.

This is helpful. One concern with forcing the NN out of safe mode too early: if a client starts
reading blocks that are missing, it will get a missing-block error instead of a safe-mode
exception, and the client side may handle the two differently. Right?


> Provide admin/debug tool to dump block map
> ------------------------------------------
>                 Key: HDFS-10887
>                 URL: https://issues.apache.org/jira/browse/HDFS-10887
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, namenode
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-10887.001.patch
> From time to time, when NN restarts, we see
> {code}
> "The reported blocks X needs additional Y blocks to reach the threshold 0.9990 of total blocks Z. Safe mode will be turned off automatically."
> {code}
> We'd wonder what these blocks that still need block reports are, what DNs they could
> possibly be located on, and what happened to these DNs.
> This jira is to propose a new admin or debug tool to dump the block map info for the
> blocks that have fewer than minRepl replicas.
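
As a rough illustration of the numbers in the safe-mode message quoted above, the sketch below relates X (reported blocks), Y (additional blocks needed) and Z (total blocks). It assumes the block threshold is the total block count multiplied by the threshold fraction and truncated to an integer; the function and parameter names are hypothetical and not taken from the HDFS source or the attached patch.

```python
def additional_blocks_needed(reported, total, threshold=0.999):
    """Blocks that still need to be reported before safe mode can end.

    Assumption: the NN compares the reported count against
    int(total * threshold), i.e. the product truncated toward zero.
    """
    block_threshold = int(total * threshold)
    # Never report a negative shortfall once the threshold is met.
    return max(block_threshold - reported, 0)


# Hypothetical figures, for illustration only.
needed = additional_blocks_needed(reported=400, total=1000, threshold=0.5)
print(f"needs additional {needed} blocks")
```

The blocks counted in Y are the ones the proposed tool would list, namely those with fewer than minRepl reported replicas.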

