hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12182) BlockManager.metaSave does not distinguish between "under replicated" and "missing" blocks
Date Tue, 01 Aug 2017 16:09:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109166#comment-16109166 ]

Wei-Chiu Chuang commented on HDFS-12182:

Thanks for the new patch.

I reviewed again and saw a few nits, plus additional comments unrelated to your patch:

The findbugs warnings are unrelated; they are caused by HDFS-11696.

if (reader != null) {
This check is not needed. If reader were ever null, the most likely cause would be a failed
initialization, which should already have thrown an exception. The try {} block comes after the
initialization, so it would not catch that failure anyway.
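
To illustrate the point about the null check, here is a minimal sketch. It is not taken from the patch: the class name ReaderCleanupSketch and the helper countLines are hypothetical, and a StringReader stands in for whatever the test actually reads. It only shows that when the reader is created before the try block, a failed construction propagates before the block is entered, so the cleanup code can assume a non-null reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical example class; names are illustrative, not from the patch.
final class ReaderCleanupSketch {

  // Counts the lines in the given text.
  static int countLines(String text) throws IOException {
    // If construction fails, the exception propagates from this line and
    // the try block below is never entered, so reader cannot be null there.
    BufferedReader reader = new BufferedReader(new StringReader(text));
    try {
      int lines = 0;
      while (reader.readLine() != null) {
        lines++;
      }
      return lines;
    } finally {
      // No "if (reader != null)" guard is needed at this point.
      reader.close();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(countLines("a\nb\nc"));
  }
}
```

A try-with-resources statement would give the same guarantee with less code, if the surrounding test style allows it.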

One typo:
assertTrue("Metasave output should had …")
"had" -> "have"

After the patch, the output of metaSave is:
Live Datanodes: 0
Dead Datanodes: 0
Metasave: Blocks waiting for reconstruction: 0
Metasave: Blocks currently missing: 1
file16387: blk_0_1 MISSING (replicas: l: 0 d: 0 c: 2 e: 0) (block deletions maybe out of date) : (block deletions maybe out of date) :
Mis-replicated blocks that have been postponed:
Metasave: Blocks being reconstructed: 0
Metasave: Blocks 0 waiting deletion from 0 datanodes.
Corrupt Blocks:
Block=0	Node=	StorageID=s1	StorageState=NORMAL	TotalReplicas=2	Reason=GENSTAMP_MISMATCH
Block=0	Node=	StorageID=s2	StorageState=NORMAL	TotalReplicas=2	Reason=GENSTAMP_MISMATCH
Metasave: Number of datanodes: 0

(the following is unrelated to this jira)
Looking at the output, it is not user friendly: the meaning of “(replicas: l: 0 d: 0 c: 2 e: 0)” is
not obvious without looking at the code.
Also, it should print maintenance mode replicas.
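
As a rough illustration of what a friendlier rendering could look like, the sketch below expands the abbreviated counters into labeled fields. The mapping (l = live, d = decommissioned or decommissioning, c = corrupt, e = excess) is my reading of the code and should be treated as an assumption; the class ReplicaCountsSketch and its describe method are hypothetical and not part of HDFS.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HDFS: expands the abbreviated counters.
final class ReplicaCountsSketch {

  // Matches the counter group exactly as it appears in the metaSave output.
  private static final Pattern COUNTS = Pattern.compile(
      "\\(replicas: l: (\\d+) d: (\\d+) c: (\\d+) e: (\\d+)\\)");

  // Assumed meaning of the letters: l = live, d = decommissioned,
  // c = corrupt, e = excess.
  static String describe(String metasaveLine) {
    Matcher m = COUNTS.matcher(metasaveLine);
    if (!m.find()) {
      return "no replica counts found";
    }
    return "live=" + m.group(1)
        + " decommissioned=" + m.group(2)
        + " corrupt=" + m.group(3)
        + " excess=" + m.group(4);
  }

  public static void main(String[] args) {
    String line = "file16387: blk_0_1 MISSING (replicas: l: 0 d: 0 c: 2 e: 0)";
    System.out.println(describe(line));
  }
}
```

A maintenance-mode counter would need its own field in the output first; this sketch deliberately does not guess at what that would look like.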

> BlockManager.metaSave does not distinguish between "under replicated" and "missing" blocks
> ------------------------------------------------------------------------------------------
>                 Key: HDFS-12182
>                 URL: https://issues.apache.org/jira/browse/HDFS-12182
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Trivial
>              Labels: newbie
>             Fix For: 3.0.0-alpha3
>         Attachments: HDFS-12182.001.patch, HDFS-12182.002.patch, HDFS-12182.003.patch
> Currently, the *BlockManager.metaSave* method (which is called by the "-metasave" dfs CLI command)
> reports both "under replicated" and "missing" blocks under the same metric, *Metasave: Blocks waiting
> for reconstruction:*, as shown in the code snippet below:
> {noformat}
>     synchronized (neededReconstruction) {
>       out.println("Metasave: Blocks waiting for reconstruction: "
>           + neededReconstruction.size());
>       for (Block block : neededReconstruction) {
>         dumpBlockMeta(block, out);
>       }
>     }
> {noformat}
> *neededReconstruction* is an instance of *LowRedundancyBlocks*, which currently wraps
> 5 priority queues. 4 of these queues store different under-replicated scenarios,
> but the 5th one is dedicated to corrupt/missing blocks.
> Thus, the metasave report may suggest that some corrupt blocks are merely under replicated. This
> can be misleading for admins and operators trying to track missing or corrupt blocks,
> and/or other issues related to *BlockManager* metrics.
> I would like to propose a patch with trivial changes that would report corrupt blocks
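
For readers following along, the distinction described above can be sketched with a simplified model. This is not the patch itself: MetaSaveSketch, emptyQueues and report are hypothetical names, blocks are plain strings, and the assumption that the last of the five levels holds the corrupt/missing blocks is taken from the description. Only the two "Metasave:" labels are copied from the output quoted earlier in this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the queues; not the HDFS implementation.
final class MetaSaveSketch {

  // Assumption: five priority levels, the last one for corrupt/missing blocks.
  static final int LEVELS = 5;
  static final int CORRUPT_LEVEL = LEVELS - 1;

  static List<List<String>> emptyQueues() {
    List<List<String>> queues = new ArrayList<List<String>>();
    for (int i = 0; i < LEVELS; i++) {
      queues.add(new ArrayList<String>());
    }
    return queues;
  }

  // Reports low-redundancy blocks and missing blocks under separate headings.
  static String report(List<List<String>> queues) {
    StringBuilder out = new StringBuilder();
    int waiting = 0;
    for (int level = 0; level < CORRUPT_LEVEL; level++) {
      waiting += queues.get(level).size();
    }
    out.append("Metasave: Blocks waiting for reconstruction: ")
        .append(waiting).append('\n');
    for (int level = 0; level < CORRUPT_LEVEL; level++) {
      for (String block : queues.get(level)) {
        out.append(block).append('\n');
      }
    }
    List<String> missing = queues.get(CORRUPT_LEVEL);
    out.append("Metasave: Blocks currently missing: ")
        .append(missing.size()).append('\n');
    for (String block : missing) {
      out.append(block).append(" MISSING\n");
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<List<String>> queues = emptyQueues();
    queues.get(CORRUPT_LEVEL).add("blk_0_1");
    System.out.print(report(queues));
  }
}
```

The design choice is simply to stop iterating over all five levels as one collection, so that the count under "waiting for reconstruction" no longer includes blocks that have no usable replica.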
