hadoop-hdfs-issues mailing list archives

From "Kitti Nanasi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13999) Bogus missing block warning if the file is under construction when NN starts
Date Wed, 17 Oct 2018 11:35:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653397#comment-16653397 ]

Kitti Nanasi commented on HDFS-13999:

Thanks [~jojochuang] for reporting the issue!

The fix looks good. Do you plan to add a unit test?

> Bogus missing block warning if the file is under construction when NN starts
> ----------------------------------------------------------------------------
>                 Key: HDFS-13999
>                 URL: https://issues.apache.org/jira/browse/HDFS-13999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-13999.branch-2.7.001.patch, webui missing blocks.png
> We found an interesting case where the web UI displays a few missing blocks but doesn't
> state which files are corrupt. At the same time, fsck reports the file system as
> healthy. This bug is similar to HDFS-10827 and HDFS-8533.
> (See the attachment for an example.)
> Using Dynamometer, I was able to reproduce the bug, and realized that the "missing" blocks
> are actually healthy, but neededReplications doesn't get updated when the NN receives
> block reports. What's more interesting is that the files associated with the "missing" blocks
> are under construction when the NN starts, so after a while the NN prints file recovery logs.
> Given that, I determined the following code is the source of the bug:
> {code:java|title=BlockManager#addStoredBlock}
> ....
>     // if file is under construction, then done for now
>     if (bc.isUnderConstruction()) {
>       return storedBlock;
>     }
> {code}
> which is wrong, because a file under construction may have multiple blocks, and its first
> block may already be complete. In that case, the neededReplications structure doesn't get
> updated for the first block, hence the missing block warning on the web UI. The check
> should look at the state of the block itself, not the file.
> Fortunately, it was unintentionally fixed via HDFS-9754:
> {code:java}
>     // if block is still under construction, then done for now
>     if (!storedBlock.isCompleteOrCommitted()) {
>       return storedBlock;
>     }
> {code}
> We should bring this fix into branch-2.7 too. That said, the warning is harmless and
> should go away once the under-construction files are recovered and the NN restarts (or
> full block reports are forced).
> Kudos to Dynamometer! It would have been impossible to reproduce this bug without the tool.
> And thanks [~smeng] for helping with the reproduction.
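
For readers following the thread, the difference between the two checks quoted above can be sketched with a toy model. This is only an illustration, not the BlockManager code: the classes Block and FileModel, the method replay, and the idea of seeding neededReplications with the one complete block are all assumptions made for the example. Only the shape of the two early-return checks comes from the description.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model. None of these types are the real HDFS classes.
public class AddStoredBlockSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final class Block {
        final long id;
        final BlockState state;
        Block(long id, BlockState state) { this.id = id; this.state = state; }
        boolean isCompleteOrCommitted() {
            return state == BlockState.COMPLETE || state == BlockState.COMMITTED;
        }
    }

    static final class FileModel {
        final boolean underConstruction;
        FileModel(boolean underConstruction) { this.underConstruction = underConstruction; }
    }

    // File-level check, as in the branch-2.7 snippet: every block of an
    // under-construction file is skipped, including blocks that are complete.
    static void addStoredBlockFileCheck(FileModel file, Block b, Set<Long> needed) {
        if (file.underConstruction) {
            return;
        }
        needed.remove(b.id);
    }

    // Block-level check, as in the HDFS-9754 snippet: only blocks that are
    // themselves still under construction are skipped.
    static void addStoredBlockBlockCheck(Block b, Set<Long> needed) {
        if (!b.isCompleteOrCommitted()) {
            return;
        }
        needed.remove(b.id);
    }

    // Replays one block report for a two-block file that is under construction
    // and returns the block ids still flagged afterwards.
    static Set<Long> replay(boolean useBlockCheck) {
        FileModel file = new FileModel(true);
        Block first = new Block(1L, BlockState.COMPLETE);
        Block last = new Block(2L, BlockState.UNDER_CONSTRUCTION);

        // Assumption: at startup the complete block is tracked as needing
        // replicas until a block report arrives for it.
        Set<Long> needed = new HashSet<>();
        needed.add(first.id);

        for (Block b : new Block[] { first, last }) {
            if (useBlockCheck) {
                addStoredBlockBlockCheck(b, needed);
            } else {
                addStoredBlockFileCheck(file, b, needed);
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        System.out.println("file-level check leaves:  " + replay(false));
        System.out.println("block-level check leaves: " + replay(true));
    }
}
```

In this model the file-level check leaves the complete first block flagged after the report, which corresponds to the bogus warning, while the block-level check clears it. A unit test for the backport could presumably assert the same thing against a real mini cluster, but that is not shown here.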
