hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
Date Wed, 21 Mar 2012 05:11:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234104#comment-13234104 ]

Uma Maheswara Rao G commented on HDFS-3122:

I reproduced this case using debug points.

1) Created a file and hsync'ed it.
2) Triggered one BR (block report) in a separate thread and blocked the call on the NN side just before it acquired
the FSNamesystem lock.
3) Triggered one recoverLease call from a separate thread and let it complete.
4) After #3 completed successfully (after commitBlockSynchronization with the new genstamp), resumed
processing the BR that was blocked in #2.
5) Since that old BR carries the older genstamp, the block gets marked as corrupt.

I will attach the colored logs.
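
The ordering in the steps above can be replayed without a cluster. The following is only a rough sketch under stated assumptions: ToyNameNode, StoredBlock, ReportedReplica and replay are hypothetical names for illustration, not HDFS classes, and the genstamp values 1000 and 1001 are made up. It replays the two orderings sequentially instead of using real threads and debug points, and it models only the genstamp comparison for a COMPLETE block.

```java
/** Toy model of the race; none of these types are real HDFS classes. */
class GenstampRaceSketch {

    /** Replica snapshot that a DataNode put into a block report (hypothetical type). */
    static final class ReportedReplica {
        final long blockId;
        final long genStamp;

        ReportedReplica(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    /** Block as tracked by the toy NameNode (hypothetical type). */
    static final class StoredBlock {
        final long blockId;
        long genStamp;
        boolean complete;
        boolean corrupt;

        StoredBlock(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    static final class ToyNameNode {
        // Stands in for the FSNamesystem lock mentioned in step 2.
        private final Object fsLock = new Object();
        final StoredBlock block = new StoredBlock(1L, 1000L);

        /** Recovery commit: bump the genstamp and, with closeFile, complete the block. */
        void commitBlockSynchronization(long newGenStamp, boolean closeFile) {
            synchronized (fsLock) {
                block.genStamp = newGenStamp;
                if (closeFile) {
                    block.complete = true;
                }
            }
        }

        /** Simplified check: a COMPLETE block with a mismatching genstamp is marked corrupt. */
        void processBlockReport(ReportedReplica reported) {
            synchronized (fsLock) {
                if (block.complete && block.genStamp != reported.genStamp) {
                    block.corrupt = true;
                }
            }
        }
    }

    /** Replays one ordering and returns whether the block ended up marked corrupt. */
    static boolean replay(boolean reportProcessedFirst) {
        ToyNameNode nn = new ToyNameNode();
        // The report is generated before recovery, so it carries the old genstamp.
        ReportedReplica stale = new ReportedReplica(1L, nn.block.genStamp);
        if (reportProcessedFirst) {
            nn.processBlockReport(stale);
            nn.commitBlockSynchronization(1001L, true);
        } else {
            nn.commitBlockSynchronization(1001L, true);
            nn.processBlockReport(stale);
        }
        return nn.block.corrupt;
    }

    public static void main(String[] args) {
        System.out.println("report processed first   -> corrupt=" + replay(true));
        System.out.println("recovery committed first -> corrupt=" + replay(false));
    }
}
```

In this toy model only the second ordering marks the block corrupt, which corresponds to steps 4 and 5: the stale report is processed after the commit has already completed the block with the new genstamp.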

> Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
> --------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-3122
>                 URL: https://issues.apache.org/jira/browse/HDFS-3122
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Critical
> *Block Report* can *race* with *Block Recovery* when the closeFile flag is true.
> If a block report is generated on the DN side just before recovery and is delayed on its way to the NN by the network, recovery can succeed and change the generation stamp to a new one before the report arrives.
> The primary DN invokes commitBlockSynchronization and the block gets updated on the NN side with the new genstamp. It is also marked as complete, since the closeFile flag is true.
> Now the block report starts processing on the NN side. This particular block was RBW when the DN generated the BR, but the file has since been completed on the NN side.
> Since the generation stamps do not match, the block gets marked as corrupt.
> {code}
>  case RWR:
>       if (!storedBlock.isComplete()) {
>         return null; // not corrupt
>       } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
>         return new BlockToMarkCorrupt(storedBlock,
>             "reported " + reportedState + " replica with genstamp " +
>             iblk.getGenerationStamp() + " does not match COMPLETE block's " +
>             "genstamp in block map " + storedBlock.getGenerationStamp());
>       } else { // COMPLETE block, same genstamp
> {code}


