hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3391) TestPipelinesFailover#testLeaseRecoveryAfterFailover is failing
Date Fri, 11 May 2012 04:46:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273034#comment-13273034
] 

Todd Lipcon commented on HDFS-3391:
-----------------------------------

I looped TestPipelinesFailover for quite some time and could not get a failure. In the logs
you pointed to on build #2397, I traced the issue to the following:

{code}
2012-05-09 23:50:33,074 DEBUG namenode.FSNamesystem (FSEditLogLoader.java:applyEditLogOp(296))
- OP_CLOSE: /test-file numblocks : 2 clientHolder  clientMachine
2012-05-09 23:50:33,074 DEBUG blockmanagement.BlockManager (BlockManager.java:processQueuedMessages(1760))
- Processing previouly queued message ReportedBlockInfo [block=blk_-3039116449792967513_1005,
dn=127.0.0.1:45674, reportedState=FINALIZED]
2012-05-09 23:50:33,074 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1660))
- Reported block blk_-3039116449792967513_1005 on 127.0.0.1:45674 size 2048 replicaState =
FINALIZED
2012-05-09 23:50:33,074 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1684))
- In memory blockUCState = COMPLETE
2012-05-09 23:50:33,075 INFO  hdfs.StateChange (BlockManager.java:markBlockAsCorrupt(926))
- BLOCK markBlockAsCorrupt: block blk_-3039116449792967513_1005 could not be marked as corrupt
as it does not belong to any file
2012-05-09 23:50:33,076 INFO  hdfs.StateChange (InvalidateBlocks.java:add(77)) - BLOCK* InvalidateBlocks:
add blk_-3039116449792967513_1005 to 127.0.0.1:45674
2012-05-09 23:50:33,076 DEBUG blockmanagement.BlockManager (BlockManager.java:processQueuedMessages(1760))
- Processing previouly queued message ReportedBlockInfo [block=blk_-3039116449792967513_1005,
dn=127.0.0.1:35659, reportedState=FINALIZED]
2012-05-09 23:50:33,076 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1660))
- Reported block blk_-3039116449792967513_1005 on 127.0.0.1:35659 size 2048 replicaState =
FINALIZED
2012-05-09 23:50:33,076 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1684))
- In memory blockUCState = COMPLETE
2012-05-09 23:50:33,077 INFO  hdfs.StateChange (BlockManager.java:markBlockAsCorrupt(926))
- BLOCK markBlockAsCorrupt: block blk_-3039116449792967513_1005 could not be marked as corrupt
as it does not belong to any file
2012-05-09 23:50:33,077 INFO  hdfs.StateChange (InvalidateBlocks.java:add(77)) - BLOCK* InvalidateBlocks:
add blk_-3039116449792967513_1005 to 127.0.0.1:35659
2012-05-09 23:50:33,077 DEBUG blockmanagement.BlockManager (BlockManager.java:processQueuedMessages(1760))
- Processing previouly queued message ReportedBlockInfo [block=blk_-3039116449792967513_1005,
dn=127.0.0.1:59499, reportedState=FINALIZED]
2012-05-09 23:50:33,077 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1660))
- Reported block blk_-3039116449792967513_1005 on 127.0.0.1:59499 size 2048 replicaState =
FINALIZED
2012-05-09 23:50:33,078 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1684))
- In memory blockUCState = COMPLETE
2012-05-09 23:50:33,078 INFO  hdfs.StateChange (BlockManager.java:markBlockAsCorrupt(926))
- BLOCK markBlockAsCorrupt: block blk_-3039116449792967513_1005 could not be marked as corrupt
as it does not belong to any file
2012-05-09 23:50:33,078 INFO  hdfs.StateChange (InvalidateBlocks.java:add(77)) - BLOCK* InvalidateBlocks:
add blk_-3039116449792967513_1005 to 127.0.0.1:59499
2012-05-09 23:50:33,078 DEBUG blockmanagement.BlockManager (BlockManager.java:processQueuedMessages(1760))
- Processing previouly queued message ReportedBlockInfo [block=blk_-3039116449792967513_1006,
dn=127.0.0.1:45674, reportedState=FINALIZED]
2012-05-09 23:50:33,079 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1660))
- Reported block blk_-3039116449792967513_1006 on 127.0.0.1:45674 size 2048 replicaState =
FINALIZED
2012-05-09 23:50:33,079 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1684))
- In memory blockUCState = COMPLETE
2012-05-09 23:50:33,079 DEBUG blockmanagement.BlockManager (BlockManager.java:processQueuedMessages(1760))
- Processing previouly queued message ReportedBlockInfo [block=blk_-3039116449792967513_1006,
dn=127.0.0.1:59499, reportedState=FINALIZED]
2012-05-09 23:50:33,079 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1660))
- Reported block blk_-3039116449792967513_1006 on 127.0.0.1:59499 size 2048 replicaState =
FINALIZED
2012-05-09 23:50:33,080 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1684))
- In memory blockUCState = COMPLETE
2012-05-09 23:50:33,080 DEBUG blockmanagement.BlockManager (BlockManager.java:processQueuedMessages(1760))
- Processing previouly queued message ReportedBlockInfo [block=blk_-3039116449792967513_1006,
dn=127.0.0.1:35659, reportedState=FINALIZED]
2012-05-09 23:50:33,080 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1660))
- Reported block blk_-3039116449792967513_1006 on 127.0.0.1:35659 size 2048 replicaState =
FINALIZED
2012-05-09 23:50:33,080 DEBUG blockmanagement.BlockManager (BlockManager.java:processReportedBlock(1684))
- In memory blockUCState = COMPLETE
{code}

In this run, the block synchronization ended up bumping the genstamp twice (once to 1005 and
then again to 1006). When the standby is becoming active, it replays first the BlockReceived
for 1005 (out of date genstamp) and then the BlockReceived for 1006 (which should succeed
in adding it to the block map). But something fishy happens: it doesn't find that block in
the block map at all.

Given that HDFS-3157 was committed when build #2397 ran (it ran at r1336399 which was before
the revert), and HDFS-3157 touches exactly those lines of code which decide whether to mark
out-of-date genstamp blocks as corrupt, I really think it was related.

I will try reapplying HDFS-3157 locally and seeing if I can repro the issue.
                
> TestPipelinesFailover#testLeaseRecoveryAfterFailover is failing
> ---------------------------------------------------------------
>
>                 Key: HDFS-3391
>                 URL: https://issues.apache.org/jira/browse/HDFS-3391
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Arun C Murthy
>            Assignee: Todd Lipcon
>            Priority: Critical
>
> Running org.apache.hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.208 sec <<<
FAILURE!
> --
> Running org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
> Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 81.195 sec <<<
FAILURE!
> --

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message