hadoop-hdfs-issues mailing list archives

From "ikweesung (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5809) BlockPoolSliceScanner make datanode to drop into infinite loop
Date Fri, 24 Jan 2014 04:15:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880700#comment-13880700 ]

ikweesung commented on HDFS-5809:
---------------------------------

Please excuse my poor English. : )
I found that in BlockPoolSliceScanner, blockInfoSet can contain two entries with the same
block ID, because BlockScanInfo is compared by lastScanTime.
Then, in the method updateScanStatus, the BlockScanInfo cannot be updated, so ((now - getEarliestScanTime())
>= scanPeriod) is always true.
This causes the datanode to fall into an infinite loop.
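
The mechanism described above can be illustrated with a minimal sketch. This is not the Hadoop source: the class Info, the comparator BY_SCAN_TIME, and the method names are hypothetical stand-ins, and the sketch assumes only what the comment states, namely a sorted set ordered by lastScanTime alone. Under that assumption, two entries for the same block ID can coexist, and refreshing one of them never moves the earliest scan time forward.

```java
import java.util.Comparator;
import java.util.TreeSet;

class ScanSetSketch {
    // Hypothetical stand-in for BlockScanInfo: one block id plus its last scan time.
    static final class Info {
        final long blockId;
        long lastScanTime;

        Info(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Assumption: ordering looks at lastScanTime only, never at blockId.
    static final Comparator<Info> BY_SCAN_TIME =
            Comparator.comparingLong((Info i) -> i.lastScanTime);

    // Builds a set holding a stale entry and the given entry for the same block id.
    static TreeSet<Info> buildSet(Info current) {
        TreeSet<Info> set = new TreeSet<>(BY_SCAN_TIME);
        set.add(new Info(42L, 0L)); // stale entry, scan time 0
        set.add(current);           // accepted because its scan time differs
        return set;
    }

    // Refreshes only the entry we hold a reference to, then reports the earliest scan time.
    static long earliestAfterRescans(int rounds) {
        Info current = new Info(42L, 100L);
        TreeSet<Info> set = buildSet(current);
        long now = 5000L;
        for (int i = 0; i < rounds; i++) {
            set.remove(current);        // remove before mutating the sort key
            current.lastScanTime = now; // record the new scan time
            set.add(current);
            now += 10L;
        }
        return set.first().lastScanTime; // the stale entry is still first
    }

    public static void main(String[] args) {
        long scanPeriod = 1000L;
        long now = 6000L;
        int entries = buildSet(new Info(42L, 100L)).size();
        long earliest = earliestAfterRescans(3);
        System.out.println("entries for one block id: " + entries);
        System.out.println("rescan still due: " + (now - earliest >= scanPeriod));
    }
}
```

In this sketch the stale entry keeps its scan time of 0 no matter how often the other entry is refreshed, so a condition of the form (now - earliest) >= scanPeriod stays true on every pass.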

> BlockPoolSliceScanner make datanode to drop into infinite loop
> --------------------------------------------------------------
>
>                 Key: HDFS-5809
>                 URL: https://issues.apache.org/jira/browse/HDFS-5809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>         Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
>            Reporter: ikweesung
>            Priority: Critical
>              Labels: blockpoolslicescanner, datanode, infinite-loop
>
> Hello, everyone.
> When the Hadoop cluster starts, BlockPoolSliceScanner starts scanning the blocks in my cluster.
> Then, at random, one datanode falls into an infinite loop, as the log shows, and eventually all
> datanodes fall into an infinite loop.
> Each datanode fails verification on just one block.
> When I check the failing block like this: hadoop fsck / -files -blocks | grep blk_1223474551535936089_4702249,
> no HDFS file contains the block.
> It seems that the while loop in BlockPoolSliceScanner's scan method falls into an infinite
> loop.
> BlockPoolSliceScanner: 650
> while (datanode.shouldRun
> && !datanode.blockScanner.blockScannerThread.isInterrupted()
> && datanode.isBPServiceAlive(blockPoolId)) { ....
> The log message is ultimately printed in the method verifyBlock (BlockPoolSliceScanner:453).
> Please excuse my poor English.
> -------------------------------------------------------------------------------------------------------------------------------------------------
> LOG: 
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
