hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"
Date Wed, 02 Nov 2016 18:44:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630020#comment-15630020 ]

Wei-Chiu Chuang commented on HDFS-6804:
---------------------------------------

I am pretty sure this is the same bug described in HDFS-11056, where I have a fix pending review.
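
Not the actual HDFS-11056 patch (which is still pending review), but a minimal sketch of the general shape of such a fix, with invented names: make the append path and the transfer path touch the (data, checksum) pair under one lock, so a transfer can never pair one generation of data with another generation's checksum.

{code:java}
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only -- NOT the HDFS-11056 patch. The point is that both
// paths go through the same monitor, so the (data, checksum) pair is
// always observed in a consistent state.
class BlockReplica {
    private byte[] data = new byte[0];       // stands in for the block file
    private long crc = crcOf(new byte[0]);   // stands in for the meta file

    static long crcOf(byte[] b) {
        CRC32 c = new CRC32();
        c.update(b);
        return c.getValue();
    }

    // Append path: data and checksum change together, atomically.
    synchronized void append(byte b) {
        byte[] next = Arrays.copyOf(data, data.length + 1);
        next[next.length - 1] = b;
        data = next;
        crc = crcOf(next);
    }

    // Transfer path: one consistent snapshot of data plus checksum.
    synchronized Snapshot snapshotForTransfer() {
        return new Snapshot(data.clone(), crc);
    }

    static final class Snapshot {
        final byte[] data;
        final long crc;
        Snapshot(byte[] data, long crc) {
            this.data = data;
            this.crc = crc;
        }
    }
}
{code}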

> race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6804
>                 URL: https://issues.apache.org/jira/browse/HDFS-6804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: Gordon Wang
>            Assignee: Wei-Chiu Chuang
>
> We found some error logs on the datanode, like this:
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from /192.168.2.101:39495
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> Meanwhile, the log on the source datanode says the block was transmitted:
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports a bad block to the NameNode, and the NameNode marks the replica on the source datanode as corrupt. But the replica on the source datanode is actually valid, because it passes checksum verification.
> In short, the replica on the source datanode is wrongly marked as corrupt.
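
To make the interleaving concrete, here is a toy reproduction (plain Java, not DataNode code; all names are invented for illustration): one thread plays the appender, updating the data and then its checksum one after the other, while another plays the transfer path, reading the two with no ordering against the appender. The transfer occasionally pairs one generation of data with another generation's checksum, which is exactly the "unexpected checksum mismatch" the receiver reports.

{code:java}
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChecksumRace {
    static volatile byte[] blockData = new byte[0];       // block file
    static volatile long storedCrc = crcOf(new byte[0]);  // meta file

    static long crcOf(byte[] b) {
        CRC32 c = new CRC32();
        c.update(b);
        return c.getValue();
    }

    public static void main(String[] args) {
        Thread appender = new Thread(() -> {
            for (int i = 0; i < 200_000; i++) {
                // keep the array small, like a last partial chunk
                byte[] next = Arrays.copyOf(blockData, blockData.length % 512 + 1);
                next[next.length - 1] = (byte) i;
                blockData = next;         // data updated first...
                storedCrc = crcOf(next);  // ...checksum a moment later
            }
        });
        appender.start();

        long mismatches = 0;
        while (appender.isAlive()) {
            byte[] dataSnapshot = blockData;  // read the data...
            long crcSnapshot = storedCrc;     // ...then the checksum, no lock
            if (crcOf(dataSnapshot) != crcSnapshot) {
                mismatches++;  // a real receiver would report a bad block here
            }
        }
        System.out.println("checksum mismatches observed: " + mismatches);
    }
}
{code}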



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


