hadoop-hdfs-issues mailing list archives

From "Brahma Reddy Battula (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"
Date Tue, 06 Jun 2017 13:32:19 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Brahma Reddy Battula updated HDFS-6804:
---------------------------------------
    Attachment: HDFS-6804-branch-2.8.patch

Uploading a testcase for branch-2.8. The testcase is not applicable to {{trunk}} and {{branch-2}},
since the transfer will fail after HDFS-10958 and HDFS-11337. Please check the traces below for the same.

After reverting HDFS-11060, the testcase will fail.

 *Trunk* 
{noformat}
 getBlockURI()     = file:/D:/OSCode/hadoop-trunk/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current/BP-1559881164-10.18.246.125-1496746386088/current/finalized/subdir0/subdir0/blk_1073741825
reopen failed.  Unable to move meta file  D:\OSCode\hadoop-trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\data\data1\current\BP-1559881164-10.18.246.125-1496746386088\current\finalized\subdir0\subdir0\blk_1073741825_1001.meta
to rbw dir D:\OSCode\hadoop-trunk\hadoop\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\data\data1\current\BP-1559881164-10.18.246.125-1496746386088\current\rbw\blk_1073741825_1002.meta
	at org.apache.hadoop.hdfs.server.datanode.LocalReplicaInPipeline.moveReplicaFrom(LocalReplicaInPipeline.java:388)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.append(FsVolumeImpl.java:1194)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1176)

{noformat}

 *branch-2* 
{noformat}
2017-06-06 20:01:40,314 WARN  datanode.DataNode (DataNode.java:run(2488)) - DatanodeRegistration(127.0.0.1:52070,
datanodeUuid=4fd53a59-936b-4e14-836d-83c30c530c1c, infoPort=52106, infoSecurePort=0, ipcPort=52107,
storageInfo=lv=-57;cid=testClusterID;nsid=2117479843;c=1496750487571):Failed to transfer BP-974715696-10.18.246.125-1496750487571:blk_1073741825_1001
to 127.0.0.1:52121 got 
java.io.IOException: Block BP-974715696-10.18.246.125-1496750487571:blk_1073741825_1001 is
not valid. Expected block file at null does not exist.
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:810)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:417)
	at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2442)
{noformat}

[~jojochuang] Kindly review the testcase. Sorry for the delayed response; I missed this.
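For reviewers, the bad interleaving behind the "Unexpected checksum mismatch" in this issue can be replayed deterministically. The sketch below is *not* real Hadoop code; the class, method names, and toy additive checksum are all hypothetical. It only models the ordering: the transfer captures the stored checksum (meta) first, an append changes the block and meta in between, and the stale checksum no longer matches the data that gets streamed.

```java
public class Hdfs6804RaceSketch {
    // Toy checksum standing in for the real block meta file (hypothetical).
    static int checksum(byte[] d) { int s = 0; for (byte b : d) s += b; return s; }

    // Deterministic replay of the racy interleaving described in this issue.
    static String simulateRace() {
        byte[] data = {1, 2, 3};
        int storedChecksum = checksum(data);       // meta file on disk

        int transferredChecksum = storedChecksum;  // DataTransfer reads meta first

        data = new byte[]{1, 2, 3, 4};             // append() sneaks in: block grows,
        storedChecksum = checksum(data);           // meta is moved/updated

        // DataTransfer now streams the appended data with the stale checksum;
        // the destination's verification fails.
        return checksum(data) == transferredChecksum
                ? "checksum ok" : "Unexpected checksum mismatch";
    }

    public static void main(String[] args) {
        System.out.println(simulateRace());
    }
}
```

If the append completes before the transfer captures the meta (or vice versa, entirely), both sides agree and verification passes; only the interleaving shown here produces the mismatch.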

> race condition between transferring block and appending block causes "Unexpected checksum
mismatch exception" 
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6804
>                 URL: https://issues.apache.org/jira/browse/HDFS-6804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: Gordon Wang
>            Assignee: Brahma Reddy Battula
>         Attachments: HDFS-6804-branch-2.8.patch, Testcase_append_transfer_block.patch
>
>
> We found some error logs in the datanode, like this:
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Ex
> ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected
checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
from /192.168.2.101:39495
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> Meanwhile, on the source datanode, the log says the block was transmitted.
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Da
> taTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997
> _9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports a bad block to the NameNode,
and the NameNode marks the replica on the source datanode as corrupt. But the replica on the source
datanode is actually valid, because it can pass checksum verification.
> In all, the replica on the source datanode is wrongly marked as corrupted.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


