hadoop-hdfs-issues mailing list archives

From "sandeep (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1228) CRC does not match when retrying appending a partial block
Date Thu, 24 Mar 2011 05:36:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010564#comment-13010564 ]

sandeep commented on HDFS-1228:
-------------------------------

Please check the scenario where the CRC comparison fails:
=========================================================
1) Create a file of 512 bytes.
2) Now try appending some more content to the file.
3) Append just 2 bytes and call sync.
4) Append 2 more bytes and call sync again.
This time the sync will fail, throwing the following exception:

2011-03-12 20:28:37,671 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(10.18.52.116:50010,
storageID=DS-1547254589-10.18.52.116-50010-1299941311942, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Partial CRC 3835263025 does not match value computed the  last time file
was closed 2082103828
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.computePartialChunkCrc(BlockReceiver.java:692)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.setBlockPosition(BlockReceiver.java:632)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:400)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:533)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:358)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
	at java.lang.Thread.run(Thread.java:619)
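
A minimal sketch (not from the report itself) of the four steps above, using
the 0.20-append FileSystem API against a running cluster; the path and byte
values are illustrative. On that branch FSDataOutputStream exposes sync();
later releases renamed it hflush().

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendCrcRepro {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/tmp/appendCrcRepro");   // hypothetical path

            // 1) create a file of exactly 512 bytes (one full checksum chunk)
            FSDataOutputStream out = fs.create(file);
            out.write(new byte[512]);
            out.close();

            // 2)+3) reopen for append, write 2 bytes, and sync
            out = fs.append(file);
            out.write(new byte[] { 1, 2 });
            out.sync();

            // 4) append 2 more bytes and sync again; in the scenario above,
            //    this second sync fails with the "Partial CRC ... does not
            //    match" IOException shown in the log.
            out.write(new byte[] { 3, 4 });
            out.sync();
            out.close();
        }
    }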


> CRC does not match when retrying appending a partial block
> ----------------------------------------------------------
>
>                 Key: HDFS-1228
>                 URL: https://issues.apache.org/jira/browse/HDFS-1228
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20-append
>            Reporter: Thanh Do
>
> - Summary: when appending to a partial block, it is possible that
> a retry after an exception fails due to a checksum mismatch. The
> append operation is not atomic (it neither completes fully nor fails completely).
>  
> - Setup:
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + failure type = bad disk
> + when/where failure happens = (see below)
>  
> - Details:
> The client writes 16 bytes to dn1 and dn2. The write completes. So far so good.
> The meta file now contains: 7 bytes of header + a 4-byte checksum (CK1, the
> checksum for the 16 bytes). The client then appends 16 more bytes, and let's assume
> there is an exception at BlockReceiver.receivePacket() at dn2, so the client knows dn2
> is bad. BUT the append at dn1 is complete (i.e. the data portion and the checksum portion
> have both made it to disk in the corresponding block file and meta file), meaning that the
> meta file at dn1 now contains 7 bytes of header + a 4-byte checksum (CK2, the
> checksum for the 32 bytes of data). Because dn2 hit an exception, the client calls
> recoverBlock and starts the append again at dn1. When dn1 receives the 16 bytes of data,
> it verifies whether the pre-computed CRC (CK2) matches the one recalculated just now
> (CK1), which obviously does not match. Hence an exception, and the retry fails
> (a sketch of this mismatch follows the quoted description below).
>  
> - A similar bug has been reported at
> https://issues.apache.org/jira/browse/HDFS-679
> but here it manifests in a different context.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
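
As a self-contained illustration (not from the original report) of why the
comparison in the Details section can never succeed: CK1 is a CRC32 over the
first 16 bytes of the partial chunk, while the CK2 stored in dn1's meta file
covers all 32 bytes, so the two values differ for essentially any data. (The
7-byte header mentioned above is the 2-byte meta-file version plus the 5-byte
DataChecksum header: 1-byte checksum type + 4-byte bytesPerChecksum.) The byte
values below are arbitrary.

    import java.util.zip.CRC32;

    public class PartialCrcMismatch {
        // CRC32 over the first len bytes of data, as HDFS computes one
        // checksum per 512-byte chunk.
        static long crcOf(byte[] data, int len) {
            CRC32 crc = new CRC32();
            crc.update(data, 0, len);
            return crc.getValue();
        }

        public static void main(String[] args) {
            byte[] chunk = new byte[32];            // a partial 512-byte chunk
            for (int i = 0; i < chunk.length; i++) {
                chunk[i] = (byte) i;                // arbitrary contents
            }
            long ck1 = crcOf(chunk, 16);  // stored after the initial 16-byte write
            long ck2 = crcOf(chunk, 32);  // stored after the completed 32-byte append

            // On the retried append, dn1 recomputes the CRC of the partial
            // chunk it is resuming from (CK1) and compares it to the value
            // already in the meta file (CK2): they cannot match.
            System.out.println("CK1=" + ck1 + " CK2=" + ck2
                    + " match=" + (ck1 == ck2));    // prints match=false
        }
    }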

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
