hadoop-hdfs-dev mailing list archives

From "Thanh Do (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1228) CRC does not match when retrying appending a partial block
Date Thu, 17 Jun 2010 04:24:24 GMT
CRC does not match when retrying appending a partial block
----------------------------------------------------------

                 Key: HDFS-1228
                 URL: https://issues.apache.org/jira/browse/HDFS-1228
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.20.1
            Reporter: Thanh Do


- Summary: when appending to a partial block, it is possible that
a retry after an exception fails due to a checksum mismatch.
The append operation is not atomic: it should either complete or fail
completely, but here it can succeed on one datanode and fail on another.
 
- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = bad disk
+ When/where failure happens = (see below)
 
- Details:
The client writes 16 bytes to dn1 and dn2. The write completes. So far so good.
The meta file now contains a 7-byte header plus a 4-byte checksum (CK1, the
checksum of the 16 bytes). The client then appends 16 more bytes, and let us
assume there is an exception in BlockReceiver.receivePacket() at dn2. The client
now knows that dn2 is bad. BUT the append at dn1 is complete (i.e., both the data
portion and the checksum portion have been written to disk, to the corresponding
block file and meta file), meaning that the meta file at dn1 now contains a
7-byte header plus a 4-byte checksum (CK2, the checksum of the 32 bytes of data).
Because dn2 had an exception, the client calls recoverBlock and starts the append
again to dn1. dn1 receives the 16 bytes of data and verifies whether the
pre-computed CRC (CK2) matches the one it has just recalculated (CK1); the two
obviously do not match. Hence an exception is thrown and the retry fails.
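
The sequence above can be sketched as a small simulation. This is only an
illustration under stated assumptions, not the actual datanode code: ToyDatanode,
ChecksumMismatch and reproduce are hypothetical names, a single CRC32 from
Python's zlib stands in for the checksum stored in the meta file, and the check
in append() is a simplified model of the comparison described in the report.

```python
import zlib

HEADER_BYTES = 7    # meta file header size quoted in the report
CHECKSUM_BYTES = 4  # one CRC32 covering the single partial chunk


class ChecksumMismatch(Exception):
    """Raised when the stored checksum disagrees with the recomputed one."""


class ToyDatanode:
    """Hypothetical in-memory stand-in for one replica (block file + meta file)."""

    def __init__(self):
        self.block = bytearray()         # contents of the block file
        self.meta_crc = zlib.crc32(b"")  # checksum held in the meta file

    def meta_file_size(self):
        # Header plus one checksum, matching the sizes given in the report.
        return HEADER_BYTES + CHECKSUM_BYTES

    def append(self, offset, data):
        # Recompute the checksum over the bytes the client believes are
        # already on disk, and compare it with the stored checksum.
        recomputed = zlib.crc32(bytes(self.block[:offset]))
        if recomputed != self.meta_crc:
            raise ChecksumMismatch(
                f"stored={self.meta_crc:#010x} recomputed={recomputed:#010x}"
            )
        # Check passed: write the data and update the stored checksum.
        self.block[offset:] = data
        self.meta_crc = zlib.crc32(bytes(self.block))


def reproduce():
    """Replay the scenario on dn1; returns (dn1, whether the retry failed)."""
    dn1 = ToyDatanode()
    first, second = b"a" * 16, b"b" * 16
    dn1.append(0, first)    # initial 16-byte write: meta file holds CK1
    dn1.append(16, second)  # append completes on dn1: meta file holds CK2
    # dn2 throws at this point (not modelled); the client retries on dn1,
    # still assuming that only the first 16 bytes are on disk.
    try:
        dn1.append(16, second)
    except ChecksumMismatch:
        return dn1, True
    return dn1, False
```

Calling reproduce() in this toy model leaves dn1 holding all 32 bytes with CK2 in
its meta data, and the retried append is rejected because the checksum recomputed
over the first 16 bytes (CK1) no longer matches the stored one.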
 
- A similar bug has been reported at
https://issues.apache.org/jira/browse/HDFS-679
but here it manifests in a different context.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

