Date: Thu, 24 Mar 2011 05:36:05 +0000 (UTC)
From: "sandeep (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-1228) CRC does not match when retrying appending a partial block

    [ https://issues.apache.org/jira/browse/HDFS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010564#comment-13010564 ]

sandeep commented on HDFS-1228:
-------------------------------

Please check the scenario where the CRC comparison fails:
=========================================================
1) Create a file of 512 bytes.
2) Now try appending some more content to the file:
3) First append 2 bytes and call sync.
4) Then append 2 more bytes and call sync again.

This second sync will fail, throwing the exception below (a minimal reproduction sketch follows the stack trace):

2011-03-12 20:28:37,671 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(10.18.52.116:50010, storageID=DS-1547254589-10.18.52.116-50010-1299941311942, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Partial CRC 3835263025 does not match value computed the last time file was closed 2082103828
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.computePartialChunkCrc(BlockReceiver.java:692)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.setBlockPosition(BlockReceiver.java:632)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:400)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:533)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:358)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
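For reference, a minimal reproduction sketch of steps 1-4 against the HDFS client API (0.20-append era). This is an illustration, not a test taken from the report: the class name AppendSyncRepro and the path /tmp/crc-test are made up, and the default Configuration is assumed to point at a running cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendSyncRepro {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/crc-test");   // hypothetical test path

        // 1) create a file of 512 bytes (one full default checksum chunk)
        FSDataOutputStream out = fs.create(p, true);
        out.write(new byte[512]);
        out.close();

        // 2) reopen the file for append
        out = fs.append(p);

        // 3) append 2 bytes and sync
        out.write(new byte[2]);
        out.sync();                           // hflush() on later branches

        // 4) append 2 more bytes and sync again -- this second sync is where
        //    the "Partial CRC ... does not match" IOException was observed
        out.write(new byte[2]);
        out.sync();
        out.close();
      }
    }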
> CRC does not match when retrying appending a partial block
> -----------------------------------------------------------
>
>                 Key: HDFS-1228
>                 URL: https://issues.apache.org/jira/browse/HDFS-1228
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20-append
>            Reporter: Thanh Do
>
> - Summary: when appending to a partial block, it is possible that the
> retry after an exception fails due to a checksum mismatch. The append
> operation is not atomic (it neither completes fully nor fails completely).
>
> - Setup:
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + failure type = bad disk
> + when/where failure happens = (see below)
>
> - Details:
> The client writes 16 bytes to dn1 and dn2. The write completes. So far so good.
> The meta file now contains: 7-byte header + 4-byte checksum (CK1, the checksum
> for the 16 bytes). The client then appends 16 more bytes, and let us assume
> there is an exception at BlockReceiver.receivePacket() at dn2. So the client
> knows dn2 is bad. BUT the append at dn1 is complete (i.e. the data portion and
> the checksum portion have been written to disk in the corresponding block file
> and meta file), meaning that the checksum file at dn1 now contains a 7-byte
> header + 4-byte checksum (CK2, the checksum for the 32 bytes of data). Because
> dn2 had an exception, the client calls recoverBlock and starts the append to
> dn1 again. When dn1 receives the 16 bytes of data, it verifies that the
> pre-computed crc (CK2) matches what is recalculated now (CK1), which obviously
> does not match. Hence an exception, and the retry fails.
>
> - A similar bug has been reported at
> https://issues.apache.org/jira/browse/HDFS-679
> but here it manifests in a different context.
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
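To make the mismatch concrete, here is a simplified sketch of the check that fails in BlockReceiver.computePartialChunkCrc(). It is not the actual datanode code: it uses java.util.zip.CRC32 directly, and the class name PartialChunkCrcCheck, the method verifyPartialChunk, and its parameters are illustrative. The point is that the checksum recomputed over the partial chunk the retrying client claims (16 bytes, giving CK1) is compared against the last checksum persisted in the meta file (CK2, computed over 32 bytes), so the two can never match.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.zip.CRC32;

    public class PartialChunkCrcCheck {
      /**
       * @param blockFile     on-disk block data (32 bytes after the completed append at dn1)
       * @param storedCrc     last checksum persisted in the meta file (CK2, over 32 bytes)
       * @param partialLength partial-chunk length the retrying client believes exists
       *                      (16 bytes after recoverBlock)
       */
      static void verifyPartialChunk(RandomAccessFile blockFile, long storedCrc,
                                     int partialLength) throws IOException {
        // Re-read the claimed partial chunk from the start of the block file
        byte[] buf = new byte[partialLength];
        blockFile.seek(0);
        blockFile.readFully(buf);

        // Recompute its CRC32 -- this plays the role of CK1 (over 16 bytes only)
        CRC32 crc = new CRC32();
        crc.update(buf, 0, partialLength);
        long recomputed = crc.getValue();

        if (recomputed != storedCrc) {
          // CK1 != CK2: the situation behind the exception quoted above
          throw new IOException("Partial CRC " + recomputed
              + " does not match value computed the last time file was closed "
              + storedCrc);
        }
      }
    }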