From: "Thanh Do (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 24 Nov 2010 18:01:18 -0500 (EST)
Message-ID: <11291399.297191290639678997.JavaMail.jira@thor>
Subject: [jira] Commented: (HDFS-1103) Replica recovery doesn't distinguish between flushed-but-corrupted last chunk and unflushed last chunk

    [ https://issues.apache.org/jira/browse/HDFS-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935560#action_12935560 ]

Thanh Do commented on HDFS-1103:
--------------------------------

"I do not think that this is the case in 0.21 & the trunk. In our lease recovery algorithm in 0.21, If there are 2 RBWs and 1 RWR, 1 RWR is excluded from the lease recovery. In the scenario that you described, RBW B and RBW C's GS is bumped and the length of recovered two replicas is truncated to MIN( len(B), len(C))."

Hairong, can you explain why the GS of RBW B and RBW C is bumped? Is that because of the lease recovery protocol? From my understanding of Todd's description, NN lease recovery is triggered after Machine A reports...

> Replica recovery doesn't distinguish between flushed-but-corrupted last chunk and unflushed last chunk
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1103
>                 URL: https://issues.apache.org/jira/browse/HDFS-1103
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1103-test.txt
>
>
> When the DN creates a replica under recovery, it calls validateIntegrity, which truncates the last checksum chunk off of a replica if it is found to be invalid. Then when the block recovery process happens, this shortened block wins over a longer replica from another node where there was no corruption. Thus, if just one of the DNs has an invalid last checksum chunk, data that has been sync()ed to other datanodes can be lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
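
To make the two readings in this thread concrete, here is a toy model in Python. It is not HDFS code: the `Replica` class, the 512-byte chunk size, and the function names are all assumptions made up for illustration, and the replica lengths are invented. It contrasts the behavior described in the issue (the truncated replica's length wins) with the rule quoted from Hairong (an RWR replica is excluded when RBW replicas exist, and the length becomes MIN(len(B), len(C))).

```python
from dataclasses import dataclass

CHUNK = 512  # assumed bytes per checksum chunk, for illustration only


@dataclass
class Replica:
    node: str
    state: str                 # "RBW" or "RWR"
    length: int                # bytes on disk
    last_chunk_ok: bool = True


def validate_integrity(r: Replica) -> Replica:
    """Toy stand-in for the truncation the issue describes:
    drop the last checksum chunk if it is found to be invalid."""
    if r.last_chunk_ok or r.length == 0:
        return r
    start_of_last_chunk = ((r.length - 1) // CHUNK) * CHUNK
    return Replica(r.node, r.state, start_of_last_chunk, True)


def shortest_replica_length(replicas):
    """Reading in the issue description: the shortened replica wins."""
    return min(r.length for r in replicas)


def recovery_length_excluding_rwr(replicas):
    """Reading in the quoted reply: RWR replicas are left out when
    any RBW replica exists, then the minimum RBW length is used."""
    rbw = [r for r in replicas if r.state == "RBW"]
    participants = rbw if rbw else replicas
    return min(r.length for r in participants)


# Hypothetical scenario: 1000 bytes were sync()ed to A, B and C,
# but A's last checksum chunk is invalid and A comes back as RWR.
a = validate_integrity(Replica("A", "RWR", 1000, last_chunk_ok=False))
b = Replica("B", "RBW", 1000)
c = Replica("C", "RBW", 1000)

print(a.length)                                  # 512
print(shortest_replica_length([a, b, c]))        # 512, 488 bytes lost
print(recovery_length_excluding_rwr([a, b, c]))  # 1000, nothing lost
```

Which of the two functions matches a given release is exactly what the comments above are debating, so the sketch should be read as a way to state the question rather than as an answer to it.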