Message-ID: <456332420.1204668580762.JavaMail.jira@brutus>
Date: Tue, 4 Mar 2008 14:09:40 -0800 (PST)
From: "lohit vijayarenu (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2065) Replication policy for corrupted block

    [ https://issues.apache.org/jira/browse/HADOOP-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575164#action_12575164 ]

lohit vijayarenu commented on HADOOP-2065:
------------------------------------------

Talked to Raghu regarding this.
bq. For (2), it'll be nice if the namenode can delete the corrupted block if there's a good replica on other nodes.

Right now, if there are good replicas, the namenode does replicate the good blocks after it times out while trying to replicate the corrupt block. This was fixed by HADOOP-2012, but if all replicas are corrupt, the namenode keeps on trying. It was decided that this is the desired behavior, because such corruptions should be found out.

bq. For (3), I prefer if the namenode can still replicate the block.
bq. To make the matters worse, if the corrupted file is accessed, all the corrupted replicas would be deleted except for one and stay as replication factor of 1 forever.

With the current policy, if all replicas are corrupted, the namenode would delete 2 of them and, since it fails to replicate, it keeps on trying, as mentioned in HADOOP-2012. Now, do we want that single replica to be replicated? In that case it is similar to the namenode not looping while replicating.

> Replication policy for corrupted block
> ---------------------------------------
>
>                 Key: HADOOP-2065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2065
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: lohit vijayarenu
>             Fix For: 0.17.0
>
>
> Thanks to HADOOP-1955, even if one of the replica is corrupted, the block should get replicated from a good replica relatively fast.
> Created this ticket to continue the discussion from http://issues.apache.org/jira/browse/HADOOP-1955#action_12531162.
> bq. 2. Delete corrupted source replica
> bq. 3. If all replicas are corrupt, stop replication.
> For (2), it'll be nice if the namenode can delete the corrupted block if there's a good replica on other nodes.
> For (3), I prefer if the namenode can still replicate the block.
> Before 0.14, if the file was corrupted, users were still able to pull the data and decide if they want to delete those files.
> (HADOOP-2063)
> In 0.14 and later, we cannot/don't replicate these blocks so they eventually get lost.
> To make the matters worse, if the corrupted file is accessed, all the corrupted replicas would be deleted except for one and stay as replication factor of 1 forever.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
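
For readers following the thread, here is a rough sketch of the policy as proposed in the issue description: for (2), drop corrupt replicas once a good one exists; for (3), still replicate when every replica is corrupt. It is not namenode code. The class, the enum, and the reduction of a block to three counters are all assumptions made for illustration, and the sketch ignores timeouts and the retry loop discussed under HADOOP-2012.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed policy; none of these names exist in Hadoop. */
public class CorruptReplicaPolicy {

    /** Actions the namenode could schedule for a single block (illustrative only). */
    public enum Action {
        DELETE_CORRUPT_REPLICAS,
        REPLICATE_FROM_GOOD,
        REPLICATE_FROM_CORRUPT
    }

    /**
     * Decide what to schedule for one block, given how many of its replicas are
     * good, how many are corrupt, and the file's target replication factor.
     */
    public static List<Action> decide(int goodReplicas, int corruptReplicas, int targetReplication) {
        List<Action> actions = new ArrayList<>();
        if (goodReplicas > 0) {
            // Proposal (2): a good replica exists, so corrupt copies can be deleted.
            if (corruptReplicas > 0) {
                actions.add(Action.DELETE_CORRUPT_REPLICAS);
            }
            // Corrupt copies do not count toward the replication factor.
            if (goodReplicas < targetReplication) {
                actions.add(Action.REPLICATE_FROM_GOOD);
            }
        } else if (corruptReplicas > 0 && corruptReplicas < targetReplication) {
            // Proposal (3): all replicas are corrupt; keep the data around by
            // replicating anyway so the user can still pull it and decide.
            actions.add(Action.REPLICATE_FROM_CORRUPT);
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(decide(1, 2, 3)); // one good replica, two corrupt
        System.out.println(decide(0, 1, 3)); // a single corrupt replica left
        System.out.println(decide(3, 0, 3)); // fully replicated, nothing to do
    }
}
```

The open question in the comment above corresponds to the last branch: whether the single remaining corrupt replica should be replicated at all.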