Date: Thu, 26 Mar 2009 17:41:02 +0100
From: Aaron Kimball
To: core-user@hadoop.apache.org
Subject: Re: corrupt unreplicated block in dfs (0.18.3)
Just because a block is corrupt doesn't mean the entire file is corrupt.
Furthermore, the presence or absence of a file in the namespace is a
completely separate issue from the data in the file. I think it would be a
surprising interface change if files suddenly disappeared just because 1 out
of potentially many blocks was corrupt.

- Aaron

On Thu, Mar 26, 2009 at 1:21 PM, Mike Andrews wrote:

> i noticed that when a file with no replication (i.e., replication=1)
> develops a corrupt block, hadoop takes no action aside from the
> datanode throwing an exception to the client trying to read the file.
> i manually corrupted a block in order to observe this.
>
> obviously, with replication=1 it's impossible to fix the block, but i
> thought perhaps hadoop would take some other action, such as deleting
> the file outright, moving it to a "corrupt" directory, or marking
> it or keeping track of it somehow to note that there's un-fixable
> corruption in the filesystem. thus, the current behaviour seems to
> sweep the corruption under the rug and allow its continued existence,
> aside from notifying the specific client doing the read with an
> exception.
>
> if anyone has any information about this issue or how to work around
> it, please let me know.
>
> on the other hand, i tested that corrupting a block in a replication=3
> file causes hadoop to re-replicate the block from another existing
> copy, which is good and is what i expected.
>
> best,
> mike
>
> --
> permanent contact information at http://mikerandrews.com
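To make the distinction in this thread concrete, here is a minimal toy model in Java. It is a hypothetical sketch, not the HDFS API and not how the namenode actually tracks replicas: the names `BlockHealthModel`, `Replica`, `Block`, and `classify` are invented for illustration. It shows that corruption is a per-block property. A block with at least one healthy replica can be re-replicated (the replication=3 case Mike observed), while a block whose only replica is corrupt is unrecoverable, yet the file's other blocks stay readable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of per-block replica health (NOT the HDFS API).
public class BlockHealthModel {

    // One replica of a block on some datanode; "corrupt" means its data no longer verifies.
    record Replica(String datanode, boolean corrupt) {}

    // A block is still readable as long as at least one replica is healthy.
    record Block(long id, List<Replica> replicas) {
        boolean hasHealthyReplica() {
            return replicas.stream().anyMatch(r -> !r.corrupt());
        }
    }

    // Classify each block of a file independently:
    // OK (no corrupt replicas), RE-REPLICATE (some corrupt, a healthy source exists),
    // or UNRECOVERABLE (every replica is corrupt).
    static List<String> classify(List<Block> blocks) {
        List<String> out = new ArrayList<>();
        for (Block b : blocks) {
            boolean anyCorrupt = b.replicas().stream().anyMatch(Replica::corrupt);
            if (!anyCorrupt) {
                out.add("OK");
            } else if (b.hasHealthyReplica()) {
                out.add("RE-REPLICATE");
            } else {
                out.add("UNRECOVERABLE");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // replication=1: only the second block's single replica is corrupt.
        List<Block> rep1 = List.of(
                new Block(1, List.of(new Replica("dn1", false))),
                new Block(2, List.of(new Replica("dn2", true))),
                new Block(3, List.of(new Replica("dn3", false))));

        // replication=3: one of the block's three replicas is corrupt.
        List<Block> rep3 = List.of(
                new Block(1, List.of(
                        new Replica("dn1", true),
                        new Replica("dn2", false),
                        new Replica("dn3", false))));

        System.out.println(classify(rep1)); // [OK, UNRECOVERABLE, OK]
        System.out.println(classify(rep3)); // [RE-REPLICATE]
    }
}
```

In the replication=1 case only block 2 is lost; blocks 1 and 3 remain intact, which is why deleting the whole file (or dropping it from the namespace) would discard readable data.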