From: Raghu Angadi
To: common-user@hadoop.apache.org
Date: Tue, 11 Aug 2009 09:33:13 -0700
Subject: Re: corrupt filesystem

Note that there are multiple log files (one for each day). Make sure you searched all the relevant days. You can also check the datanode log for this block.

HDFS writes to all three datanodes at the time you write the data. It is possible that the other two datanodes also encountered errors. That would have resulted in an error when you tried to copy the data, and such a corrupt block should not even appear in HDFS.

Did you restart the cluster after copying? 0.18.3 has various fixes related to handling block replication correctly.

Please include the complete log lines (at the end of your response); that makes them simpler to interpret. Alternatively, you could file a JIRA and attach the log files there.

Raghu.

Mayuran Yogarajah wrote:
> Hello,
>
>> If you are interested, you could try to trace one of these block ids in
>> NameNode log to see what happened it. We are always eager to hear about
>> irrecoverable errors. Please mention hadoop version you are using.
>>
>>
> I'm using Hadoop 0.18.3. I just checked namenode log for one of the bad
> blocks. I see entries from Saturday saying:
> ask 1.1.1.6:50010 to replicate blk_1697509332927954816_8724 to
> datanode(s) < all other data nodes >
>
> I only loaded this data Saturday, and the .6 data node became full at
> some point.
> When data is first loaded into the cluster, does the name node send the
> data to as many nodes as
> it can to satisfy the replication factor, or does it send it to one node
> and ask that node send it to others?
>
> If its the latter then its possible that the block became corrupt when I
> first loaded it to .6 (since it was full),
> and since it was designated to send the block to other nodes none of the
> nodes would have a non-corrupt
> copy.
>
> Raghu, please let me know what you think.
>
> thanks,
>
> M
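The reply's advice to search every day's log file (not just the current one) for a block id can be scripted. The following is a minimal Python sketch, assuming the NameNode or DataNode log and its daily-rolled copies sit in one directory and share a name matching the glob `*namenode*.log*`. That pattern, the function name `find_block_in_logs`, and the example path are hypothetical; adjust them to your installation's log naming.

```python
import glob
import os
import re
from typing import Iterator, Tuple


def find_block_in_logs(log_dir: str, block_id: str,
                       pattern: str = "*namenode*.log*") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every log line mentioning block_id.

    The default glob is an assumption: it is meant to pick up both the current
    log file and its daily-rolled copies. Files are visited in name order,
    which is not necessarily chronological order.
    """
    # Match the id only when it is not followed by another digit, so that
    # "blk_123" does not also match "blk_1234". A generation-stamp suffix
    # such as "_8724" is still matched.
    block_re = re.compile(re.escape(block_id) + r"(?!\d)")
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" keeps the scan going if a log contains stray bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            for line_no, line in enumerate(f, start=1):
                if block_re.search(line):
                    yield os.path.basename(path), line_no, line.rstrip("\n")
```

Example usage, with a hypothetical log directory:

```python
for name, n, line in find_block_in_logs("/path/to/hadoop/logs", "blk_1697509332927954816"):
    print(f"{name}:{n}: {line}")
```

Passing the id without the generation-stamp suffix finds every log line for that block regardless of its generation stamp. Pointing `pattern` at the datanode logs instead covers the reply's other suggestion.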
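The quoted question asks whether the client sends a block to every replica or to one node that forwards it later. The reply answers that HDFS writes to all three datanodes at the time the data is written. The sketch below is a deliberately simplified model of that idea, not HDFS's actual code. A full datanode fails to store the block, the remaining nodes in the write still store it, and in this model the client only gets an error when every node fails. `DataNode`, its `full` flag, `WriteError`, and `pipeline_write` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class WriteError(Exception):
    """Raised when no datanode in the write could store the block."""


@dataclass
class DataNode:
    name: str
    full: bool = False  # hypothetical failure mode: the node has no space left
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def store(self, block_id: str, data: bytes) -> bool:
        """Store the block locally; return False if this node cannot."""
        if self.full:
            return False
        self.blocks[block_id] = data
        return True


def pipeline_write(block_id: str, data: bytes, pipeline: List[DataNode]) -> List[str]:
    """Write one block to every datanode in the pipeline at write time.

    A node that fails is simply skipped and the write continues with the rest;
    only if every node fails does the caller see an error. Returns the names
    of the nodes that now hold a replica.
    """
    # Simplification: visit nodes in order instead of having each node
    # forward packets to the next one.
    stored = [dn.name for dn in pipeline if dn.store(block_id, data)]
    if not stored:
        raise WriteError(f"all datanodes failed to store {block_id}")
    return stored
```

In this model a full node such as 1.1.1.6 does not leave the other replicas without a good copy (the other addresses here are hypothetical):

```python
nodes = [DataNode("1.1.1.6", full=True), DataNode("1.1.1.7"), DataNode("1.1.1.8")]
print(pipeline_write("blk_1", b"payload", nodes))  # ['1.1.1.7', '1.1.1.8']
```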