Message-ID: <4783C742.1030005@yahoo-inc.com>
Date: Tue, 08 Jan 2008 10:56:02 -0800
From: Konstantin Shvachko
To: hadoop-user@lucene.apache.org
Subject: Re: missing VERSION files leading to failed datanodes

Joydeep,

Do you still have the previous directory? It should be
  /var/hadoop/tmp/dfs/data/previous
If you do, you can use the VERSION file from there.
If not, could you please run
  ls -R /var/hadoop/tmp/dfs/data
for me? Block files are not needed, of course.

In any case I am interested in how it happened and why automatic recovery
is not happening. Do you have any log messages from the time the data-node
first failed? Was it upgrading at that time?
Any information would be useful.

Thank you,
--Konstantin

Joydeep Sen Sarma wrote:
> we are running 0.14.4
>
> the fix won't help me recover the current version files. all i need is the storageid. it seems to be stored in some file header somewhere. can u tell me how to get it?
>
>
> -----Original Message-----
> From: dhruba Borthakur [mailto:dhruba@yahoo-inc.com]
> Sent: Tue 1/8/2008 10:06 AM
> To: hadoop-user@lucene.apache.org
> Subject: RE: missing VERSION files leading to failed datanodes
>
> Hi Joydeep,
>
> Which version of hadoop are you running? We had earlier fixed a bug
> https://issues.apache.org/jira/browse/HADOOP-2073
> in version 0.15.
>
> Thanks,
> dhruba
>
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
> Sent: Tuesday, January 08, 2008 9:34 AM
> To: hadoop-user@lucene.apache.org; hadoop-user@lucene.apache.org
> Subject: RE: missing VERSION files leading to failed datanodes
>
> well - at least i know why this happened. (still looking for a way to
> restore the version file).
>
> https://issues.apache.org/jira/browse/HADOOP-2549 is causing disk full
> on one of the disks (in spite of du.reserved setting). looks like while
> starting up - the VERSION file could not be written and went missing.
> that would seem like another bug (writing a tmp file and renaming it to
> VERSION file would have prevented this mishap):
>
> 2008-01-08 08:24:01,597 ERROR org.apache.hadoop.dfs.DataNode:
> java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
>         at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
>         at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
>         at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
>         at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
>         at java.io.BufferedWriter.flush(BufferedWriter.java:236)
>         at java.util.Properties.store(Properties.java:666)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:176)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:164)
>         at org.apache.hadoop.dfs.Storage.writeAll(Storage.java:510)
>         at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:146)
>         at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)
>
>
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
> Sent: Tue 1/8/2008 8:51 AM
> To: hadoop-user@lucene.apache.org
> Subject: missing VERSION files leading to failed datanodes
>
>
> 2008-01-08 08:36:20,045 ERROR org.apache.hadoop.dfs.DataNode:
> org.apache.hadoop.dfs.InconsistentFSStateException: Directory
> /var/hadoop/tmp/dfs/data is in an inconsistent state: file VERSION is
> invalid.
>
> [root@hadoop034.sf2p data]# ssh hadoop003.sf2p cat /var/hadoop/tmp/dfs/data/current/VERSION
> [root@hadoop034.sf2p data]#
>
> any idea why the VERSION file is empty? and how can i regenerate it - or
> ask the system to generate a new one without discarding all the blocks?
>
>
> i had previously shutdown and started dfs once (to debug a different bug
> where it's not honoring du.reserved. more on that later).
>
> help appreciated,
>
> Joydeep
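
For reference, the idea suggested in the quoted message (write a tmp file, then rename it to VERSION) might look roughly like the sketch below. This is only an illustration and not the actual Hadoop Storage code: the class name AtomicVersionWriter, the method write, and the "storageID" value in main are all made up for the example, and it uses the java.nio.file API, which is much newer than the Hadoop version discussed in this thread. If the disk fills up while the temporary file is being written, the exception propagates and the existing VERSION file is left untouched.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: writes a properties file via temp file + rename.
class AtomicVersionWriter {

    static void write(Path target, Properties props) throws IOException {
        // The temp file lives in the same directory so the rename stays on one filesystem.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                // A failure here (e.g. "No space left on device") never touches the target.
                props.store(out, null);
            }
            // Rename only after the temp file has been written and closed.
            // Whether an existing target is replaced is implementation specific;
            // on POSIX filesystems the rename replaces it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up the temp file after a failure.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("version-demo");
        Properties props = new Properties();
        props.setProperty("storageID", "DS-example"); // placeholder value, for illustration only
        write(dir.resolve("VERSION"), props);
        System.out.println("wrote " + dir.resolve("VERSION"));
    }
}
```

The sketch leaves out forcing the data to disk before the rename, which a real implementation would probably also want.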