From: "dhruba Borthakur"
To: hadoop-user@lucene.apache.org
Subject: RE: missing VERSION files leading to failed datanodes
Date: Tue, 8 Jan 2008 10:06:05 -0800

Hi Joydeep,

Which version of Hadoop are you running? We had earlier fixed a bug, https://issues.apache.org/jira/browse/HADOOP-2073, in version 0.15.

Thanks,
dhruba

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
Sent: Tuesday, January 08, 2008 9:34 AM
To: hadoop-user@lucene.apache.org
Subject: RE: missing VERSION files leading to failed datanodes

Well - at least I know why this happened (still looking for a way to restore the VERSION file). https://issues.apache.org/jira/browse/HADOOP-2549 is causing one of the disks to fill up (in spite of the du.reserved setting). It looks like the VERSION file could not be written during startup and went missing.

That would seem like another bug - writing a tmp file and renaming it to VERSION would have prevented this mishap (see the sketch after the stack trace):

2008-01-08 08:24:01,597 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
        at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
        at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
        at java.io.BufferedWriter.flush(BufferedWriter.java:236)
        at java.util.Properties.store(Properties.java:666)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:176)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:164)
        at org.apache.hadoop.dfs.Storage.writeAll(Storage.java:510)
        at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:146)
        at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)
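Here is a minimal sketch of that temp-file-and-rename idea. It is not Hadoop's actual Storage code - the class name and the VERSION.tmp file name are made up for illustration - but it shows how an out-of-space failure inside Properties.store() would then leave the old VERSION file untouched instead of truncated:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class AtomicVersionWrite {

    // Write the VERSION properties to a temp file first, then rename it into
    // place, so a failed write never clobbers the existing VERSION file.
    public static void writeVersion(File currentDir, Properties props) throws IOException {
        File tmp = new File(currentDir, "VERSION.tmp");   // hypothetical temp name
        File version = new File(currentDir, "VERSION");

        FileOutputStream out = new FileOutputStream(tmp);
        try {
            props.store(out, "storage version");  // "No space left on device" would surface here, on the temp file
            out.getFD().sync();                   // make sure the bytes are on disk before the rename
        } finally {
            out.close();
        }

        // Only replace the real file once the temp copy is complete.
        if (!tmp.renameTo(version)) {
            // renameTo may refuse to overwrite an existing target on some platforms,
            // so retry after removing the old copy.
            version.delete();
            if (!tmp.renameTo(version)) {
                throw new IOException("Could not rename " + tmp + " to " + version);
            }
        }
    }
}

If the store() call fails, the worst case is a stray VERSION.tmp that a later startup can simply delete; the real VERSION file is never left empty.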
-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarma@facebook.com]
Sent: Tue 1/8/2008 8:51 AM
To: hadoop-user@lucene.apache.org
Subject: missing VERSION files leading to failed datanodes

2008-01-08 08:36:20,045 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /var/hadoop/tmp/dfs/data is in an inconsistent state: file VERSION is invalid.

[root@hadoop034.sf2p data]# ssh hadoop003.sf2p cat /var/hadoop/tmp/dfs/data/current/VERSION
[root@hadoop034.sf2p data]#

Any idea why the VERSION file is empty? And how can I regenerate it - or ask the system to generate a new one without discarding all the blocks? (A sketch of what the file typically contains follows at the end of this message.) I had previously shut down and started DFS once (to debug a different bug where it is not honoring du.reserved - more on that later).

Help appreciated,
Joydeep
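For reference, below is a rough, hypothetical sketch of what a datanode's VERSION file looks like in this era - it is just a java.util.Properties file, which is why the stack trace earlier in the thread goes through Properties.store(). The field names come from the dfs Storage/DataStorage classes, but every value shown here is a placeholder: namespaceID must match the namenode's, storageID is specific to each datanode, and layoutVersion must match the running Hadoop version, so copying and adapting the file from a healthy datanode in the same cluster is safer than typing values by hand.

#Tue Jan 08 08:24:01 PST 2008
namespaceID=123456789
storageID=DS-123456789-10.0.0.3-50010-1199000000000
cTime=0
storageType=DATA_NODE
layoutVersion=-10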