From: Peter Falk
Reply-To: peter@bugsoft.nu
Date: Wed, 7 Jul 2010 14:46:14 +0200
Subject: Please help! Corrupt fsimage?
To: common-user@hadoop.apache.org

Hi,

After a restart of our live cluster today, the name node fails to start with the log message seen below. There is a big file called edits.new in the "current" folder that seems to be the only one that has received changes recently (no changes to the edits or the fsimage for over a month). Is that normal? The last change to the edits.new file was right before shutting down the cluster. It seems like the shutdown was unable to store valid fsimage, edits, and edits.new files.

The secondary name node image does not include the edits.new file, only edits and fsimage, which are identical to the name node's versions. So no help from there.

Would appreciate any help in understanding what could have gone wrong. The shutdown seemed to complete just fine, without any error message. Is there any way to recreate the image from the data, or any other way to save our production data?

Sincerely,
Peter

2010-07-07 14:30:26,949 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2010-07-07 14:30:26,960 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-07-07 14:30:27,019 DEBUG org.apache.hadoop.security.UserGroupInformation: Unix Login: hbase,hbase
2010-07-07 14:30:27,149 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-07-07 14:30:27,150 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2010-07-07 14:30:27,151 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
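
For anyone reading the trace: the top frame is java.io.DataInputStream.readShort, which throws EOFException when the stream ends before both bytes of a 16-bit value can be read. The sketch below reproduces only that standard-library behaviour on an in-memory byte array. The class and method names are made up for illustration, and it does not use any Hadoop code or say which file on disk was cut short or why.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Hadoop.
public class ReadShortEofDemo {

    /** Returns true if reading one 16-bit value from the given bytes hits end of stream. */
    public static boolean readShortHitsEof(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readShort();   // needs two bytes; throws EOFException if fewer remain
            return false;
        } catch (EOFException e) {
            return true;      // the same exception type as in the name node log
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readShortHitsEof(new byte[] {0x00, 0x2A})); // complete field
        System.out.println(readShortHitsEof(new byte[] {0x00}));       // cut off mid-field
        System.out.println(readShortHitsEof(new byte[0]));             // nothing left to read
    }
}
```

Run as written, this prints false, true, true: a reader that expects another field but finds the data ending early fails in the same way as the first frame of the trace.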