Message-ID: <491C1EC7.9090009@corp.aol.com>
Date: Thu, 13 Nov 2008 18:04:15 +0530
From: ANKUR GOEL
To:
 core-dev@hadoop.apache.org, core-user@hadoop.apache.org
Subject: Namenode Failure

Hi Folks,

We have been running the hadoop-0.17.2 release on a 50-machine cluster, and we recently experienced a namenode failure because the disk became full. The namenode is now unable to start up and throws the following exception:

2008-11-13 06:41:18,618 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-11-13 06:41:18,619 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
        at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)
        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
        at org.apache.hadoop.dfs.FSImage.doUpgrade(FSImage.java:250)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:217)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:255)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)

What is the best way to recover from this failure with minimal data loss? I could not find instructions, on the wiki or anywhere else, for recovering on release 0.17.2 using the files from the secondary namenode.

Any help is greatly appreciated.

Thanks
-Ankur
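
As an illustration of the failure mode in the trace above: the top frame is DataInputStream.readFully, which throws EOFException when the stream ends before the requested number of bytes has been read. That is consistent with an edits file that was cut short when the disk filled up, though the trace alone does not prove it. The sketch below is not Hadoop code. The class TruncatedRecordDemo, its methods, and the record layout (a 2-byte length prefix followed by the payload) are assumptions chosen only to reproduce the same exception on a truncated record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class; it only mimics a length-prefixed record, not the real edits format.
public class TruncatedRecordDemo {

    // Serialize one record: 2-byte length prefix, then the payload bytes (assumed layout).
    public static byte[] writeRecord(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] payload = s.getBytes("UTF-8");
        out.writeShort(payload.length);
        out.write(payload);
        out.flush();
        return buf.toByteArray();
    }

    // Read one record back; readFully throws EOFException if the payload is incomplete.
    public static String readRecord(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int length = in.readUnsignedShort();
        byte[] payload = new byte[length];
        in.readFully(payload, 0, length);
        return new String(payload, "UTF-8");
    }

    // Returns true when reading the record fails with EOFException.
    public static boolean failsWithEof(byte[] data) throws IOException {
        try {
            readRecord(data);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = writeRecord("/user/example/file");
        // Drop the last 5 bytes to simulate a write that was interrupted partway through.
        byte[] truncated = Arrays.copyOf(full, full.length - 5);
        System.out.println("complete record: " + readRecord(full));
        System.out.println("truncated record hits EOF: " + failsWithEof(truncated));
    }
}
```

The length prefix of the truncated record still promises the full payload, so the reader asks for more bytes than the stream holds and fails in readFully, the same frame that appears at the top of the namenode trace.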