Message-ID: <201276873.1215763771961.JavaMail.jira@brutus>
Date: Fri, 11 Jul 2008 01:09:31 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image
In-Reply-To: <693390643.1215569311620.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612794#action_12612794 ]

dhruba borthakur commented on HADOOP-3724:
------------------------------------------

I extracted the ASCII strings from the fsimage, edits and edits.new. The file /foo/bar/jambajuice appears in the fsimage as a regular file. It also appears in the fsimage as a saved lease. Both are valid entries.

Then I took fsimage/edits/... into an existing namenode and started the namenode. It started without a hitch and processed the contents of all three files. This test was done with 0.18.

On further inspection of the customer install (with help from Lohit), we found that it was not running the hadoop 0.18 release. Rather, it was running a much earlier version of the software. We verified that the workspace from which that buggy version of hadoop was built did not have the latest fixes in LeaseManager.java and FSNamesystem.java.

Thanks to Lohit for all his hard work and time.

Possible fixes that have gone in to solve this type of problem:
HADOOP-3269
HADOOP-3349
HADOOP-3375
HADOOP-3418

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.
>         at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)
>         at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)
>         at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.