From: Terrence Martin
Date: Mon, 30 Jul 2012 10:41:50 -0700
To: general@hadoop.apache.org
Subject: Re: Fixing a corrupt edits file?
Message-ID: <5016C75E.3020603@physics.ucsd.edu>
References: <5016C56F.7000600@physics.ucsd.edu>

You do not fix the edits file. :)

When this exact issue has occurred here, I have had to revert to my SNN copy of the Hadoop database. For us it is not too bad: we lose at most about 30 minutes, because we run our merges from the SNN pretty frequently.

Terrence

On 7/30/2012 10:35 AM, mouradk wrote:
> Hi Terrence,
>
> Thanks for your reply. How do I go about fixing the edits file in the NameNode? Your help is much appreciated!!
>
> Thanks
>
> Mourad
>
>
> On Monday, 30 July 2012 at 18:33, Terrence Martin wrote:
>
>> The purpose of the secondary name node is to assist in the merging of
>> the edits file (and an edits.new if it exists) into the main hadoop
>> file. The reason the edits file is 0 on the SNN is that this
>> is the proper state after the edits file has been merged with the main
>> database file.
>>
>> In other words an empty edits file on the SNN is what you want.
>>
>> Terrence
>>
>>
>> On 7/30/2012 10:29 AM, mouradk wrote:
>>> Hello all,
>>>
>>> I have just had a problem with a NameNode restart and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint but the edits file there is empty 4kb with ????? inside.
>>>
>>> This suggests to me that I cannot recover from the secondaryNameNode? How do you fix this problem?
>>>
>>> Thanks for your help.
>>>
>>> Original error log:
>>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>>> ************************************************************/
>>> 2012-07-30 16:02:23,649 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
>>> 2012-07-30 16:02:23,656 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:50001
>>> 2012-07-30 16:02:23,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>> 2012-07-30 16:02:23,660 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>> 2012-07-30 16:02:23,721 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2012-07-30 16:02:23,723 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>> 2012-07-30 16:02:23,756 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 533
>>> 2012-07-30 16:02:23,833 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
>>> 2012-07-30 16:02:23,835 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 55400 loaded in 0 seconds.
>>> 2012-07-30 16:02:23,844 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "1343506"
>>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>     at java.lang.Long.parseLong(Long.java:419)
>>>     at java.lang.Long.parseLong(Long.java:468)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:775)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>>>
>>> 2012-07-30 16:02:23,845 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>>
>>>
>>>
>>> Mouradk
>
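
The checkpoint behaviour discussed in this thread can be illustrated with a toy model. The Python sketch below is not Hadoop code: the names ToyNameNode, ToySecondary, apply_edits, checkpoint and restore_from are hypothetical, and the "image" is just a set of paths. It only aims to show why the edits file next to a fresh SNN checkpoint is empty, and why reverting to the SNN copy loses whatever was logged after the last merge.

```python
def apply_edits(image, edits):
    """Replay logged operations on top of an image and return the merged result."""
    result = set(image)
    for op, path in edits:
        if op == "create":
            result.add(path)
        elif op == "delete":
            result.discard(path)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


class ToyNameNode:
    """Toy stand-in for a NameNode: a merged image plus edits logged since then."""

    def __init__(self):
        self.image = set()  # plays the role of the main image file
        self.edits = []     # plays the role of the edits file

    def create(self, path):
        self.edits.append(("create", path))

    def delete(self, path):
        self.edits.append(("delete", path))

    def namespace(self):
        # What a restart would see: the image with all edits replayed.
        return apply_edits(self.image, self.edits)


class ToySecondary:
    """Toy stand-in for the SNN: it holds the result of the last merge."""

    def __init__(self):
        self.image = set()
        self.edits = []

    def checkpoint(self, namenode):
        # Merge edits into the image; afterwards the edits log is empty,
        # which is the expected state described in the thread.
        self.image = apply_edits(namenode.image, namenode.edits)
        self.edits = []
        namenode.image = set(self.image)
        namenode.edits = []


def restore_from(secondary):
    """Build a fresh toy NameNode from the secondary's last checkpoint."""
    restored = ToyNameNode()
    restored.image = set(secondary.image)
    restored.edits = list(secondary.edits)
    return restored


nn, snn = ToyNameNode(), ToySecondary()
nn.create("/a")
nn.create("/b")
snn.checkpoint(nn)       # merge: the SNN edits log is now empty
nn.create("/c")          # logged only on the NameNode, after the checkpoint
recovered = restore_from(snn)
print(sorted(recovered.namespace()))
```

In this toy run, "/c" is missing after the restore because it was logged after the last merge. The more often the merge runs, the smaller that window is, which matches the "30 minutes or less" remark above.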