hadoop-general mailing list archives

From Terrence Martin <tmar...@physics.ucsd.edu>
Subject Re: Fixing a corrupt edits file?
Date Mon, 30 Jul 2012 17:41:50 GMT
You do not fix the edits file. :) When this exact issue has occurred 
here, I have had to revert to the SNN's copy of the Hadoop metadata 
(the merged image).
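
Roughly, the recovery looks like the sketch below. The paths are just 
examples (use whatever dfs.name.dir and fs.checkpoint.dir point at in 
your config), and you should check the -importCheckpoint documentation 
for your Hadoop version before relying on it:

  # Stop HDFS and move the damaged NameNode metadata out of the way
  # (example path; substitute your dfs.name.dir).
  bin/stop-dfs.sh
  mv /data/hadoop/dfs/name /data/hadoop/dfs/name.broken

  # Recreate an empty dfs.name.dir and start the NameNode with
  # -importCheckpoint, which loads the last checkpoint from the
  # directory configured as fs.checkpoint.dir and saves it as the
  # new image. The NameNode will refuse to start this way if
  # dfs.name.dir already contains a legal image.
  mkdir -p /data/hadoop/dfs/name
  bin/hadoop namenode -importCheckpoint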

For us it is not too bad, since we lose at most around 30 minutes of 
edits. The reason is that we run our merges from the SNN pretty frequently.
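
If you want to see how stale the last checkpoint is (i.e. how much you 
would lose by rolling back), the timestamps under the SNN's checkpoint 
directory tell you when the last merge ran; the path below is just an 
example of a typical fs.checkpoint.dir layout:

  # When did the SNN last write a checkpoint? (example path)
  ls -l /data/hadoop/dfs/namesecondary/current/fsimage

  # How often it merges is controlled in core-site.xml by:
  #   fs.checkpoint.period - seconds between checkpoints (default 3600)
  #   fs.checkpoint.size   - edits size in bytes that triggers an early checkpoint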

Terrence


On 7/30/2012 10:35 AM, mouradk wrote:
> Hi Terrence,
>
> Thanks for your reply. How do I go about fixing the edits file in the NameNode? Your
> help is much appreciated!!
>
> Thanks
>
> Mourad
>
> Mouradk
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Monday, 30 July 2012 at 18:33, Terrence Martin wrote:
>
>> The purpose of the secondary name node is to assist in merging the
>> edits file (and an edits.new, if one exists) into the main image
>> file (the fsimage). The reason the edits file is 0 bytes on the SNN is
>> that this is the proper state after the edits have been merged into
>> the main image file.
>>
>> In other words an empty edits file on the SNN is what you want.
>>
>> Terrence
>>
>>
>> On 7/30/2012 10:29 AM, mouradk wrote:
>>> Hello all,
>>>
>>> I have just had a problem with a NameNode restart, and someone on the
>>> mailing list kindly suggested that the edits file was corrupted. I have
>>> made a backup copy of the file and checked my /namesecondary/previous.checkpoint,
>>> but the edits file there is an empty 4 kB file with ????? inside.
>>>
>>> This suggests to me that I cannot recover from the SecondaryNameNode? How
>>> do you fix this problem?
>>>
>>> Thanks for your help.
>>>
>>> Original error log:
>>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>>> ************************************************************/
>>> 2012-07-30 16:02:23,649 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
>>> 2012-07-30 16:02:23,656 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:50001
>>> 2012-07-30 16:02:23,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>> 2012-07-30 16:02:23,660 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>> 2012-07-30 16:02:23,721 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2012-07-30 16:02:23,723 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>> 2012-07-30 16:02:23,756 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 533
>>> 2012-07-30 16:02:23,833 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
>>> 2012-07-30 16:02:23,835 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 55400 loaded in 0 seconds.
>>> 2012-07-30 16:02:23,844 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "1343506"
>>> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>> at java.lang.Long.parseLong(Long.java:419)
>>> at java.lang.Long.parseLong(Long.java:468)
>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
>>> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:775)
>>> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>>> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>>> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>>> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>>>
>>> 2012-07-30 16:02:23,845 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>>
>>>
>>>
>>> Mouradk
>

