hadoop-common-user mailing list archives

From Adam Kawa <kawa.a...@gmail.com>
Subject Re: NN stopped and cannot recover with error "There appears to be a gap in the edit log"
Date Wed, 27 Nov 2013 23:55:53 GMT
Maybe you can play with the "offline edits viewer". I have never run into
such an issue, so I have never used the "offline edits viewer"
on production datasets, but it has some options that could perhaps be
useful when troubleshooting and fixing.

[kawaa@localhost Desktop]$ hdfs oev
Usage: bin/hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Offline edits viewer
Parse a Hadoop edits log file INPUT_FILE and save results
....
-f,--fix-txids         Renumber the transaction IDs in the input,
                       so that there are no gaps or invalid transaction IDs.
-r,--recover           When reading binary edit logs, use recovery
                       mode.  This will give you the chance to skip
                       corrupt parts of the edit log.
....
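
As a rough sketch of how those two options could be combined into a command line, the helper below only assembles the argument list; it does not run anything. It uses only the flags shown in the usage text above (-i, -o, -f, -r). The function name and the file paths are hypothetical, and the output format depends on options elided from the excerpt, so treat this as an illustration rather than a tested recovery procedure. Work on a copy of the edits file, never the original.

```python
import shlex


def build_oev_command(input_file, output_file, fix_txids=False, recover=False):
    """Assemble an `hdfs oev` argument list from the flags in the usage text.

    Hypothetical helper: it only builds the command, it does not execute it.
    """
    cmd = ["hdfs", "oev"]
    if fix_txids:
        cmd.append("-f")  # renumber transaction IDs so there are no gaps
    if recover:
        cmd.append("-r")  # recovery mode for reading binary edit logs
    cmd += ["-i", input_file, "-o", output_file]
    return cmd


# Hypothetical paths: a backup copy of the edits file and a new output file.
cmd = build_oev_command(
    "/tmp/nn-backup/edits_copy",
    "/tmp/nn-backup/edits_fixed",
    fix_txids=True,
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the printed command looks right, it could be passed to subprocess.run(cmd, check=True) on the test node.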

You wrote that you have "a single node for testing", so maybe it is worth
experimenting? ;)
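
Before renumbering anything, it may help to confirm where the gap actually is. Below is a minimal sketch that scans the file names of an edits directory for missing txid ranges. It assumes the usual naming of finalized segments (edits_<first>-<last>) and of the open segment (edits_inprogress_<first>); the sample file names are made up to match the txids in the error message quoted below, and the function name is hypothetical. It does not compare against the fsimage txid.

```python
import re

# Assumed naming convention for edit log segments in the NN "current" directory.
FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def find_txid_gaps(filenames):
    """Return a list of (first_missing, last_missing) txid ranges."""
    segments = []
    for name in filenames:
        m = FINALIZED.match(name)
        if m:
            segments.append((int(m.group(1)), int(m.group(2))))
            continue
        m = IN_PROGRESS.match(name)
        if m:
            # The last txid of an in-progress segment is not in its name.
            segments.append((int(m.group(1)), None))

    segments.sort(key=lambda seg: seg[0])
    gaps = []
    expected = None  # next txid we expect a segment to start with
    for first, last in segments:
        if expected is not None and first > expected:
            gaps.append((expected, first - 1))
        if last is not None:
            expected = last + 1 if expected is None else max(expected, last + 1)
    return gaps


# Made-up listing consistent with "expected txid 8364, but got txid 27381".
sample = [
    "fsimage_0000000000000000000",
    "edits_0000000000000000001-0000000000000008363",
    "edits_inprogress_0000000000000027381",
    "seen_txid",
]
print(find_txid_gaps(sample))
```

On a real node the list of names could come from os.listdir() on the "current" subdirectory of the NameNode's name directory.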


2013/11/15 Joshua Tu <tujunxiong@live.com>

> I am using Cloudera CDH 4, the latest version of it. I didn't remove
> anything from the shell; as far as I can recall, the issue happened when I added
> some feature from Cloudera Manager.
>
> Any thoughts?
>
>
> Best Regards,
>
> *Joshua Tu*
>
>
> ------------------------------
> From: bharathvissapragada1990@gmail.com
> Date: Fri, 15 Nov 2013 11:41:19 +0530
> Subject: Re: NN stopped and cannot recover with error "There appears to be
> a gap in the edit log"
> To: user@hadoop.apache.org
>
>
> What is your hadoop version? Did you manually delete any files from the nn
> edits dir? Do you see this gap in the file listing of edits directory too?
> Ideally all the txids appear consecutive when you do a file listing in that
> dir.
>
>
> On Fri, Nov 15, 2013 at 9:44 AM, Joshua Tu <tujunxiong@live.com> wrote:
>
> Hi there,
>
>
>
> I deployed a single node for testing. Today the NN stopped, and I cannot
> start it; it fails with the error: There appears to be a gap in the edit log.
>
>
>
> 2013-11-14 15:00:01,431 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
>
> 2013-11-14 15:00:01,432 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
>
> java.io.IOException: There appears to be a gap in the edit log.  We expected txid 8364, but got txid 27381.
>
>        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:158)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:92)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:744)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:660)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:349)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:261)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:639)
>
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:476)
>
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
>
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
>
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
>
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
>
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
>
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
>
> 2013-11-14 15:00:01,445 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
>
> 2013-11-14 15:00:01,448 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>
> /************************************************************
>
> SHUTDOWN_MSG: Shutting down NameNode at ubcdh/10.0.0.4
>
> ************************************************************/
>
>
>
> Since there is only one node, restoring the edit logs is not possible, and *hadoop
> namenode -recover* also does not seem to fit this situation.
>
>
>
> How can I fix this issue?
>
>
>
>
>
> *JOSHUA TU JUNXIONG*
>
> Best regards
>
>
>
>
>
