hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3967) NN should bail out earlier when logs to load have a gap
Date Fri, 21 Sep 2012 20:41:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460809#comment-13460809
] 

Colin Patrick McCabe commented on HDFS-3967:
--------------------------------------------

Yeah, it's not a great error message.  We should probably change the {{SKIP_UNTIL}} case in
{{RedundantEditLogInputStream.java#nextOp}} to check {{streams[curIdx].getFirstTxId}} and print
a different message if the stream's first txid is newer than the txid we are trying to skip to.
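
A rough sketch of the kind of check I mean. This is not the actual Hadoop code: the class {{EditLogGapCheck}}, the methods {{gapMessage}} and {{checkSkipTarget}}, and the message wording are all made up for illustration. The only idea taken from the real code is comparing the stream's first txid against the txid we want to skip to, before attempting any fast-forward.

```java
import java.io.IOException;

// Hypothetical stand-in for the proposed check; not part of Hadoop.
class EditLogGapCheck {

    /**
     * Returns a descriptive message if the stream starts after the txid we
     * need, or null if fast-forwarding to skipUntilTxId is possible.
     */
    static String gapMessage(String streamName, long firstTxId, long skipUntilTxId) {
        if (firstTxId > skipUntilTxId) {
            // The stream begins too late: no amount of skipping forward can
            // reach a transaction that precedes its first one.
            return "Edit log stream " + streamName + " starts at txid " + firstTxId
                + ", which is after the expected txid " + skipUntilTxId
                + ". Transactions " + skipUntilTxId + " through " + (firstTxId - 1)
                + " are missing.";
        }
        return null;
    }

    /** Throws early, with the clearer message, instead of attempting the skip. */
    static void checkSkipTarget(String streamName, long firstTxId, long skipUntilTxId)
            throws IOException {
        String msg = gapMessage(streamName, firstTxId, skipUntilTxId);
        if (msg != null) {
            throw new IOException(msg);
        }
    }
}
```

With the values from the log in the description, {{checkSkipTarget(name, 45928954L, 45781084L)}} would throw before any fast-forward is attempted, while a stream whose first txid is at or before 45781084 would pass through unchanged.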
                
> NN should bail out earlier when logs to load have a gap
> -------------------------------------------------------
>
>                 Key: HDFS-3967
>                 URL: https://issues.apache.org/jira/browse/HDFS-3967
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I was testing an HA setup with a lowered edit log retention period, and ended up in a
state where one of the two NNs had fallen too far behind, such that it couldn't start up again
(due to the too-low retention period). When I started the NN, I got the following:
> 12/09/21 13:03:20 INFO namenode.FSImage: Loaded image for txid 45781083 from /tmp/name1-name/current/fsimage_0000000000045781083
> 12/09/21 13:03:20 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@239a0feb
expecting start txid #45781084
> 12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b,
http://localhost:13082/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b,
http://localhost:13083/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b'
to transaction ID 45781084
> 12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b'
to transaction ID 45781084
> 12/09/21 13:03:20 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: There appears to be a gap in the edit log.  We expected txid 45781084,
but got txid 45928954.
> Rather than trying to 'fast forward' the stream to a transaction which is actually prior
to the stream's first tx, we should bail earlier with a nicer error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
