hadoop-hdfs-issues mailing list archives

From "Hari Mankude (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2866) Standby does not start up due to a gap in transaction id
Date Tue, 31 Jan 2012 20:08:10 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197171#comment-13197171 ]

Hari Mankude commented on HDFS-2866:
------------------------------------

2012-01-31 19:16:15,857 WARN  ha.EditLogTailer (EditLogTailer.java:run(313)) - Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:311)

    ---------------> Failover happens and this NN becomes the active NN

2012-01-31 19:16:15,858 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(515)) - Starting services required for active state
2012-01-31 19:16:15,860 INFO  namenode.FileJournalManager (FileJournalManager.java:recoverUnfinalizedSegments(282)) - Recovering unfinalized segments in /homes/hortonha/namenode/fsimage/current
2012-01-31 19:16:15,881 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(94)) - Finalizing edits file /homes/hortonha/namenode/fsimage/current/edits_inprogress_0000000000000219665 -> /homes/hortonha/namenode/fsimage/current/edits_0000000000000219665-0000000000000220761
   --------------> 220761 is the last txid in the dfs.edits.dir 

2012-01-31 19:16:15,912 INFO  namenode.FileJournalManager (FileJournalManager.java:recoverUnfinalizedSegments(282)) - Recovering unfinalized segments in /homes/hortonha/namenode/current
2012-01-31 19:16:15,931 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(94)) - Finalizing edits file /homes/hortonha/namenode/current/edits_inprogress_0000000000000219665 -> /homes/hortonha/namenode/current/edits_0000000000000219665-0000000000000220760
   ---------------> 220760 is the last txid in the shared.edits.dir

2012-01-31 19:16:15,933 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(527)) - Catching up to latest edits from old active before taking over writer role in edits logs.
2012-01-31 19:16:15,939 INFO  namenode.FSImage (FSImage.java:loadEdits(682)) - Reading /homes/hortonha/namenode/fsimage/current/edits_0000000000000219665-0000000000000220761 expecting start txid #219665
2012-01-31 19:16:15,956 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(90)) - Edits file /homes/hortonha/namenode/fsimage/current/edits_0000000000000219665-0000000000000220761 of size 1048580 edits # 1097 loaded in 0 seconds.
2012-01-31 19:16:15,956 INFO  blockmanagement.BlockManager (BlockManager.java:isReplicaCorrupt(1668)) - Received an RBW replica for block blk_-5497074126999415278_65205 on 98.137.233.237:50010: ignoring it, since the block is complete with the same generation stamp.

2012-01-31 19:16:16,465 INFO  blockmanagement.BlockManager (BlockManager.java:processMisReplicatedBlocks(1932)) - Number of blocks being written    = 3
2012-01-31 19:16:16,465 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(543)) - Will take over writing edit logs at txnid 220762

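-------------------> The new active takes over at 220762, but the finalized segment in the shared directory ends at 220760,
so txid 220761 is missing from the edit logs in the shared directory. The standby is stuck at this point.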
2012-01-31 19:16:16,467 INFO  namenode.FSEditLog (FSEditLog.java:startLogSegment(846)) - Starting log segment at 220762
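
The gap described above can be checked mechanically from the segment file names. The following is only a minimal sketch, assuming finalized segments are named edits_<firstTxId>-<lastTxId> as in the log lines above. EditLogGapCheck and findGaps are hypothetical names used for illustration; they are not part of HDFS and this is not how the NameNode itself detects gaps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an HDFS class.
final class EditLogGapCheck {
    // Assumed naming of finalized segments: edits_<firstTxId>-<lastTxId>
    private static final Pattern FINALIZED = Pattern.compile("edits_(\\d+)-(\\d+)");

    /**
     * Returns every missing txid range as {firstMissing, lastMissing}.
     * nextSegmentStart is the txid at which the new writer opens its segment.
     */
    static List<long[]> findGaps(List<String> segmentNames, long nextSegmentStart) {
        List<long[]> segments = new ArrayList<>();
        for (String name : segmentNames) {
            Matcher m = FINALIZED.matcher(name);
            if (m.matches()) { // skip anything that is not a finalized segment
                segments.add(new long[] {
                    Long.parseLong(m.group(1)), Long.parseLong(m.group(2))});
            }
        }
        segments.sort(Comparator.comparingLong((long[] s) -> s[0]));

        List<long[]> gaps = new ArrayList<>();
        if (segments.isEmpty()) {
            return gaps; // nothing to compare against
        }
        long expected = segments.get(0)[0];
        for (long[] s : segments) {
            if (s[0] > expected) {
                gaps.add(new long[] {expected, s[0] - 1});
            }
            expected = Math.max(expected, s[1] + 1);
        }
        // The new segment must start right after the last finalized txid.
        if (nextSegmentStart > expected) {
            gaps.add(new long[] {expected, nextSegmentStart - 1});
        }
        return gaps;
    }
}
```

With the names from this log, calling findGaps with the shared directory's segment (ending at 220760) and a next segment start of 220762 reports the single missing txid 220761, while the same call with the local segment (ending at 220761) reports no gap.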

                
> Standby does not start up due to a gap in transaction id
> --------------------------------------------------------
>
>                 Key: HDFS-2866
>                 URL: https://issues.apache.org/jira/browse/HDFS-2866
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Hari Mankude
>            Priority: Critical
>
> The standby notices a gap in the transaction ids in the shared.edits directory. The transactions
> in dfs.edits.dir do not seem to have the gap. The gap happens during a failover.


        
