bookkeeper-user mailing list archives

From Rakesh R <rake...@huawei.com>
Subject RE: Standby NN is skipping the Checkpointing ?
Date Wed, 11 Apr 2012 06:22:53 GMT
Thanks Ivan.

I am using version 0.24. I have configured <dfs.namenode.edit.dirs> with the bookie details
on both NNs, and observed that no 'journalSet' is created on the BNN side. I will probably try
either merging HDFS-3058 or using the GitHub repo.

Thanks,
Rakesh R
________________________________________
From: Ivan Kelly [ivank@apache.org]
Sent: Tuesday, April 10, 2012 7:48 PM
To: bookkeeper-user@zookeeper.apache.org
Subject: Re: Standby NN is skipping the Checkpointing ?

What is the state of the EditLogTailer thread in this case?
20:37:29,356 indicates that the log has rolled, so it should be
possible to read events from it, even if the events are just start and
end segment. However, doTailEdits() doesn't seem to be called.

Which version of the code are you using? The code to get BK working
with HA isn't in hadoop-common trunk yet. In particular, HDFS-3058
needs to be applied for it to work.

All changes already exist in
https://github.com/ivankelly/hadoop-common/tree/BKJM-benching

-Ivan

On Mon, Apr 09, 2012 at 05:22:18AM +0000, Rakesh R wrote:
> Hi All,
>
>
>
> I have been trying to set up NN-HA using the BKJournal plugin. I have observed that checkpointing operations are being skipped and the Standby is not receiving the latest transactions. However, on Active failure, the Standby is able to switch to Active by reading the log from the bookies.
>
>
>
> I just wanted to improve the switching time. Anything I have missed?
>
> Are there any configuration options on the Bookie Journal side to make it a 'Hot Standby' rather than silently skipping the log streams?
>
>
>
> Logs of Standby NN:-
>
> -------------------------
>
> 2012-04-05 20:32:29,365 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:start(119)) - Starting standby checkpoint thread...
> Checkpointing active NN at 10.18.40.45:50070
> Serving checkpoints at HOST-10-18-40-91/10.18.40.91:50070
> 2012-04-05 20:32:58,484 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.91:50010 storage DS-1584274703-10.18.40.91-50010-1333638178337
> 2012-04-05 20:32:58,487 INFO  net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.91:50010
> 2012-04-05 20:32:58,557 INFO  blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.91:50010 after becoming active. Its block contents are no longer considered stale.
> 2012-04-05 20:32:58,557 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 2 msecs
> 2012-04-05 20:33:05,077 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.45:50010 storage DS-1120258987-10.18.40.45-50010-1333638341930
> 2012-04-05 20:33:05,078 INFO  net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.45:50010
> 2012-04-05 20:33:05,185 INFO  blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.45:50010 after becoming active. Its block contents are no longer considered stale.
> 2012-04-05 20:33:05,185 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
> 2012-04-05 20:37:29,356 INFO  ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(263)) - Triggering log roll on remote NameNode /10.18.40.45:8020
> 2012-04-05 20:38:37,614 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 0 msecs
> 2012-04-05 20:39:14,209 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
> 2012-04-05 20:42:29,368 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(270)) - Triggering checkpoint because it has been 600 seconds since the last checkpoint, which exceeds the configured interval 600
> 2012-04-05 20:42:29,368 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doCheckpoint(151)) - A checkpoint was triggered but the Standby Node has not received any transactions since the last checkpoint at txid 0. Skipping...
>
>
>
> Thanks & Regards,
>
> Rakesh R
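
For readers following the thread, the last two log lines can be read as a two-step decision: the time-based trigger fires at 600 seconds, then the checkpoint is skipped because the Standby has applied no transactions beyond the last checkpoint (txid 0). Below is a minimal sketch of that decision, not the actual StandbyCheckpointer code. The class, enum, method, and parameter names are all hypothetical, and only the time-based trigger seen in the log is modelled.

```java
// Hypothetical sketch of the decision visible in the Standby NN log above.
// Names are illustrative only; this is not Hadoop's StandbyCheckpointer.
class StandbyCheckpointSketch {

    enum Decision { NOT_DUE, SKIP_NO_NEW_TXNS, CHECKPOINT }

    /**
     * @param secsSinceLastCheckpoint seconds elapsed since the last checkpoint
     * @param checkpointPeriodSecs    configured interval (600 in the log)
     * @param lastAppliedTxId         highest txid the Standby has applied from the shared edits
     * @param lastCheckpointTxId      txid covered by the last checkpoint (0 in the log)
     */
    static Decision decide(long secsSinceLastCheckpoint,
                           long checkpointPeriodSecs,
                           long lastAppliedTxId,
                           long lastCheckpointTxId) {
        // The log reports a trigger at exactly 600 seconds with interval 600,
        // so the comparison is assumed to be inclusive.
        if (secsSinceLastCheckpoint < checkpointPeriodSecs) {
            return Decision.NOT_DUE;
        }
        // If the edit log tailer has applied nothing new, there is nothing to checkpoint.
        if (lastAppliedTxId <= lastCheckpointTxId) {
            return Decision.SKIP_NO_NEW_TXNS;
        }
        return Decision.CHECKPOINT;
    }

    public static void main(String[] args) {
        // Situation from the log: 600 seconds elapsed, no transactions tailed.
        System.out.println(decide(600, 600, 0, 0));
        // Same timing, but the Standby has tailed edits up to txid 42.
        System.out.println(decide(600, 600, 42, 0));
    }
}
```

Under these assumptions, the "Skipping..." message is a symptom rather than the cause: the Standby never applies edits from the shared journal, which is consistent with Ivan's observation that doTailEdits() does not appear to be called.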
