hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mail list <louis.hust...@gmail.com>
Subject QJM cost 5 minutes for failover
Date Wed, 03 Dec 2014 10:15:27 GMT
Hi,all 

we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins to
work normal.

Log on the standby name node as below:

{log}
014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take
over writing edit logs at txnid 6797
2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log
segment at 6797
2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of
syncs: 0 SyncTimes(ms): 0 9
2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Starting CacheReplicationMonitor with interval 30000 milliseconds
2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning because of pending operations
2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration:
Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured checkpoint
interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total
number of blocks            = 179
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number
of invalid blocks          = 0
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number
of under-replicated blocks = 0
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number
of  over-replicated blocks = 0
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number
of blocks being written    = 4
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue
initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30001 milliseconds
2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30001 milliseconds
2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught
retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[]}
2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught
retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.

{log}

As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor,
I have no idea
what hadoop is doing in the five minutes? And how can i reduce the time?






Mime
View raw message