hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tsuyoshi OZAWA <ozawa.tsuyo...@gmail.com>
Subject Re: QJM cost 5 minutes for failover
Date Wed, 03 Dec 2014 22:36:22 GMT
Hi,

Are you using ZKFC? If the answer is positive, could you share the
configuration files for ZooKeeper?

Thanks,
- Tsuyoshi

On Wed, Dec 3, 2014 at 2:15 AM, mail list <louis.hust.ml@gmail.com> wrote:
> Hi,all
>
> we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins
to work normal.
>
> Log on the standby name node as below:
>
> {log}
> 014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will
take over writing edit logs at txnid 6797
> 2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting
log segment at 6797
> 2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number
of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs:
0 Number of syncs: 0 SyncTimes(ms): 0 9
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Starting CacheReplicationMonitor with interval 30000 milliseconds
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning because of pending operations
> 2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash
configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
> 2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured
checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion
instead
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Total number of blocks            = 179
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of invalid blocks          = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of under-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of  over-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of blocks being written    = 4
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue
initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
> 2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
> 2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30001 milliseconds
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
> 2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30001 milliseconds
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
> 2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock:
caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[]}
> 2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock:
caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
> {log}
>
> As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor,
I have no idea
> what hadoop is doing in the five minutes? And how can i reduce the time?
>
>
>
>
>



-- 
- Tsuyoshi

Mime
View raw message