hadoop-user mailing list archives

From Anu Engineer <aengin...@hortonworks.com>
Subject Re: HDFS HA(Based on QJM) Failover Frequently with Large FSimage and Busy Requests
Date Wed, 26 Apr 2017 22:41:43 GMT
Hi yizhou,

> Is there any relationship between high disk I/O and the ZKFC monitor request timeout?

It is difficult to answer without seeing the logs, so please take my answer with a grain of
salt. You might want to look for GC pauses or gaps in your log, that is, the JVM pausing
completely due to very high I/O.

Here is a blog post from LinkedIn discussing this issue in detail:

https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
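
The "gaps in your log" check mentioned above can be sketched roughly as follows. This is a
minimal sketch, not a definitive tool: it assumes the log4j timestamp layout seen in the
NameNode log lines quoted in this thread, and the function name `find_log_gaps` and the
5-second default threshold are hypothetical choices for illustration.

```python
from datetime import datetime

# Timestamp layout assumed from the log lines quoted in this thread,
# e.g. "2017-03-15 04:36:19,234 INFO ...".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2017-03-15 04:36:19,234")


def find_log_gaps(lines, threshold_secs=5.0):
    """Return (previous_ts, current_ts, gap_secs) for each pair of consecutive
    timestamped lines that are more than threshold_secs apart.

    threshold_secs is a hypothetical default; tune it to how chatty the log is.
    """
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # wrapped continuation line or stack trace, no timestamp
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_secs:
                gaps.append((previous, current, gap))
        previous = current
    return gaps
```

A gap only says that nothing was logged for a while. In the excerpt quoted further down, the
interval before STARTUP_MSG would be flagged, but that one is the fence-and-restart rather
than a pause, so each hit still needs to be read in context.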

Thank you for the sar output. The latencies do not look bad; was this captured while checkpoint
I/O was in progress?

One way to troubleshoot this would be to look at the GC logs (any GC would do; we don’t
need full GCs). What you want to compare is the GC user time plus GC sys time against the GC
real time. With parallel GC threads, user plus sys is typically at least as large as real; if
real is much larger than user plus sys, the GC threads were stalled, which could indicate CPU
starvation or blocking on I/O. You can correlate those events with the checkpoint download
times and see if there is any relation.
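
A rough way to scan a GC log for this is sketched below. It assumes the JVM was started with
-XX:+PrintGCDetails, so that each GC event ends with a "[Times: user=... sys=..., real=... secs]"
suffix. The function name `stalled_gc_events` and the 1-second floor are hypothetical. The
sketch flags events where real time exceeds user plus sys time, on the assumption that such an
event means the collector spent wall-clock time waiting rather than working; treat that
direction and the threshold as assumptions to adjust.

```python
import re

# Timing suffix printed by HotSpot with -XX:+PrintGCDetails, e.g.
# "[Times: user=0.18 sys=0.01, real=4.02 secs]"
TIMES_RE = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")


def stalled_gc_events(lines, min_real_secs=1.0):
    """Return (line_number, user, sys, real) for GC events whose wall-clock
    time is at least min_real_secs and exceeds the CPU time (user + sys).

    min_real_secs is a hypothetical floor to skip short, uninteresting events.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TIMES_RE.search(line)
        if match is None:
            continue  # not a line that carries GC timing information
        user, sys_time, real = (float(group) for group in match.groups())
        if real >= min_real_secs and real > user + sys_time:
            hits.append((number, user, sys_time, real))
    return hits
```

The timestamps of the flagged lines can then be compared by hand with the times at which
fsimage.ckpt_* downloads and HealthMonitor timeouts appear in the NameNode and ZKFC logs.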

Hope this helps.

Thanks
Anu





From: "gu.yizhou@zte.com.cn" <gu.yizhou@zte.com.cn>
Date: Wednesday, April 26, 2017 at 1:00 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: HDFS HA(Based on QJM) Failover Frequently with Large FSimage and Busy Requests


Hi All,

    HDFS HA (based on QJM), 5 JournalNodes, Apache Hadoop 2.5.0 on Red Hat 6.5 with JDK 1.7.

    We put more than 1 PB of data into HDFS, with an FSImage of about 10 GB. As we keep sending
more requests to this HDFS, the NameNodes fail over frequently. I would like to know the following:



    1. The ANN (active NameNode) downloading fsimage.ckpt_* from the SNN (standby NameNode) leads
to very high disk I/O; at the same time, ZKFC fails to monitor the health of the ANN due to a
timeout. Is there any relationship between high disk I/O and the ZKFC monitor request timeout?
Every failover happened during a checkpoint download, but not every checkpoint download leads
to a failover.



[Inline image image001.png: not preserved in the archive]



2017-03-15 09:27:05,750 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception
trying to monitor health of NameNode at nn1/ip:8020: Call From nn1/ip to nn1:8020 failed on
socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting
for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ip:48536
remote=nn1/ip:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout

2017-03-15 09:27:05,750 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING



    2. Due to SERVICE_NOT_RESPONDING, the other ZKFC fences the old ANN (configured with sshfence).
Before it is restarted by my additional monitor, the old ANN log sometimes shows lines like the
following. What is "Rescan of postponedMisreplicatedBlocks"? Does it have any relationship
with the failover?

2017-03-15 04:36:00,866 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds

2017-03-15 04:36:00,931 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Scanned 0 directive(s) and 0 block(s) in 65 millisecond(s).

2017-03-15 04:36:01,127 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 23 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:04,145 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 17 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:07,159 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 14 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:10,173 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 14 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:13,188 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 14 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:16,211 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 23 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:19,234 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Rescan
of postponedMisreplicatedBlocks completed in 22 msecs. 247361 blocks are left. 0 blocks are
removed.

2017-03-15 04:36:28,994 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:



    3. I configured two dfs.namenode.name.dir directories and one dfs.journalnode.edits.dir (which
shares a disk with the NameNode). Is this suitable, or does it have any disadvantages?



<property>
  <name>dfs.namenode.name.dir.nameservice.nn1</name>
  <value>/data1/hdfs/dfs/name,/data2/hdfs/dfs/name</value>
</property>

<property>
  <name>dfs.namenode.name.dir.nameservice.nn2</name>
  <value>/data1/hdfs/dfs/name,/data2/hdfs/dfs/name</value>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data1/hdfs/dfs/journal</value>
</property>



    4. I am interested in the design of checkpointing and edit log transmission. Are there any
explanations, issues, or documents?



Thanks in advance,

Doris