hadoop-common-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: why the default value of 'yarn.resourcemanager.container.liveness-monitor.interval-ms' in yarn-default.xml is so high?
Date Wed, 02 Nov 2016 22:41:43 GMT
Hi Tanvir!

It's hard to pick a configuration that works for every cluster scenario. I
suspect that value was chosen to roughly mirror the time it takes
HDFS to realize a datanode is dead (which is also about 10 minutes, from what I
remember). When that timeout expires, the RM also has to reschedule the work.
There may also be network glitches that last that long. Besides, the NMs are
pretty stable by themselves; failing NMs have not been very common in my
experience.
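The tradeoff above is the usual one for timeout-based failure detection. A monitor only declares a node dead once it has been silent for longer than the timeout. A long timeout therefore delays recovery, and a short one risks expiring a node that is only behind a network glitch. The sketch below is a generic heartbeat/timeout model, not YARN's implementation. `LivenessMonitor`, `detection_time_ms`, the node name `nm-1`, and the 1000 ms heartbeat and check periods are hypothetical and chosen only for illustration. Time is simulated, so nothing actually sleeps.

```python
from dataclasses import dataclass, field


@dataclass
class LivenessMonitor:
    """Hypothetical heartbeat-timeout monitor; a generic model, not YARN's code."""

    expiry_ms: int  # silence longer than this marks a node as dead
    last_seen: dict[str, int] = field(default_factory=dict)

    def heartbeat(self, node: str, now_ms: int) -> None:
        # Remember the most recent heartbeat received from this node.
        self.last_seen[node] = now_ms

    def check(self, now_ms: int) -> list[str]:
        # Expire (and forget) every node that has been silent for too long.
        expired = [n for n, t in self.last_seen.items() if now_ms - t > self.expiry_ms]
        for n in expired:
            del self.last_seen[n]
        return expired


def detection_time_ms(expiry_ms: int, check_interval_ms: int,
                      heartbeat_ms: int, failure_ms: int) -> int:
    """Simulated time at which a node that stops heartbeating at failure_ms
    is first reported dead by a monitor scanning every check_interval_ms."""
    monitor = LivenessMonitor(expiry_ms)
    check_at = check_interval_ms
    while True:
        # Latest heartbeat the node sent before this check; none after it fails.
        latest_hb = (min(check_at, failure_ms) // heartbeat_ms) * heartbeat_ms
        monitor.heartbeat("nm-1", latest_hb)
        if "nm-1" in monitor.check(check_at):
            return check_at
        check_at += check_interval_ms


if __name__ == "__main__":
    failure = 120_500  # hypothetical moment the node manager is stopped
    for expiry in (600_000, 60_000):
        detected = detection_time_ms(expiry, 1_000, 1_000, failure)
        print(f"expiry={expiry} ms -> detected {detected - failure} ms after failure")
```

In this model a 600000 ms timeout reports the stopped node about 600.5 s after it fails, while a 60000 ms timeout reports it after about 60.5 s. The catch is that the shorter timeout would also expire a healthy node whose heartbeats are blocked by a glitch lasting just over a minute.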


On Wed, Nov 2, 2016 at 10:44 AM, Tanvir Rahman <tanvir9982000@gmail.com> wrote:

> Hello,
> Can anyone please tell me why the default value of
> 'yarn.resourcemanager.container.liveness-monitor.interval-ms' in
> yarn-default.xml
> <https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml>
> is so high? This parameter determines "How often to check that containers
> are still alive". The default value is 600000 ms, or 10 minutes, so if a
> node manager fails, the resource manager detects the dead container only
> after 10 minutes.
> I am running a wordcount job on my university cluster. In the middle of the
> run, I stopped the node manager on one node (the datanode was still running)
> and found that the completion time increased by about 10 minutes because of
> the node manager failure.
> Thanks in advance
> Tanvir
