hadoop-mapreduce-user mailing list archives

From Nicolas Liochon <nkey...@gmail.com>
Subject Re: Time until a datanode is marked as dead
Date Mon, 26 Jan 2015 16:00:34 GMT
Note that there is a difference between being dead and being stale. Stale
means "avoid as much as possible", while dead means "avoid absolutely AND
initiate a recovery, i.e. copy all the data (typically 1 TB or more)".

There is some info in this blog entry:
http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/

Cheers,

Nicolas


On Mon, Jan 26, 2015 at 10:46 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> Hi Frank,
>
> Can you file an issue to add this property to hdfs-default.xml?
>
> On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz <frank.lanitz@sql-ag.de>
> wrote:
>
>> Hi,
>>
>> On 23.01.2015 at 19:23, Chris Nauroth wrote:
>> > The time period for determining if a datanode is dead is calculated as a
>> > function of a few different configuration properties.  The current
>> > implementation in DatanodeManager.java does it like this:
>> >
>> >     final long heartbeatIntervalSeconds = conf.getLong(
>> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>> >         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT);
>> >     final int heartbeatRecheckInterval = conf.getInt(
>> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY,
>> >         DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT); // 5 minutes
>> >     this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval
>> >         + 10 * 1000 * heartbeatIntervalSeconds;
>>
>>
>> Good to know.
>>
>> > Under default configuration, dfs.namenode.heartbeat.recheck-interval is
>> > 5 minutes and dfs.heartbeat.interval is 3 seconds.  If we plug those
>> > values into the formula, we get 10.5 minutes, which agrees with your
>> > observation.  If you change dfs.namenode.heartbeat.recheck-interval to
>> > 2.5 minutes, then you'll achieve an effective timeout of 5.5 minutes
>> > before a datanode is marked dead.
>> >
>> > dfs.namenode.heartbeat.recheck-interval is not documented in
>> > hdfs-default.xml, though I don't recall if that's an intentional choice
>> > or just an oversight.  The value of the property must be expressed in
>> > milliseconds.
>>
>> This did the trick. Thank you very much. For testing purposes I've set it
>> to 10000, and after approx. 45 s the node was marked as dead.
>>
>> Any chance of getting this documented as a configuration property, so that
>> possible behavior changes in future releases can be spotted before they
>> reach the staging area?
>>
>> cheers,
>> Frank
>>
>
>
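
For reference, the formula Chris quoted can be checked with a small
standalone sketch. This is not Hadoop code: the class HeartbeatExpiry and
the method expireIntervalMs are hypothetical names used only for
illustration, and the arithmetic is copied from the DatanodeManager.java
snippet in the thread. It evaluates the three recheck-interval values
mentioned above (5 minutes, 2.5 minutes and 10000 ms) with the default
3-second heartbeat interval.

```java
// Hypothetical helper, not part of Hadoop: it only reproduces the
// arithmetic from the DatanodeManager.java snippet quoted in this thread.
class HeartbeatExpiry {

    // recheckIntervalMs: dfs.namenode.heartbeat.recheck-interval (milliseconds)
    // heartbeatIntervalSeconds: dfs.heartbeat.interval (seconds)
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSeconds;
    }

    public static void main(String[] args) {
        long heartbeatSeconds = 3L; // default heartbeat interval per the thread
        // 5 minutes (default), 2.5 minutes, and Frank's test value of 10000 ms
        long[] recheckValuesMs = {300_000L, 150_000L, 10_000L};
        for (long recheckMs : recheckValuesMs) {
            long expireMs = expireIntervalMs(recheckMs, heartbeatSeconds);
            System.out.printf("recheck=%d ms -> marked dead after %d s%n",
                    recheckMs, expireMs / 1000);
        }
    }
}
```

With these inputs the formula yields 630 s (10.5 minutes), 330 s
(5.5 minutes) and 50 s, the last being in the same range as the roughly
45 s Frank observed in his test.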
