hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Why does the default HEARTBEAT_INTERVAL value is 3?
Date Tue, 09 Feb 2010 19:37:51 GMT
On Tue, Feb 9, 2010 at 2:29 PM, Todd Lipcon <todd@cloudera.com> wrote:
> On a small cluster, I'm of the opinion that a value less than 3 would
> actually be useful in reducing job startup time a little bit.
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1266
>
> The issue got stalled a bit. If you want it, pipe up on the JIRA :)
> Especially if you have hard data indicating this is a good idea (I
> never had the time to really prove it)
>
> -Todd
>
> On Tue, Feb 9, 2010 at 10:21 AM, E. Sammer <eric@lifeless.net> wrote:
>> On 2/9/10 11:52 AM, ChingShen wrote:
>>>
>>> Hi,
>>>
>>>  I have a question about HEARTBEAT_INTERVAL.
>>>  Why is the default HEARTBEAT_INTERVAL value 3 rather than 2 or 1?
>>> Any resources?
>>
>> Shen:
>>
>> While I don't have a good answer for why the number 3 was chosen (actually,
>> I think it's 5 seconds on heartbeats and the 3 seconds is how often a task
>> tracker thread checks if progress is being made or something like that), I
>> can tell you that there's network chatter caused by the heartbeat. You
>> wouldn't want heartbeat to be any faster as you would unnecessarily cause
>> network congestion and force the job tracker to do additional (possibly
>> unnecessary) work. As the cluster grows, the heartbeat interval is increased
>> leading to even less frequent check-ins to attempt to mitigate the
>> congestion / high concurrency on the JT.
>>
>> One of the down sides to this is that tasks aren't given to task trackers as
>> quickly as they could be, but there are probably better ways of decreasing
>> the amount of time required to hand out work rather than simply increasing
>> the heartbeat rate.
>>
>> Keep in mind that most Hadoop jobs run for long periods of time, so the
>> slight delay in handing out tasks isn't a huge problem and 3 to 5 seconds is
>> more than sufficient to know that a task tracker is alive and healthy.
>>
>> Hope this helps.
>> --
>> Eric Sammer
>> eric@lifeless.net
>> http://esammer.blogspot.com
>>
>

With a setting of 5, each TaskTracker checks in with the JobTracker
every 5 seconds. The idea is that with enough TaskTrackers, say a
1000-node cluster, 1000/5 = 200 will be checking in during any given
second. As Todd mentions, if your cluster is small, say 2 nodes, only
2/5 = 0.4 nodes check in during a given second, and this results in a
small delay before tasks are handed out. Making it configurable would
be nice; then again, there is already a good amount of things to
configure :)
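
To make the arithmetic concrete, here is a rough sketch in Python. It is not Hadoop code: the function names are made up for illustration, and it assumes the heartbeats are spread evenly over the interval, which a real cluster will only approximate.

```python
def checkins_per_second(num_tasktrackers: int, heartbeat_interval_s: float) -> float:
    """Average number of TaskTracker check-ins the JobTracker sees per second.

    Hypothetical helper: assumes every TaskTracker heartbeats exactly once
    per interval and that the heartbeats are evenly staggered.
    """
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat interval must be positive")
    return num_tasktrackers / heartbeat_interval_s


def seconds_between_checkins(num_tasktrackers: int, heartbeat_interval_s: float) -> float:
    """Average gap between two consecutive check-ins, under the same assumption."""
    if num_tasktrackers <= 0:
        raise ValueError("need at least one TaskTracker")
    return heartbeat_interval_s / num_tasktrackers


if __name__ == "__main__":
    # The two cluster sizes used in the message above, both at a 5 second interval.
    for nodes in (1000, 2):
        rate = checkins_per_second(nodes, 5)
        gap = seconds_between_checkins(nodes, 5)
        print(f"{nodes} nodes: {rate} check-ins/s, one check-in every {gap} s")
```

On the 2-node cluster the JobTracker hears from someone only every 2.5 seconds on average, which is where the small startup delay comes from; on the 1000-node cluster there is always somebody checking in.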
