hadoop-common-user mailing list archives

From JunYong Li <lij...@gmail.com>
Subject Re: blacklisted tasktracker metric
Date Wed, 09 May 2012 05:56:54 GMT
Failure of a tasktracker is another failure mode. If a tasktracker fails by
crashing, or running very slowly, it will stop sending heartbeats to the
jobtracker (or send them very infrequently). The jobtracker will notice a
tasktracker that has stopped sending heartbeats (if it hasn’t received one
for 10 minutes, configured via the mapred.tasktracker.expiry.interval
property, in milliseconds) and remove it from its pool of tasktrackers to
schedule tasks on. The jobtracker arranges for map tasks that were run and
completed successfully on that tasktracker to be rerun if they belong to
incomplete jobs, since their intermediate output residing on the failed
tasktracker’s local filesystem may not be accessible to the reduce task. Any
tasks in progress are also rescheduled.
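
As a rough illustration of the expiry behaviour described above, here is a
small Python sketch. It is a toy model, not Hadoop's code: the TrackerPool
class, the tasks_to_reschedule function, and the task dictionary layout are
all hypothetical names made up for this example. Only the 10 minute default
comes from the text above.

```python
# Toy model of heartbeat expiry; not taken from the Hadoop source.
EXPIRY_INTERVAL_MS = 10 * 60 * 1000  # 10 minutes, the default described above


class TrackerPool:
    """Hypothetical stand-in for the jobtracker's pool of tasktrackers."""

    def __init__(self, expiry_interval_ms=EXPIRY_INTERVAL_MS):
        self.expiry_interval_ms = expiry_interval_ms
        self.last_heartbeat_ms = {}  # tracker name -> time of last heartbeat

    def heartbeat(self, tracker, now_ms):
        """Record that a tracker reported in at now_ms."""
        self.last_heartbeat_ms[tracker] = now_ms

    def expire(self, now_ms):
        """Remove and return trackers silent for longer than the interval."""
        expired = sorted(
            name
            for name, seen in self.last_heartbeat_ms.items()
            if now_ms - seen > self.expiry_interval_ms
        )
        for name in expired:
            del self.last_heartbeat_ms[name]
        return expired


def tasks_to_reschedule(tasks, lost_tracker):
    """Pick the tasks to run again after lost_tracker is removed.

    Each task is a dict with the (assumed) keys: id, kind, state, tracker,
    job_complete.
    """
    rerun = []
    for task in tasks:
        if task["tracker"] != lost_tracker:
            continue
        if task["state"] == "running":
            # Tasks in progress are always rescheduled.
            rerun.append(task["id"])
        elif (
            task["state"] == "succeeded"
            and task["kind"] == "map"
            and not task["job_complete"]
        ):
            # Completed map output lived on the lost tracker's local disk,
            # so reducers of an unfinished job may not be able to fetch it.
            rerun.append(task["id"])
    return rerun
```

Note that completed reduce tasks are not picked up by this sketch, since the
text above only singles out map tasks, whose intermediate output stays on the
tasktracker's local filesystem.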

A tasktracker can also be blacklisted by the jobtracker, even if the
tasktracker has not failed. A tasktracker is blacklisted if the number of
tasks that have failed on it is significantly higher than the average task
failure rate on the cluster. Blacklisted tasktrackers can be restarted to
remove them from the jobtracker’s blacklist.
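
To make "significantly higher than the average" concrete, here is a small
Python sketch of one way such a check could look. The function name, the 50%
margin, and the minimum failure count are assumptions chosen for
illustration; they are not claimed to be the jobtracker's actual rule or
defaults.

```python
def blacklisted_trackers(failures, margin=0.5, min_failures=4):
    """Return trackers whose failure count is well above the cluster average.

    failures: dict mapping tracker name -> number of failed tasks.
    margin and min_failures are illustrative assumptions, not Hadoop defaults.
    """
    if not failures:
        return []
    average = sum(failures.values()) / len(failures)
    cutoff = average * (1 + margin)
    return sorted(
        name
        for name, count in failures.items()
        if count >= min_failures and count > cutoff
    )


counts = {"tt1": 1, "tt2": 0, "tt3": 9, "tt4": 2}
print(blacklisted_trackers(counts))
```

With these counts the average is 3 failures per tracker, so only tt3 exceeds
the cutoff. A monitoring metric that counts blacklisted tasktrackers would
then read 1 instead of 0.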

2012/5/9 John Stein <designersmoke@yahoo.com>

> hi,
>
> I saw a metric called blacklisted tasktrackers in the oceansync monitoring
> system, which is usually 0.  What does it mean if I saw it go up?  could
> you explain BlackListed TaskTrackers?
>
> John Stein
> Processing Engineer
> XTO Energy
>



-- 
Regards
Junyong
