hadoop-mapreduce-issues mailing list archives

From "Liyin Liang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
Date Mon, 30 Aug 2010 06:48:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904091#action_12904091 ]

Liyin Liang commented on MAPREDUCE-1247:

Hi Guanyin, our production cluster has hit the same problem. Could you please attach your patch file?

> Send out-of-band heartbeat to avoid fake lost tasktracker
> ---------------------------------------------------------
>                 Key: MAPREDUCE-1247
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: ZhuGuanyin
>            Assignee: ZhuGuanyin
> Currently the TaskTracker reports task status to the JobTracker through heartbeats. Sometimes
> the TaskTracker holds its own lock while doing cleanup work, such as removing task temp data
> from disk, and the heartbeat thread can hang for a long time waiting for that lock. The
> JobTracker then assumes the TaskTracker is lost and reschedules all of its finished maps and
> unfinished reduces on other TaskTrackers. We call this a "fake lost tasktracker". It is
> sometimes unacceptable, especially when running large jobs. So we introduce an out-of-band
> heartbeat mechanism that sends an out-of-band heartbeat in that case.
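
The patch is not attached to this message, so the following is only a minimal sketch of the idea described above, not the reporter's implementation. It assumes the regular heartbeat needs the TaskTracker lock to gather full task status, and that a lighter liveness signal can be sent without it. The class name, the method names, and the "FULL" / "OUT_OF_BAND" labels are all hypothetical and do not come from the Hadoop code base.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names exist in Hadoop.
class OutOfBandHeartbeatSketch {
    // Stand-in for the lock the TaskTracker holds during cleanup (assumption).
    private final ReentrantLock trackerLock = new ReentrantLock();

    // Starts a simulated cleanup that holds the lock until 'release' is counted down.
    // Returns only after the cleanup thread actually owns the lock.
    Thread startCleanup(CountDownLatch release) throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread cleanup = new Thread(() -> {
            trackerLock.lock();
            try {
                lockHeld.countDown();
                release.await(); // e.g. removing task temp data from disk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                trackerLock.unlock();
            }
        });
        cleanup.start();
        lockHeld.await();
        return cleanup;
    }

    // Waits up to maxWaitMillis for the lock. If the lock is obtained, a full
    // status heartbeat would be sent; otherwise a lock-free liveness heartbeat
    // is sent instead, so the JobTracker does not declare the tracker lost.
    String heartbeat(long maxWaitMillis) throws InterruptedException {
        if (trackerLock.tryLock(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            try {
                return "FULL";
            } finally {
                trackerLock.unlock();
            }
        }
        return "OUT_OF_BAND";
    }

    public static void main(String[] args) throws InterruptedException {
        OutOfBandHeartbeatSketch tracker = new OutOfBandHeartbeatSketch();
        CountDownLatch release = new CountDownLatch(1);
        Thread cleanup = tracker.startCleanup(release);

        // Cleanup still holds the lock, so the heartbeat falls back.
        System.out.println(tracker.heartbeat(50));

        release.countDown();
        cleanup.join();

        // Lock is free again, so a full heartbeat goes out.
        System.out.println(tracker.heartbeat(50));
    }
}
```

In this sketch the fallback is triggered by a bounded wait on the lock. A real fix might instead use a separate thread or finer-grained locking; the issue description does not say which.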

