hadoop-mapreduce-issues mailing list archives

From "ZhuGuanyin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
Date Tue, 01 Dec 2009 02:37:23 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784019#action_12784019 ]

ZhuGuanyin commented on MAPREDUCE-1247:

I agree: separating the long-running locked methods from the heartbeat thread, and never
doing I/O operations while holding locks, is the best solution. We have tried, but found it
is not easy to achieve and it will not be resolved soon, so I propose a temporary solution.
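
To illustrate the "never do I/O while holding locks" idea, here is a minimal sketch. It is not the actual TaskTracker code: the class CleanupSketch and the names markForCleanup, cleanup and callerHoldsLock are hypothetical. The lock is held only long enough to copy the list of paths, and the slow deletion runs after the lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; not the real TaskTracker implementation. */
public class CleanupSketch {
    private final Object lock = new Object();
    private final List<String> pendingPaths = new ArrayList<>();

    /** Record a path whose temp data should be removed later. */
    public void markForCleanup(String path) {
        synchronized (lock) {
            pendingPaths.add(path);
        }
    }

    /** True if the calling thread currently holds the state lock. */
    public boolean callerHoldsLock() {
        return Thread.holdsLock(lock);
    }

    /**
     * Snapshot the pending paths under the lock, then run the slow
     * delete action outside the lock so other threads (for example a
     * heartbeat thread) are not blocked by disk I/O.
     */
    public int cleanup(Consumer<String> slowDelete) {
        List<String> toDelete;
        synchronized (lock) {
            toDelete = new ArrayList<>(pendingPaths);
            pendingPaths.clear();
        }
        for (String path : toDelete) {
            slowDelete.accept(path); // assumed slow disk operation
        }
        return toDelete.size();
    }
}
```

The trade-off is that the state seen by other threads may briefly list nothing as pending while the files still exist on disk, which is part of why this kind of refactoring is harder than it looks.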

> Send out-of-band heartbeat to avoid fake lost tasktracker
> ---------------------------------------------------------
>                 Key: MAPREDUCE-1247
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: ZhuGuanyin
> Currently the TaskTracker reports task status to the JobTracker through heartbeats. Sometimes
the TaskTracker holds its own lock while doing cleanup work, such as removing task temp data
on disk, and the heartbeat thread hangs for a long time waiting for that lock. The JobTracker
then assumes the TaskTracker is lost and reschedules all of its finished maps or unfinished
reduces on other TaskTrackers. We call this a "fake lost tasktracker". Sometimes this is not
acceptable, especially when running large jobs. So we introduce an out-of-band heartbeat
mechanism to send an out-of-band heartbeat in that case.
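
One possible shape for such a mechanism is sketched below. This is an assumption for illustration, not the patch attached to the issue: HeartbeatSketch, Kind and nextHeartbeat are hypothetical names. The heartbeat thread waits a bounded time for the tracker lock; if it cannot get the lock, it falls back to a lightweight "still alive" heartbeat that needs no locked state.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; not the real TaskTracker heartbeat code. */
public class HeartbeatSketch {
    public enum Kind { FULL, OUT_OF_BAND }

    private final ReentrantLock trackerLock = new ReentrantLock();
    private final long lockWaitMillis;

    public HeartbeatSketch(long lockWaitMillis) {
        this.lockWaitMillis = lockWaitMillis;
    }

    /** Lock that cleanup work is assumed to hold while it runs. */
    public ReentrantLock trackerLock() {
        return trackerLock;
    }

    /** Decide which kind of heartbeat to send on this tick. */
    public Kind nextHeartbeat() {
        boolean locked = false;
        try {
            locked = trackerLock.tryLock(lockWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        if (!locked) {
            // Lock is busy: report liveness only, without task status.
            return Kind.OUT_OF_BAND;
        }
        try {
            // A full task status report would be built here under the lock.
            return Kind.FULL;
        } finally {
            trackerLock.unlock();
        }
    }
}
```

With this shape, the JobTracker side would still need to treat the lightweight heartbeat as proof of liveness without expecting task status in it; that part is not shown.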

