hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-181) task trackers should not restart for having a late heartbeat
Date Wed, 09 Aug 2006 21:19:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-181?page=comments#action_12427033 ] 
            
Sameer Paranjpye commented on HADOOP-181:
-----------------------------------------

I feel that improved detection of tasktracker death is a separate issue that needs addressing.
At the same time, we need to try not to lose work if communication between a tasktracker
and the jobtracker fails for some reason.

For instance, a tasktracker may appear lost to the jobtracker due to transient network problems.
In such a case it would be ok for the jobtracker to mark the lost tasks as failed and reschedule
them elsewhere.
If communication with the jobtracker is subsequently restored while the job is still in progress,
the jobtracker can easily mark the lost-and-found tasks as succeeded. Multiple instances of a
task should be handled by the speculative execution code. It seems like we could avoid losing
a lot of work if we had such a mechanism in place.
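
To make the proposal concrete, here is a rough sketch of the bookkeeping it implies on the
jobtracker side. This is not taken from the attached patch or from the Hadoop code base: the
class LostTaskReconciler and all of its methods are hypothetical names invented for
illustration, and it assumes the jobtracker keeps one state per (task, tasktracker) attempt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names are real Hadoop classes or methods.
class LostTaskReconciler {
    enum State { RUNNING, FAILED, SUCCEEDED }

    // task id -> (tasktracker name -> state of that tracker's attempt)
    private final Map<String, Map<String, State>> attempts = new HashMap<>();
    private boolean jobInProgress = true;

    // Record that an attempt of the task was started on the given tracker.
    void assign(String taskId, String tracker) {
        attempts.computeIfAbsent(taskId, k -> new HashMap<>()).put(tracker, State.RUNNING);
    }

    // The tracker's heartbeat is overdue: mark its running attempts as failed
    // so that the affected tasks become eligible for rescheduling.
    void trackerLost(String tracker) {
        for (Map<String, State> byTracker : attempts.values()) {
            if (byTracker.get(tracker) == State.RUNNING) {
                byTracker.put(tracker, State.FAILED);
            }
        }
    }

    // A tracker (possibly one that was considered lost) reports a finished task.
    // The report is accepted only while the job is still in progress and only
    // if that tracker was actually given an attempt of the task.
    boolean reportSuccess(String tracker, String taskId) {
        if (!jobInProgress) {
            return false;
        }
        Map<String, State> byTracker = attempts.get(taskId);
        if (byTracker == null || !byTracker.containsKey(tracker)) {
            return false;
        }
        byTracker.put(tracker, State.SUCCEEDED);
        return true;
    }

    void jobFinished() {
        jobInProgress = false;
    }

    // A task counts as succeeded if any attempt succeeded, as running if any
    // attempt is still running, and as failed (needs rescheduling) otherwise.
    State taskState(String taskId) {
        Map<String, State> byTracker = attempts.getOrDefault(taskId, Collections.emptyMap());
        if (byTracker.containsValue(State.SUCCEEDED)) {
            return State.SUCCEEDED;
        }
        if (byTracker.containsValue(State.RUNNING)) {
            return State.RUNNING;
        }
        return State.FAILED;
    }
}
```

In this sketch a late success report from the original tracker flips the task to SUCCEEDED
even after a second attempt has been started elsewhere. Cleaning up the now-redundant
attempt is left out here, on the assumption that it falls to whatever already handles
duplicate attempts for speculative execution.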




> task trackers should not restart for having a late heartbeat
> ------------------------------------------------------------
>
>                 Key: HADOOP-181
>                 URL: http://issues.apache.org/jira/browse/HADOOP-181
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.6.0
>
>         Attachments: lost-heartbeat.patch
>
>
> TaskTrackers should not close and restart themselves for having a late heartbeat. The
> JobTracker should just accept their current status.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
