hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-186) communication problems in the task tracker cause long latency
Date Tue, 02 May 2006 04:27:47 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-186?page=all ]

Owen O'Malley updated HADOOP-186:
---------------------------------

    Attachment: task-tracker-loop-isolation.patch

This patch adds catches around each of the routines that may throw in the offerService loop,
with the exception of emitHeartbeat. (I left that one unguarded so that if the connection
is down, control gets back to TaskTracker.run().) It also records the time of the previous heartbeat
at the top of the cycle rather than the bottom, which means that under load the task tracker
stays much closer to a 10 second cycle than before.
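
As a rough illustration of the idea (this is not the code in the attached patch), the sketch below guards each routine in a loop pass except the heartbeat, and computes the sleep time from a timestamp taken at the top of the cycle. The class OfferServiceSketch, the Step interface, and the methods guarded, cycle, and remainingWait are all hypothetical names made up for this example; only the 10 second interval and the unguarded heartbeat come from the description above.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch only; none of these names come from the Hadoop source.
class OfferServiceSketch {
    /** Stand-in for one routine in the loop that may throw. */
    interface Step {
        void run() throws IOException;
    }

    // The 10 second cycle mentioned in the message, in milliseconds.
    static final long HEARTBEAT_INTERVAL_MS = 10_000L;

    /** Runs one guarded routine: a failure is logged and the cycle carries on. */
    static void guarded(String name, Step step, List<String> log) {
        try {
            step.run();
            log.add(name + ":ok");
        } catch (IOException e) {
            log.add(name + ":failed");
        }
    }

    /**
     * One pass of the loop. The heartbeat is deliberately left unguarded, so a
     * dead connection propagates to the caller (TaskTracker.run() in the patch).
     */
    static void cycle(Step heartbeat, List<Step> routines, List<String> log)
            throws IOException {
        heartbeat.run();
        log.add("heartbeat:ok");
        for (int i = 0; i < routines.size(); i++) {
            guarded("routine" + i, routines.get(i), log);
        }
    }

    /**
     * Time left to sleep when the heartbeat time was recorded at the top of the
     * cycle: the time spent working during the cycle is subtracted from the
     * interval, and the result never goes below zero.
     */
    static long remainingWait(long cycleStartMs, long nowMs) {
        return Math.max(0L, HEARTBEAT_INTERVAL_MS - (nowMs - cycleStartMs));
    }
}
```

With the timestamp taken at the top, a cycle that spends 3 seconds on work sleeps only the remaining 7 seconds, which is why the tracker stays near the 10 second period under load.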

> communication problems in the task tracker cause long latency
> -------------------------------------------------------------
>
>          Key: HADOOP-186
>          URL: http://issues.apache.org/jira/browse/HADOOP-186
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.2
>  Attachments: task-tracker-loop-isolation.patch
>
> The Task Tracker's offerService loop has no protection from exceptions, so that any communication
> problems with the Job Tracker, such as RPC timeouts, cause the TaskTracker to sleep 5 seconds
> and start again at the top of the loop.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

