hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1158) JobTracker should collect statistics of failed map output fetches, and take decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty server on the TaskTracker
Date Thu, 23 Aug 2007 16:54:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522193 ]

Arun C Murthy commented on HADOOP-1158:
---------------------------------------

Thanks for the review, Enis.

So, here is how we solve issues emanating from Jetty: if there are enough fetch failures for
a given map (say, due to Jetty), we just fail the map and re-run it elsewhere, so that the
reducer isn't stuck. If a sufficient number of maps then fail on the same TaskTracker (say, Jetty
again), it gets blacklisted and no further tasks are assigned to it... does that make sense?
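
The two-level policy described above (fail a map after enough fetch failures, then
blacklist a TaskTracker after enough of its maps have been failed) could be sketched
roughly as follows. This is only an illustration under stated assumptions, not the code
in the HADOOP-1158 patches: the class FetchFailureTracker, its method names, and both
thresholds are hypothetical, and map attempts and TaskTrackers are identified by plain
strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not a real Hadoop class and not the HADOOP-1158 patch.
class FetchFailureTracker {
    // Both thresholds are assumptions; the source only says "sufficient".
    private final int maxFetchFailuresPerMap;
    private final int maxFailedMapsPerTracker;

    // Fetch-failure complaints received from reducers, per map attempt.
    private final Map<String, Integer> fetchFailures = new HashMap<>();
    // Number of map attempts declared lost, per TaskTracker.
    private final Map<String, Integer> failedMapsOnTracker = new HashMap<>();
    private final Set<String> failedMaps = new HashSet<>();
    private final Set<String> blacklistedTrackers = new HashSet<>();

    FetchFailureTracker(int maxFetchFailuresPerMap, int maxFailedMapsPerTracker) {
        this.maxFetchFailuresPerMap = maxFetchFailuresPerMap;
        this.maxFailedMapsPerTracker = maxFailedMapsPerTracker;
    }

    /**
     * Records one reducer complaint about a map attempt's output.
     * Returns true if this complaint pushes the attempt over the threshold,
     * i.e. the caller should fail it and re-run the map elsewhere.
     */
    boolean reportFetchFailure(String mapAttemptId, String trackerId) {
        if (failedMaps.contains(mapAttemptId)) {
            return false; // already declared lost; nothing more to do
        }
        int failures = fetchFailures.merge(mapAttemptId, 1, Integer::sum);
        if (failures < maxFetchFailuresPerMap) {
            return false; // not enough complaints yet
        }
        failedMaps.add(mapAttemptId);
        // Charge the lost map to the TaskTracker that was serving its output.
        int lost = failedMapsOnTracker.merge(trackerId, 1, Integer::sum);
        if (lost >= maxFailedMapsPerTracker) {
            blacklistedTrackers.add(trackerId);
        }
        return true;
    }

    /** True if no further tasks should be assigned to this TaskTracker. */
    boolean isBlacklisted(String trackerId) {
        return blacklistedTrackers.contains(trackerId);
    }
}
```

With thresholds of, say, 3 and 2, the third complaint about one map attempt marks it for
re-execution, and the second such lost map on the same TaskTracker blacklists that
tracker. Note that nothing here restarts Jetty; the faulty tracker is simply avoided.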

Please feel free to open further issues if you have other thoughts to help improve things...

> JobTracker should collect statistics of failed map output fetches, and take decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty server on the TaskTracker
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1158
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-1158_20070702_1.patch, HADOOP-1158_2_20070808.patch, HADOOP-1158_3_20070809.patch, HADOOP-1158_4_20070817.patch, HADOOP-1158_5_20070823.patch
>
>
> The JobTracker should keep track (with feedback from Reducers) of how many times a fetch
> for a particular map output failed. If this exceeds a certain threshold, then that map
> should be declared lost and reexecuted elsewhere. Based on the number of such
> complaints from Reducers, the JobTracker can blacklist the TaskTracker. This will make the
> framework reliable: it will take care of (faulty) TaskTrackers that sometimes, or always,
> fail to serve up map outputs (for which exceptions are not properly raised/handled, e.g., if
> the exception/problem happens in the Jetty server).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

