Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <33477292.1181189186046.JavaMail.jira@brutus>
Date: Wed, 6 Jun 2007 21:06:26 -0700 (PDT)
From: "Arun C Murthy (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1158) JobTracker should collect
 statistics of failed map output fetches, and take decisions to reexecute
 map tasks and/or restart the (possibly faulty) Jetty server on the
 TaskTracker
In-Reply-To: <7318675.1174848452133.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502212 ] 

Arun C Murthy commented on HADOOP-1158:
---------------------------------------

Oh, it might be worth considering separate MapTaskStatus and ReduceTaskStatus classes, since Map and Reduce tasks each carry various pieces of unrelated information (e.g. shuffle/sort-merge related info, fetch failures) ... we could put this in the appropriate 'Task' class too (which the child-vm could then compute and send to the TaskTracker) - perhaps as a separate issue?
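
A rough sketch of what such a split could look like. Everything here is hypothetical: the BaseTaskStatus name, the fields and the accessors are assumptions for illustration only, not the existing TaskStatus code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical common base: state shared by map and reduce tasks. */
abstract class BaseTaskStatus {
    private final String taskId;
    private float progress;

    BaseTaskStatus(String taskId) {
        this.taskId = taskId;
    }

    String getTaskId() { return taskId; }
    float getProgress() { return progress; }
    void setProgress(float progress) { this.progress = progress; }
}

/** Map-side status: nothing reduce-specific leaks in here. */
final class MapTaskStatus extends BaseTaskStatus {
    MapTaskStatus(String taskId) {
        super(taskId);
    }
}

/** Reduce-side status: shuffle/sort timings and fetch failures (assumed fields). */
final class ReduceTaskStatus extends BaseTaskStatus {
    private long shuffleFinishTime;
    private long sortFinishTime;
    // Ids of map tasks whose output this reduce failed to fetch.
    private final List<String> fetchFailedMaps = new ArrayList<>();

    ReduceTaskStatus(String taskId) {
        super(taskId);
    }

    void setShuffleFinishTime(long millis) { this.shuffleFinishTime = millis; }
    long getShuffleFinishTime() { return shuffleFinishTime; }
    void setSortFinishTime(long millis) { this.sortFinishTime = millis; }
    long getSortFinishTime() { return sortFinishTime; }

    void addFetchFailure(String mapTaskId) { fetchFailedMaps.add(mapTaskId); }

    /** Read-only view, so callers cannot modify the reported failures. */
    List<String> getFetchFailedMaps() {
        return Collections.unmodifiableList(fetchFailedMaps);
    }
}
```

With this shape the reduce-only fields never appear on a map task's status, which is the main point of the split.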

> JobTracker should collect statistics of failed map output fetches, and take decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty server on the TaskTracker
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1158
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>
> The JobTracker should keep track (with feedback from Reducers) of how many times a fetch for a particular map output has failed. If this count exceeds a certain threshold, that map should be declared lost and reexecuted elsewhere. Based on the number of such complaints from Reducers, the JobTracker can blacklist the TaskTracker. This will make the framework more reliable: it will take care of (faulty) TaskTrackers that, in some cases, always fail to serve up map outputs (for which exceptions are not properly raised/handled, e.g., if the exception/problem happens in the Jetty server).
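
The bookkeeping described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions: the FetchFailureTracker class, its method names and both thresholds are hypothetical and are not part of the JobTracker code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of fetch-failure accounting; not the Hadoop API. */
class FetchFailureTracker {
    private final int mapThreshold;      // assumed: failures tolerated per map output
    private final int trackerThreshold;  // assumed: complaints tolerated per TaskTracker
    private final Map<String, Integer> failuresPerMap = new HashMap<>();
    private final Map<String, Integer> complaintsPerTracker = new HashMap<>();

    FetchFailureTracker(int mapThreshold, int trackerThreshold) {
        this.mapThreshold = mapThreshold;
        this.trackerThreshold = trackerThreshold;
    }

    /**
     * Records one failed fetch reported by a reducer.
     * Returns true when the map's failure count exceeds the threshold,
     * i.e. the map should be declared lost and reexecuted elsewhere.
     */
    boolean reportFetchFailure(String mapTaskId, String trackerName) {
        int mapFailures = failuresPerMap.merge(mapTaskId, 1, Integer::sum);
        complaintsPerTracker.merge(trackerName, 1, Integer::sum);
        if (mapFailures > mapThreshold) {
            // Start counting afresh for the reexecuted attempt.
            failuresPerMap.remove(mapTaskId);
            return true;
        }
        return false;
    }

    /** True when complaints against this tracker exceed the threshold. */
    boolean shouldBlacklist(String trackerName) {
        return complaintsPerTracker.getOrDefault(trackerName, 0) > trackerThreshold;
    }
}
```

Counts are kept per map output and per TaskTracker separately, so one bad map output does not by itself blacklist a tracker, while many complaints spread across maps on the same tracker still can.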
