hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4246) Reduce task copy errors may not kill it eventually
Date Thu, 25 Sep 2008 04:17:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634369#action_12634369 ]

Jothi Padmanabhan commented on HADOOP-4246:

The patch looks good. A few minor comments:

* Since MAX_FAILED_UNIQUE_FETCHES is no longer a constant, it should be named maxFailedUniqueFetches

* getClosestPowerOf2 will not return negative numbers. So, this piece of code
    if (this.maxFetchRetriesPerMap < 1) {
      this.maxFetchRetriesPerMap = 1;
    }
should be modified to
    if (this.maxFetchRetriesPerMap == 0) {
      this.maxFetchRetriesPerMap = 1;
    }
for better clarity.
* For the backoff value for a GENERIC_ERROR, should we just back off by a fixed amount and
retry? The concern here is that if we are hitting a 'disk-out-of-space' exception, we are
better off identifying it earlier rather than later. If the map_run_time is high, we might
actually spend a lot of time retrying before the jobtracker gets notified. Thoughts?
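
To make the last point concrete, here is a minimal sketch of how the cumulative wait grows under a doubling backoff versus a fixed one. The class and method names are hypothetical, and the 4-second starting delay is only an assumption suggested by the issue description; none of this is taken from the patch.

```java
// Illustrative only: compares total time spent waiting before the Nth
// consecutive fetch failure has been observed. All names are hypothetical.
public class BackoffSketch {

    // Doubling backoff: the delay starts at initMs and doubles after each
    // failure, never exceeding capMs.
    static long exponentialTotalMs(long initMs, long capMs, int failures) {
        long total = 0;
        long delay = initMs;
        for (int i = 0; i < failures; i++) {
            total += Math.min(delay, capMs);
            delay = Math.min(delay * 2, capMs);
        }
        return total;
    }

    // Fixed backoff: every retry waits the same amount.
    static long fixedTotalMs(long delayMs, int failures) {
        return delayMs * failures;
    }

    public static void main(String[] args) {
        // Assumed values: 4 s initial delay, 300 s cap, 5 failures.
        System.out.println("exponential: " + exponentialTotalMs(4000, 300000, 5) + " ms");
        System.out.println("fixed:       " + fixedTotalMs(4000, 5) + " ms");
    }
}
```

With these assumed numbers, five failures take 124 seconds of waiting under the doubling scheme and 20 seconds under the fixed one, which is the gap the question above is about.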

> Reduce task copy errors may not kill it eventually
> --------------------------------------------------
>                 Key: HADOOP-4246
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4246
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.0
>         Attachments: patch-4246.txt
> maxFetchRetriesPerMap in the reduce task can sometimes be zero (when maxMapRunTime is less
> than 4 seconds or mapred.reduce.copy.backoff is less than 4). When that happens, reduce task
> copy errors are not counted toward eventually killing the task.
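
The zero case described in the issue can be sketched as follows. This is an assumption-laden illustration, not the code from patch-4246.txt: it assumes the retry count is the exponent of the power of two closest to (backoff in ms / 4000 ms) + 1, and closestPowerOf2Exponent is a hypothetical stand-in for getClosestPowerOf2.

```java
// Illustrative only: shows how a small backoff setting can yield zero
// retries, and how the "== 0" guard restores a minimum of one.
public class FetchRetrySketch {

    // Assumed initial backoff of 4 seconds, in milliseconds.
    static final int BACKOFF_INIT_MS = 4000;

    // Hypothetical stand-in for getClosestPowerOf2: returns the exponent of
    // the power of two nearest to value. Never negative for value >= 1.
    static int closestPowerOf2Exponent(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("Undefined for " + value);
        }
        int highestBit = Integer.highestOneBit(value);
        int exponent = Integer.numberOfTrailingZeros(highestBit);
        // Round up when the bit just below the highest one is also set.
        return exponent + (((highestBit >>> 1) & value) == 0 ? 0 : 1);
    }

    static int maxFetchRetriesPerMap(int backoffSeconds) {
        int retries = closestPowerOf2Exponent(backoffSeconds * 1000 / BACKOFF_INIT_MS + 1);
        if (retries == 0) {
            retries = 1; // the guard suggested in the review comment
        }
        return retries;
    }

    public static void main(String[] args) {
        System.out.println(maxFetchRetriesPerMap(3));
        System.out.println(maxFetchRetriesPerMap(300));
    }
}
```

Under these assumptions a backoff of 3 seconds gives an unguarded result of 0, so the guard is what keeps copy errors countable.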

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
