hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4246) Reduce task copy errors may not kill it eventually
Date Thu, 25 Sep 2008 04:17:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634369#action_12634369 ]

Jothi Padmanabhan commented on HADOOP-4246:
-------------------------------------------

The patch looks good. A few minor comments:

* Since MAX_FAILED_UNIQUE_FETCHES is no longer a constant, it should be named maxFailedUniqueFetches

* getClosestPowerOf2 will not return negative numbers. So, for better clarity, this piece of code
{code}
if (this.maxFetchRetriesPerMap < 1) {
  this.maxFetchRetriesPerMap = 1;
}
{code}
should be modified to
{code}
if (this.maxFetchRetriesPerMap == 0) {
  this.maxFetchRetriesPerMap = 1;
}
{code}

* For the backoff value for a GENERIC_ERROR, should we just back off by a fixed amount and
retry? The concern here is that if we are hitting a 'disk-out-of-space' exception, we are
better off identifying it sooner rather than later. If the map_run_time is high, we might
spend a lot of time retrying before the jobtracker gets notified. Thoughts?
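
To put rough numbers on the last point, here is a small sketch comparing the total time spent waiting under a doubling backoff and under a fixed one. The class name, method names and the 4-second initial delay are illustrative assumptions, not code or constants taken from the patch.

```java
// Illustrative sketch only: names and the initial delay are assumptions,
// not taken from patch-4246.txt.
final class BackoffSketch {

    /** Total wait in ms across `retries` attempts when the delay doubles after each one. */
    static long exponentialTotalMillis(long initialMillis, int retries) {
        long total = 0;
        long delay = initialMillis;
        for (int i = 0; i < retries; i++) {
            total += delay;
            delay *= 2;
        }
        return total;
    }

    /** Total wait in ms across `retries` attempts when the delay stays constant. */
    static long fixedTotalMillis(long delayMillis, int retries) {
        return delayMillis * retries;
    }

    public static void main(String[] args) {
        long initial = 4000L; // assumed initial backoff in ms
        for (int retries = 1; retries <= 6; retries++) {
            System.out.println(retries + " retries: exponential="
                + exponentialTotalMillis(initial, retries) + " ms, fixed="
                + fixedTotalMillis(initial, retries) + " ms");
        }
    }
}
```

With these assumed values the gap widens quickly as the retry count grows, which is why a persistent local error could take much longer to surface under a doubling backoff.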


> Reduce task copy errors may not kill it eventually
> --------------------------------------------------
>
>                 Key: HADOOP-4246
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4246
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: patch-4246.txt
>
>
> maxFetchRetriesPerMap in reduce task can be zero sometimes (when maxMapRunTime is less
> than 4 seconds or mapred.reduce.copy.backoff is less than 4). When that happens, reduce task
> copy errors are not counted, so the task may never be killed.
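
A minimal sketch of how the retry count could reach zero, and of the == 0 guard suggested in the comment above. The formula, the 4000 ms constant and all names here are assumptions reconstructed from the description, not the actual ReduceTask code; closestPowerOf2Exponent is a hypothetical stand-in for getClosestPowerOf2.

```java
// Illustrative sketch only: the formula and names are assumptions based on
// the issue description, not the real ReduceTask implementation.
final class FetchRetrySketch {

    static final int BACKOFF_INIT_MILLIS = 4000; // assumed initial backoff

    /** Exponent of the power of two nearest to value; ties round up. Never negative. */
    static int closestPowerOf2Exponent(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("Undefined for " + value);
        }
        int floorExp = 31 - Integer.numberOfLeadingZeros(value);
        long lower = 1L << floorExp;
        long upper = lower << 1;
        return (value - lower) < (upper - value) ? floorExp : floorExp + 1;
    }

    /** Assumed derivation of the per-map retry count from a backoff given in seconds. */
    static int retriesFor(int maxBackoffSeconds) {
        return closestPowerOf2Exponent(maxBackoffSeconds * 1000 / BACKOFF_INIT_MILLIS + 1);
    }

    /** The guard from the review comment: zero is the only value that needs fixing. */
    static int clamp(int retries) {
        return retries == 0 ? 1 : retries;
    }

    public static void main(String[] args) {
        for (int seconds : new int[] {3, 4, 300}) {
            int raw = retriesFor(seconds);
            System.out.println("backoff=" + seconds + "s raw=" + raw + " clamped=" + clamp(raw));
        }
    }
}
```

Under these assumptions any backoff below 4 seconds yields a raw count of zero, and the guard raises it to one so that copy failures are still counted.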

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

