hadoop-common-dev mailing list archives

From Arkady Borkovsky <ark...@yahoo-inc.com>
Subject Re: [jira] Created: (HADOOP-569) Hadoop should allow the user to dynamically change the number of times to re-try failed tasks before declaring the job fail
Date Mon, 02 Oct 2006 17:15:58 GMT
While "dynamically change the number of times to re-try" may be a
useful feature, the underlying problem may also be addressed by:
-- Speculative execution (once only a few tasks remain, run
several instances of them on different nodes and use the output of the
one that completes first). I guess there is an H-issue on this.
-- Speculative execution with re-splitting: rather than running
additional instances of the same "slow task", split its input into
several "sub-splits" and run "speculative tasks" on the sub-splits.
I'd invest in this.
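
To make the re-splitting idea concrete, here is a minimal sketch of dividing one slow task's input into sub-splits. It is an illustration only: the `Split` class and the `resplit` function are hypothetical names, not Hadoop APIs, and the sketch assumes the input is a plain byte range that can be cut at any offset (a real implementation would have to respect record boundaries).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """A byte range of an input file (hypothetical; not Hadoop's own split type)."""
    path: str
    start: int
    length: int


def resplit(split: Split, parts: int) -> List[Split]:
    """Divide one split into at most `parts` contiguous sub-splits.

    Assumption: the range can be cut at arbitrary byte offsets.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    # Do not ask for more sub-splits than there are bytes to distribute.
    parts = min(parts, max(split.length, 1))
    base, extra = divmod(split.length, parts)
    subs: List[Split] = []
    offset = split.start
    for i in range(parts):
        # The first `extra` sub-splits each take one leftover byte.
        size = base + (1 if i < extra else 0)
        subs.append(Split(split.path, offset, size))
        offset += size
    return subs
```

Each sub-split could then be handed to its own speculative task, so the remaining work of the slow task is spread over several nodes instead of being repeated in full on each of them.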

On Oct 2, 2006, at 9:36 AM, Runping Qi (JIRA) wrote:

> Hadoop should allow the user to dynamically change the number of times  
> to re-try failed tasks before declaring the job fail
> ----------------------------------------------------------------------- 
> ----------------------------------------------------
>
>                  Key: HADOOP-569
>                  URL: http://issues.apache.org/jira/browse/HADOOP-569
>              Project: Hadoop
>           Issue Type: Improvement
>           Components: mapred
>             Reporter: Runping Qi
>
>
>
> Hadoop has a built-in mechanism to fail a job if some tasks failed  
> more than 3 times. This mechanism works fine in most scenarios.  
> However, in some other cases, it is highly desirable for the user to  
> change (increase) that number. My current running job demonstrates  
> such a scenario: The job has run more than 2.5 days. It is close to  
> complete (90+%). Everything indicates that it will finish eventually  
> in a day, except for one potential danger: some of the tasks are in  
> their 3rd try!
> It will be extremely helpful if I can change the maximum number of  
> tries to 6 instead of 4!
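
As an illustration of the request in the quoted issue, the sketch below shows a retry limit that can be raised while a job is running. It is not Hadoop code: `RetryPolicy`, `record_failure`, and `set_max_attempts` are hypothetical names, and the default of 4 tries is taken from the numbers in the issue text.

```python
from typing import Dict


class RetryPolicy:
    """Tracks failures per task against a limit that can change at runtime (hypothetical)."""

    def __init__(self, max_attempts: int = 4) -> None:
        self.max_attempts = max_attempts
        self.failures: Dict[str, int] = {}

    def set_max_attempts(self, max_attempts: int) -> None:
        """Change the limit for a running job; earlier failures still count."""
        if max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        self.max_attempts = max_attempts

    def record_failure(self, task_id: str) -> bool:
        """Record one failed try; return True if the task may be tried again."""
        self.failures[task_id] = self.failures.get(task_id, 0) + 1
        return self.failures[task_id] < self.max_attempts


policy = RetryPolicy(max_attempts=4)
for _ in range(3):
    policy.record_failure("task_0042")   # three failed tries so far
policy.set_max_attempts(6)               # operator raises the limit mid-job
can_retry = policy.record_failure("task_0042")  # fourth failure no longer ends the job
```

Because the limit is read on every failure rather than fixed at job submission, raising it takes effect for tasks that are already on their later tries.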

