hadoop-common-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1144) Hadoop should allow a configurable percentage of failed map tasks before declaring a job failed.
Date Mon, 26 Mar 2007 19:51:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484207 ]

Andrzej Bialecki  commented on HADOOP-1144:
-------------------------------------------

Nutch could use this feature too - it's quite common that one of the map tasks, e.g. one
parsing difficult content such as PDF or MS Word documents, crashes or gets stuck. This should
not be fatal to the whole job.

As for configuring the number of failed tasks - I think it would be good to be able
to choose between an absolute number and a percentage.
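
For illustration, here is a rough sketch of how this could look from a job's driver code.
The property names below (mapred.max.map.failures.absolute, mapred.max.map.failures.percent)
are placeholders made up for this comment, not a decided API:

    import org.apache.hadoop.mapred.JobConf;

    public class FailureToleranceExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(FailureToleranceExample.class);
        conf.setJobName("parse-segment");

        // Option A (placeholder property): tolerate an absolute number of failed map tasks.
        conf.setInt("mapred.max.map.failures.absolute", 10);

        // Option B (placeholder property): tolerate a percentage of failed map tasks.
        conf.setInt("mapred.max.map.failures.percent", 5);

        // ... set input/output paths, mapper and reducer, then submit with JobClient.runJob(conf).
      }
    }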

> Hadoop should allow a configurable percentage of failed map tasks before declaring a job failed.
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Christian Kunz
>             Fix For: 0.13.0
>
>
> In our environment it can occur that some map tasks fail repeatedly because of corrupt
> input data, which is sometimes non-critical as long as the amount is limited. In this case
> it is annoying that the whole Hadoop job fails and cannot be restarted until the corrupt data
> are identified and eliminated from the input. It would be extremely helpful if the job
> configuration allowed specifying how many map tasks are permitted to fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

