hadoop-mapreduce-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
Date Tue, 22 Jan 2013 22:06:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560090#comment-13560090 ]

Sandy Ryza commented on MAPREDUCE-4951:

Regarding the other special exit codes, my opinion is that they don't merit the same treatment.
In general, and if I understand correctly how things worked in MR1, failed tasks should be
considered guilty until proven innocent, with innocent meaning killed explicitly by the RM,
and guilty meaning anything else.

It's correct that a ContainerKillEvent is issued in both cases.  However, if I understand
correctly, when the RM explicitly kills a container, the special value of -100 is reported
to the AM in place of whatever exit code the NM would otherwise report.  You can look for
references to YarnConfiguration.ABORTED_CONTAINER_EXIT_STATUS to see when and how this works.
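
As a rough illustration of the rule described above (a sketch only, not the code in the attached
patches): the class, enum, and method names below are hypothetical, and the constant is redefined
locally with the -100 value mentioned above so the example compiles without Hadoop on the
classpath. It assumes the AM applies the check only to containers it would otherwise treat as failed.

```java
// Hypothetical sketch; ContainerExitSketch, Outcome, classify and countFailures
// are illustrative names, not part of Hadoop or of the MAPREDUCE-4951 patch.
final class ContainerExitSketch {

    // Local copy of the value described above. In Hadoop it is referenced as
    // YarnConfiguration.ABORTED_CONTAINER_EXIT_STATUS.
    static final int ABORTED_CONTAINER_EXIT_STATUS = -100;

    enum Outcome { KILLED, FAILED }

    // "Guilty until proven innocent": only the RM's explicit kill marker
    // clears the attempt; every other exit status counts as a failure.
    static Outcome classify(int exitStatus) {
        return exitStatus == ABORTED_CONTAINER_EXIT_STATUS
                ? Outcome.KILLED
                : Outcome.FAILED;
    }

    // Counts only the attempts classified as FAILED, so preempted containers
    // do not push a job toward its failed-task limit.
    static int countFailures(int[] exitStatuses) {
        int failures = 0;
        for (int status : exitStatuses) {
            if (classify(status) == Outcome.FAILED) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Two RM kills (-100) and two ordinary non-zero exits.
        int[] statuses = {-100, 1, -100, 137};
        System.out.println("failures counted: " + countFailures(statuses));
    }
}
```

With this split, the two attempts reported as -100 would be treated as killed and only the other
two would count against the job.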

> Container preemption interpreted as task failure
> ------------------------------------------------
>                 Key: MAPREDUCE-4951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mr-am, mrv2
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951.patch
> When YARN reports a completed container to the MR AM, the AM always interprets it as a failure.
> This can lead to a job failing because too many of its tasks failed, when in fact they
> failed only because the scheduler preempted them.
> MR needs to recognize the special exit code value of -100 and interpret it as a container
> being killed instead of a container failure.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
