hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4993) AM thinks it was killed when an error occurs setting up a task container launch context
Date Sat, 02 Mar 2013 17:37:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591455#comment-13591455 ]

Jason Lowe commented on MAPREDUCE-4993:
---------------------------------------

I'm not exactly sure what happened in this case, as I'm just documenting the poor error handling
by the AM on a job I was asked to analyze.  From the stack trace, it looks like the AM was trying
to set up the common portion of the task launch contexts and encountered an IOException while
processing distributed cache files because they had been deleted.  Maybe someone submitted a job
whose distributed cache files in HDFS were deleted while the job was still in flight?

Anyway, the problem is, as you point out, that the AM is not properly handling exceptions while
setting up the common container launch context for tasks.  If an error occurs while setting
that up, the AM should fail the job with job diagnostics that include the exception message
and stack trace, rather than simply exiting with no diagnostics.
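
A minimal sketch of the handling described above, for illustration only. It is not the actual
AM code: the Job, JobState, and CommonContextSetup types and the setUpCommonLaunchContext
method are hypothetical stand-ins, and only the JDK is used. The point is that an IOException
raised during the common setup is caught and turned into a FAILED job whose diagnostics carry
the exception message and stack trace.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

class LaunchContextSketch {
    /** Hypothetical job states; the real AM has a richer state machine. */
    enum JobState { RUNNING, FAILED, KILLED }

    /** Hypothetical stand-in for the AM's job bookkeeping. */
    static final class Job {
        JobState state = JobState.RUNNING;
        String diagnostics = "";

        void fail(String message) {
            state = JobState.FAILED;
            diagnostics = message;
        }
    }

    /** Hypothetical setup step, e.g. resolving distributed cache files. */
    interface CommonContextSetup {
        void run() throws IOException;
    }

    /** Runs the setup and fails the job with diagnostics if it throws. */
    static void setUpCommonLaunchContext(Job job, CommonContextSetup setup) {
        try {
            setup.run();
        } catch (IOException e) {
            // Capture the stack trace as text so it can be stored in the diagnostics.
            StringWriter trace = new StringWriter();
            PrintWriter writer = new PrintWriter(trace);
            e.printStackTrace(writer);
            writer.flush();
            job.fail("Error setting up container launch context: "
                    + e.getMessage() + System.lineSeparator() + trace);
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        // Simulate a distributed cache file that was deleted while the job was running.
        setUpCommonLaunchContext(job, () -> {
            throw new FileNotFoundException("distributed cache file no longer exists");
        });
        System.out.println(job.state);
        System.out.println(job.diagnostics);
    }
}
```

In this sketch the job ends up FAILED rather than KILLED, and the diagnostics string is what a
user would see when asking why the job did not complete.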
                
> AM thinks it was killed when an error occurs setting up a task container launch context
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4993
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4993
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Abhishek Kapoor
>
> If an IOException occurs while setting up a container launch context for a task then
> the AM exits with a KILLED status and no diagnostics.  The job should be marked as FAILED
> (or maybe ERROR) with a useful diagnostics message indicating the nature of the error.

