hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3949) If AM fails due to overrunning resource limits, error not visible through UI sometimes
Date Sun, 31 Mar 2013 04:27:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618236#comment-13618236 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3949:

[~raviprak] says [on MAPREDUCE-3688|https://issues.apache.org/jira/browse/MAPREDUCE-3688?focusedCommentId=13606901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606901]:
bq. From my testing on trunk, I notice that even for the case where the AM goes over container
limits (which I trigger with -Dyarn.app.mapreduce.am.resource.mb=512 -Dyarn.app.mapreduce.am.command-opts="-Xmx3500m"
on a sleep job), sometimes the error is propagated back and sometimes it's not. Can you please
corroborate this? When State == FinalState == FAILED, the error is propagated back. However,
about half the time, State == FINISHED and FinalState == KILLED, in which case there is no
message anywhere to help me. Not in the diagnostics, and there are no logs.
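
The reproduction quoted above relies on the AM's JVM being allowed a heap (-Xmx3500m) far larger than its container (512 MB). A minimal sketch of that comparison follows. The class and method names (AmMemoryCheck, parseXmxMb, heapCanExceedContainer) are hypothetical and not part of Hadoop. It only compares the two configured values; as far as I understand, the NodeManager enforces the limit on the memory the process actually uses, not on the -Xmx flag, so this is a sanity check rather than a model of the enforcement itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop class: compares the AM heap cap with the container size.
public class AmMemoryCheck {
    // Matches a JVM max-heap flag such as -Xmx3500m, -Xmx2g or -Xmx524288k.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    /** Returns the max heap in MB found in the JVM options, or -1 if there is no -Xmx flag. */
    public static long parseXmxMb(String commandOpts) {
        Matcher m = XMX.matcher(commandOpts);
        if (!m.find()) {
            return -1;
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return value * 1024;
            case "m": return value;
            case "k": return value / 1024;
            default:  return value / (1024 * 1024); // no suffix means bytes
        }
    }

    /** True if the configured heap cap is larger than the container size in MB. */
    public static boolean heapCanExceedContainer(int resourceMb, String commandOpts) {
        return parseXmxMb(commandOpts) > resourceMb;
    }

    public static void main(String[] args) {
        // Values taken from the reproduction in the quoted comment.
        int resourceMb = 512;                 // yarn.app.mapreduce.am.resource.mb
        String commandOpts = "-Xmx3500m";     // yarn.app.mapreduce.am.command-opts
        System.out.println("heap cap (MB): " + parseXmxMb(commandOpts));
        System.out.println("can exceed container: "
                + heapCanExceedContainer(resourceMb, commandOpts));
    }
}
```

With these two settings the heap cap is well above the container size, so an AM that grows its heap can be killed for exceeding the limit, which is the failure this issue is about.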
> If AM fails due to overrunning resource limits, error not visible through UI sometimes
> --------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3949
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3949
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.24.0, 0.23.2
>            Reporter: Todd Lipcon
>            Assignee: Ravi Prakash
>            Priority: Minor
> I had a case where an MR AM exceeded the configured memory limit. This caused the AM's
> container to get killed, but the diagnostics did not appear anywhere accessible through the web UI.
> I had to view the NM's logs via ssh before I could figure out what had happened to my application.

