hadoop-mapreduce-issues mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3688) Need better Error message if AM is killed/throws exception
Date Wed, 06 Mar 2013 00:06:14 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated MAPREDUCE-3688:
------------------------------------

    Attachment: mapreduce-3688-h0.23-v01.patch

This has been a pain for our users as well.

I don't think this patch will fly well with the reviewers, but maybe it'll help move the discussion
forward. 

I didn't see a good way to communicate the error message to the caller, so I decided to sacrifice
stdout, which the current MRAppMaster does not use.
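The stdout approach described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the patch itself: `AmErrorReporter`, `Startup`, and `runAndReport` are hypothetical names, and a `PrintStream` parameter stands in for the AM's stdout so the behavior can be checked without launching a container. On failure it writes `Error starting MRAppMaster: ` followed by the full stack trace (the same prefix that appears in the web UI output below) and returns exit code 1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;

public class AmErrorReporter {

    /** Hypothetical stand-in for the AM's startup work; not a real MRAppMaster type. */
    public interface Startup {
        void run() throws Exception;
    }

    /**
     * Runs the startup body. On success returns 0 and writes nothing.
     * On any Throwable, writes "Error starting MRAppMaster: " plus the stack
     * trace to `out` (standing in for the otherwise unused stdout) and returns 1.
     */
    public static int runAndReport(Startup body, PrintStream out) {
        try {
            body.run();
            return 0;
        } catch (Throwable t) {
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            out.print("Error starting MRAppMaster: " + trace);
            out.flush();
            return 1;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        int exitCode = runAndReport(
                () -> { throw new IOException("Split metadata size exceeded 20."); },
                new PrintStream(captured, true));
        System.out.println("exitCode=" + exitCode);
        System.out.print(captured);
        // A real AM would finish with System.exit(exitCode) so the launcher sees the failure.
    }
}
```

Catching `Throwable` rather than `Exception` is a judgment call so that errors such as `NoClassDefFoundError` are reported too. How the container launcher then forwards the captured stdout into the application diagnostics is not shown here.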

After the patch, the web UI would show:

{quote}
Diagnostics:	 Application application_1362527487477_0005 failed 1 times due to AM Container for appattempt_1362527487477_0005_000001 exited with exitCode: 1 due to: Error starting MRAppMaster: org.apache.hadoop.yarn.YarnException: java.io.IOException: Split metadata size exceeded 20. Aborting job job_1362527487477_0005
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1290)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1146)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1118)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:382)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:299)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:823)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:121)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1094)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:998)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1273)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1221)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1269)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1226)
Caused by: java.io.IOException: Split metadata size exceeded 20. Aborting job job_1362527487477_0005
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1285)
... 16 more .Failing this attempt.. Failing the application.
{quote}

(This patch is based on 0.23.)
                
> Need better Error message if AM is killed/throws exception
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-3688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.1
>            Reporter: David Capwell
>            Assignee: Sandy Ryza
>             Fix For: 0.23.2
>
>         Attachments: mapreduce-3688-h0.23-v01.patch
>
>
> We need better error messages in the UI if the AM gets killed or throws an Exception.
> If the following error gets thrown: 
> java.lang.NumberFormatException: For input string: "9223372036854775807l" // last char is an L
> then the UI should show this exception.  Instead I get the following:
> Application application_1326504761991_0018 failed 1 times due to AM Container for appattempt_1326504761991_0018_000001
> exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException
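The failure in the report is easy to reproduce in isolation. A minimal sketch, assuming the bad value ends up in `Long.parseLong` (the report does not say which code parses it); `ParseLongRepro` and `describeParse` are hypothetical names. Catching the exception and keeping `e.toString()` yields the same one-line text the reporter wanted the UI to display, rather than the generic `Shell$ExitCodeException`.

```java
public class ParseLongRepro {

    /**
     * Hypothetical helper: parses a decimal long and, on failure, returns the
     * exception's one-line text (class name plus message) instead of rethrowing.
     */
    public static String describeParse(String raw) {
        try {
            return "parsed " + Long.parseLong(raw);
        } catch (NumberFormatException e) {
            // Throwable.toString() is the class name, ": ", then the message.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE followed by a lowercase 'l' (Java's long-literal suffix), as in the report.
        System.out.println(describeParse("9223372036854775807l"));
        // The same value without the suffix parses fine.
        System.out.println(describeParse("9223372036854775807"));
    }
}
```

The first call reproduces the `NumberFormatException` quoted above; the second shows that dropping the trailing `l` is enough to make the value valid.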

--
This message is automatically generated by JIRA.
