hadoop-mapreduce-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster
Date Fri, 28 Mar 2014 16:24:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950968#comment-13950968 ]

Rohith commented on MAPREDUCE-5397:
-----------------------------------

Thoughts on the AM crash:
1. WebApp start can also fail because the address is already in use. But in the code, even when
web app start fails, the failure is ignored and the other services continue to start. I feel
this needs to be handled, which would avoid the NPE.
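
To illustrate the two ways this could be handled, here is a minimal sketch. It is not the MRClientService code: WebApp, WebAppStarter and the helper methods are hypothetical stand-ins invented for this example. Option A aborts the start with a clear error; option B continues without the web app but forces callers to check for its absence instead of hitting an NPE later.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration; none of these types are Hadoop classes.
class WebAppStartSketch {

    /** Minimal stand-in for a started web app handle. */
    interface WebApp {
        int port();
    }

    /** Stand-in for whatever builds and starts the web app; may fail, e.g. on a port clash. */
    interface WebAppStarter {
        WebApp start() throws IOException;
    }

    /** Option A: fail fast, so the service start aborts with a clear message. */
    static WebApp startOrFail(WebAppStarter starter) {
        try {
            return starter.start();
        } catch (IOException e) {
            throw new IllegalStateException("Web app failed to start: " + e.getMessage(), e);
        }
    }

    /** Option B: tolerate the failure, but make the missing web app explicit. */
    static WebApp startOrNull(WebAppStarter starter) {
        try {
            return starter.start();
        } catch (IOException e) {
            System.err.println("Web app failed to start, continuing without it: " + e.getMessage());
            return null;
        }
    }

    /** Callers guard against the missing web app instead of dereferencing it blindly. */
    static int httpPortOrDefault(WebApp webApp, int fallback) {
        return (webApp != null) ? webApp.port() : fallback;
    }

    public static void main(String[] args) {
        // Simulate the "address already in use" failure mentioned above.
        WebAppStarter failing = () -> { throw new IOException("Address already in use"); };
        WebApp webApp = startOrNull(failing);
        System.out.println("HTTP port: " + httpPortOrDefault(webApp, -1));
    }
}
```

Which option fits depends on whether the AM is expected to run without its web UI. Either way, the failure is recorded at the point where it happens rather than surfacing later as an NPE.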

> AM crashes because Webapp failed to start on multi node cluster
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-5397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5397
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jian He
>         Attachments: MRAppMasterlog.txt, log.txt
>
>
> I set up a 12-node cluster and tried submitting jobs, but I get this exception.
> The job is still able to succeed after the AM crashes and retries a few times (2 or 3).
> {code}
> 2013-07-12 18:56:28,438 INFO [main] org.mortbay.log: Extract jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_43554_mapreduce____ljbmlg/webapp
> 2013-07-12 18:56:28,528 WARN [main] org.mortbay.log: Failed startup of context org.mortbay.jetty.webapp.WebAppContext@2726b2{/,jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce}
> java.io.FileNotFoundException: /tmp/Jetty_0_0_0_0_43554_mapreduce____ljbmlg/webapp/webapps/mapreduce/.keep (No such file or directory)
> 	at java.io.FileOutputStream.open(Native Method)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
> 	at org.mortbay.resource.JarResource.extract(JarResource.java:215)
> 	at org.mortbay.jetty.webapp.WebAppContext.resolveWebApp(WebAppContext.java:974)
> 	at org.mortbay.jetty.webapp.WebAppContext.getWebInf(WebAppContext.java:832)
> 	at org.mortbay.jetty.webapp.WebInfConfiguration.configureClassLoader(WebInfConfiguration.java:62)
> 	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489)
> 	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> 	at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
> 	at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
> 	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> 	at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
> 	at org.mortbay.jetty.Server.doStart(Server.java:224)
> 	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
> 	at org.apache.hadoop.http.HttpServer.start(HttpServer.java:684)
> 	at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:211)
> 	at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:134)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
