hadoop-yarn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "yeshavora (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-1087) Succeed job tries to restart after RMrestart
Date Tue, 20 Aug 2013 23:36:52 GMT
yeshavora created YARN-1087:
-------------------------------

             Summary: Succeed job tries to restart after RMrestart
                 Key: YARN-1087
                 URL: https://issues.apache.org/jira/browse/YARN-1087
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: yeshavora
            Priority: Blocker


Run a job , restart RM when job just finished. It should not restart the job once it Succeed.

After RM restart, The AM of restarted job fails with below error.

AM log after Rmrestart:

013-08-19 17:29:21,144 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
2013-08-19 17:29:21,145 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
Stopped JobHistoryEventHandler. super.stop()
2013-08-19 17:29:21,146 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting
staging directory hdfs://host1:port1/user/ABC/.staging/job_1376933101704_0001
2013-08-19 17:29:21,156 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error
starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File
does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1469)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1324)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1291)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:922)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:131)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1184)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:995)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1121)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1113)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1113)
        at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1464)
        ... 17 more
2013-08-19 17:29:21,158 INFO [Thread-2] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster
received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
2013-08-19 17:29:21,159 WARN [Thread-2] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook
'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:805)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1344)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message