hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-5444) MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_AM_REBOOT at SUCCEEDED
Date Fri, 02 Aug 2013 14:53:48 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Lowe resolved MAPREDUCE-5444.
-----------------------------------

    Resolution: Invalid

bq. I have one point to add here that, immidiately after job is succeeded , app master got
reboot command from RM. JobClient is exitted( see MAPREDUCE-5441 ). By the time, RM has launched
2nd attempt of app master. 2nd attempt app master too compete for resources, but there is
no client waiting getting job report.I feel this is problem.

There will always be a race where the job has just succeeded but the RM gets out of sync with
the AM before the AM can unregister.  Normally the AM will exit, another AM attempt will be
launched by the RM, and the new attempt will recover the previous SUCCEEDED state and exit
shortly afterwards without launching any subsequent tasks.

As for the client, that's an orthogonal problem.  It's not required that a client be listening
to an application as it executes, and if the client is unnecessarily exiting across an AM
restart then we can tackle that issue in MAPREDUCE-5441.
                
> MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_AM_REBOOT at SUCCEEDED
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5444
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5444
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster
>            Reporter: Rohith Sharma K S
>            Priority: Minor
>
> {noformat}
> 2013-08-02 14:55:11,537 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Calling handler for JobFinishedEvent 
> 2013-08-02 14:55:11,538 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1375199817609_0049Job Transitioned from COMMITTING to SUCCEEDED
> 2013-08-02 14:55:11,663 INFO [Thread-52] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
Copying hdfs://0.0.0.0:45000/home/restest/staging-dir/restest/.staging/job_1375199817609_0049/job_1375199817609_0049_2.jhist
to hdfs://0.0.0.0:45000/home/restest/staging-dir/history/done_intermediate/restest/job_1375199817609_0049-1375435337429-restest-word+count-1375435511533-10-1-SUCCEEDED-a.jhist_tmp
> 2013-08-02 14:55:11,750 INFO [Thread-52] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
Copied to done location: hdfs://0.0.0.0:45000/home/restest/staging-dir/history/done_intermediate/restest/job_1375199817609_0049-1375435337429-restest-word+count-1375435511533-10-1-SUCCEEDED-a.jhist_tmp
> 2013-08-02 14:55:11,769 INFO [Thread-52] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
Copying hdfs://0.0.0.0:45000/home/restest/staging-dir/restest/.staging/job_1375199817609_0049/job_1375199817609_0049_2_conf.xml
to hdfs://0.0.0.0:45000/home/restest/staging-dir/history/done_intermediate/restest/job_1375199817609_0049_conf.xml_tmp
> 2013-08-02 14:55:11,880 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:1
CompletedMaps:10 CompletedReds:1 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
> 2013-08-02 14:55:13,649 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Error communicating with RM: Resource Manager doesn't recognize AttemptId: application_1375199817609_0049
> org.apache.hadoop.yarn.YarnException: Resource Manager doesn't recognize AttemptId: application_1375199817609_0049
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:626)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:238)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:250)
> 	at java.lang.Thread.run(Thread.java:662)
> 2013-08-02 14:55:13,649 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_AM_REBOOT
at SUCCEEDED
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:914)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:129)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1114)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1110)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
> 	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:309)
> 	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:305)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
> 	at java.lang.Thread.run(Thread.java:662)
> 2013-08-02 14:55:13,652 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
JobHistoryEvent is triggered from JobImpl
> 2013-08-02 14:55:13,652 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1375199817609_0049Job Transitioned from SUCCEEDED to ERROR
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message