hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4637) Killing an unassigned task attempt causes the job to fail
Date Wed, 05 Sep 2012 16:06:07 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448845#comment-13448845 ]

Tom White commented on MAPREDUCE-4637:
--------------------------------------

Here are the relevant lines from the AM's log when a task attempt (attempt_1337372605417_0946_m_000095_0)
was killed manually from the CLI:

{noformat}
2012-05-24 11:45:49,328 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1337372605417_0946_m_000095_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2012-05-24 12:01:55,072 INFO [IPC Server handler 0 on 53369] org.apache.hadoop.mapreduce.v2.app.client.MRClientService:
Kill task attempt received from client attempt_1337372605417_0946_m_000095_0
2012-05-24 12:01:55,073 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
Can't handle this event at current state for attempt_1337372605417_0946_m_000095_0
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_DIAGNOSTICS_UPDATE
at UNASSIGNED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:941)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:133)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:912)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:904)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
        at java.lang.Thread.run(Thread.java:662)
2012-05-24 12:01:55,076 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1337372605417_0946_m_000095_0 TaskAttempt Transitioned from UNASSIGNED to KILLED
2012-05-24 12:01:55,078 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1337372605417_0946Job Transitioned from RUNNING to ERROR
2012-05-24 12:01:55,078 INFO [Thread-46] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Processing the event EventType: CONTAINER_DEALLOCATE
{noformat}

The bug seems to be in MRClientService.MRClientProtocolHandler#killTaskAttempt() where the
handler is sent a TaskAttemptDiagnosticsUpdateEvent. However, in the UNASSIGNED state a TaskAttempt
cannot handle this event, so an InvalidStateTransitonException is raised and the job transitions
from RUNNING to ERROR. The simplest fix would be to allow a
TaskAttemptDiagnosticsUpdateEvent in the UNASSIGNED state. The same bug occurs with the fail
task command.
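
To illustrate the proposed fix, here is a minimal toy model of the state machine, not the actual TaskAttemptImpl code. The class, the helper killSucceeds(), and the event names TA_SCHEDULE and TA_KILL are assumptions made for this sketch; only TA_DIAGNOSTICS_UPDATE and the NEW, UNASSIGNED and KILLED states appear in the log above. It assumes the kill request delivers a diagnostics update before the kill event, and shows that the sequence only succeeds once UNASSIGNED accepts the diagnostics update.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Toy model only: a hypothetical stand-in for a task attempt state machine,
// not Hadoop's TaskAttemptImpl or StateMachineFactory.
final class TaskAttemptSketch {
    enum State { NEW, UNASSIGNED, KILLED }

    // TA_SCHEDULE and TA_KILL are illustrative names chosen for this sketch.
    enum Event { TA_SCHEDULE, TA_DIAGNOSTICS_UPDATE, TA_KILL }

    private final Map<State, EnumSet<Event>> allowed = new EnumMap<>(State.class);
    private State state = State.NEW;

    TaskAttemptSketch(boolean allowDiagnosticsWhenUnassigned) {
        allowed.put(State.NEW, EnumSet.of(Event.TA_SCHEDULE));
        // The proposed fix: also accept a diagnostics update while UNASSIGNED.
        allowed.put(State.UNASSIGNED, allowDiagnosticsWhenUnassigned
                ? EnumSet.of(Event.TA_KILL, Event.TA_DIAGNOSTICS_UPDATE)
                : EnumSet.of(Event.TA_KILL));
        allowed.put(State.KILLED, EnumSet.noneOf(Event.class));
    }

    State state() {
        return state;
    }

    void handle(Event event) {
        if (!allowed.get(state).contains(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        switch (event) {
            case TA_SCHEDULE:
                state = State.UNASSIGNED;
                break;
            case TA_KILL:
                state = State.KILLED;
                break;
            default:
                // TA_DIAGNOSTICS_UPDATE leaves the state unchanged.
                break;
        }
    }

    // Replays the assumed kill sequence: diagnostics update first, then kill.
    static boolean killSucceeds(boolean allowDiagnosticsWhenUnassigned) {
        TaskAttemptSketch attempt = new TaskAttemptSketch(allowDiagnosticsWhenUnassigned);
        attempt.handle(Event.TA_SCHEDULE); // NEW -> UNASSIGNED
        try {
            attempt.handle(Event.TA_DIAGNOSTICS_UPDATE);
            attempt.handle(Event.TA_KILL); // UNASSIGNED -> KILLED
            return attempt.state() == State.KILLED;
        } catch (IllegalStateException e) {
            return false; // models the invalid state transition seen in the log
        }
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + killSucceeds(false)); // prints "without fix: false"
        System.out.println("with fix: " + killSucceeds(true));     // prints "with fix: true"
    }
}
```

In this sketch the diagnostics update is a self-transition, so accepting it in UNASSIGNED does not change which states are reachable.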

This bug was found by Wing Yew Poon.
                
> Killing an unassigned task attempt causes the job to fail
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4637
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4637
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.0-alpha
>            Reporter: Tom White
>
> Attempting to kill a task attempt that has been scheduled but is not running causes an
> invalid state transition and the AM to stop with an error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
