hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4748) Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
Date Thu, 25 Oct 2012 20:05:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484442#comment-13484442 ]

Jason Lowe commented on MAPREDUCE-4748:
---------------------------------------

Here's a log from another case showing we have a race between two attempts from the same task
that succeed almost simultaneously:

{noformat}
2012-10-24 11:31:40,751 INFO [IPC Server handler 1 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Status update from attempt_1350066773975_116662_m_032327_1
2012-10-24 11:31:40,751 INFO [IPC Server handler 1 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Progress of TaskAttempt attempt_1350066773975_116662_m_032327_1 is : 1.0
2012-10-24 11:31:40,751 INFO [IPC Server handler 21 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Done acknowledgement from attempt_1350066773975_116662_m_032327_1
2012-10-24 11:31:40,751 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1350066773975_116662_m_032327_1 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2012-10-24 11:31:40,752 INFO [ContainerLauncher #55] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1350066773975_116662_01_051566 taskAttempt attempt_1350066773975_116662_m_032327_1
2012-10-24 11:31:40,752 INFO [ContainerLauncher #55] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
KILLING attempt_1350066773975_116662_m_032327_1
2012-10-24 11:31:40,754 INFO [IPC Server handler 7 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Status update from attempt_1350066773975_116662_r_000003_0
2012-10-24 11:31:40,754 INFO [IPC Server handler 7 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Progress of TaskAttempt attempt_1350066773975_116662_r_000003_0 is : 0.3333072
2012-10-24 11:31:40,755 INFO [IPC Server handler 25 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Status update from attempt_1350066773975_116662_m_032327_0
2012-10-24 11:31:40,755 INFO [IPC Server handler 25 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Progress of TaskAttempt attempt_1350066773975_116662_m_032327_0 is : 1.0
2012-10-24 11:31:40,756 INFO [IPC Server handler 20 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Done acknowledgement from attempt_1350066773975_116662_m_032327_0
2012-10-24 11:31:40,756 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1350066773975_116662_m_032327_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2012-10-24 11:31:40,756 INFO [ContainerLauncher #484] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1350066773975_116662_01_037193 taskAttempt attempt_1350066773975_116662_m_032327_0
2012-10-24 11:31:40,756 INFO [ContainerLauncher #484] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
KILLING attempt_1350066773975_116662_m_032327_0
2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1350066773975_116662_m_032327_1 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
Task succeeded with attempt attempt_1350066773975_116662_m_032327_1
2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
Issuing kill to other attempt attempt_1350066773975_116662_m_032327_0
2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1350066773975_116662_m_032327 Task Transitioned from RUNNING to SUCCEEDED
2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
Num completed Tasks: 51029
2012-10-24 11:31:40,780 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1350066773975_116662_m_032327_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2012-10-24 11:31:40,814 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
Can't handle this event at current state for task_1350066773975_116662_m_032327
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:619)
2012-10-24 11:31:40,814 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
Invalid event T_ATTEMPT_SUCCEEDED on Task task_1350066773975_116662_m_032327
2012-10-24 11:31:40,818 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1350066773975_116662Job Transitioned from RUNNING to ERROR
{noformat}

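We tried to kill the other attempt, but it had already sent its done acknowledgement and went on to succeed before the kill took effect. The task, already in SUCCEEDED, then received a second T_ATTEMPT_SUCCEEDED, hence T_ATTEMPT_SUCCEEDED at SUCCEEDED, and that invalid transition moved the whole job to ERROR.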
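
To make the event ordering concrete, here is a toy Java model of the race, a minimal sketch under stated assumptions: a single dispatcher drains queued events in order, and the task accepts exactly one T_ATTEMPT_SUCCEEDED while RUNNING. The names `SpeculativeRaceSketch`, `ToyTask`, `AttemptSucceeded` and `tolerateLateSuccess` are hypothetical; this is not Hadoop's `TaskImpl` or `StateMachineFactory`. The `tolerateLateSuccess` flag illustrates one possible way to make the late success benign (ignore it when it comes from an attempt other than the winner); it is an illustration, not necessarily the fix adopted for this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of two speculative attempts of one task succeeding almost
 * simultaneously. NOT Hadoop's TaskImpl; all names here are hypothetical.
 */
class SpeculativeRaceSketch {

    enum TaskState { RUNNING, SUCCEEDED }

    /** Hypothetical stand-in for a T_ATTEMPT_SUCCEEDED event from one attempt. */
    static final class AttemptSucceeded {
        final String attemptId;
        AttemptSucceeded(String attemptId) { this.attemptId = attemptId; }
    }

    static final class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(String msg) { super(msg); }
    }

    static final class ToyTask {
        TaskState state = TaskState.RUNNING;
        String successfulAttempt;
        final boolean tolerateLateSuccess;

        ToyTask(boolean tolerateLateSuccess) {
            this.tolerateLateSuccess = tolerateLateSuccess;
        }

        void handle(AttemptSucceeded e) {
            if (state == TaskState.RUNNING) {
                // First success wins. The real AM then issues an asynchronous
                // kill to the other attempt, which can arrive too late.
                successfulAttempt = e.attemptId;
                state = TaskState.SUCCEEDED;
            } else if (tolerateLateSuccess && !e.attemptId.equals(successfulAttempt)) {
                // Hypothetical tolerant handling: ignore a late success from
                // the attempt we already tried to kill.
                return;
            } else {
                // Strict table: no transition defined for (SUCCEEDED, T_ATTEMPT_SUCCEEDED).
                throw new InvalidTransitionException(
                    "Invalid event: T_ATTEMPT_SUCCEEDED at " + state);
            }
        }
    }

    /** Drains queued events in order, like a single dispatcher thread. */
    static String replay(boolean tolerate) {
        Deque<AttemptSucceeded> queue = new ArrayDeque<>();
        // Both attempts sent their done acknowledgement before the kill for
        // attempt _0 was issued, so both success events end up queued.
        queue.add(new AttemptSucceeded("attempt_1350066773975_116662_m_032327_1"));
        queue.add(new AttemptSucceeded("attempt_1350066773975_116662_m_032327_0"));

        ToyTask task = new ToyTask(tolerate);
        try {
            while (!queue.isEmpty()) {
                task.handle(queue.poll());
            }
        } catch (InvalidTransitionException ex) {
            // In the log above, this is the point where the job goes RUNNING -> ERROR.
            return "job ERROR: " + ex.getMessage();
        }
        return "task " + task.state + " via " + task.successfulAttempt;
    }

    public static void main(String[] args) {
        System.out.println("strict:   " + replay(false));
        System.out.println("tolerant: " + replay(true));
    }
}
```

With the strict table, the second queued success reproduces the same "Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED" failure; with the tolerant variant, the task stays SUCCEEDED with attempt _1 as the winner. Whether a late success should be ignored or recorded as a killed attempt is a design decision the sketch does not settle.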
                
> Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4748
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4748
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Jason Lowe
>
> We saw this happen when running a large Pig script.
> {noformat}
> 2012-10-23 22:45:24,986 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
> Can't handle this event at current state for task_1350837501057_21978_m_040453
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Speculative execution was enabled, and that task did speculate, so this looks like an error
> in the state machine, either between the task attempts or within that single task.

--
This message is automatically generated by JIRA.
