hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4748) Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
Date Fri, 26 Oct 2012 00:05:12 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4748:
----------------------------------

    Attachment: MAPREDUCE-4748.patch

Simple patch to ignore T_ATTEMPT_SUCCEEDED, T_KILL, and T_ATTEMPT_COMMIT_PENDING when the task is already in the SUCCEEDED state, which keeps the job from abruptly ending in error.
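To illustrate the idea, here is a minimal, self-contained toy model in Java. It is a sketch under stated assumptions and is not Hadoop's TaskImpl or StateMachineFactory. The names ToyTask, ToyTaskState, and ToyTaskEvent are hypothetical, and IllegalStateException stands in for InvalidStateTransitonException. In the SUCCEEDED state the three late events become no-op self-transitions, so a second T_ATTEMPT_SUCCEEDED from a speculative attempt no longer fails the task.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; NOT Hadoop's TaskImpl or StateMachineFactory.
enum ToyTaskState { RUNNING, SUCCEEDED, KILLED }

enum ToyTaskEvent { T_ATTEMPT_SUCCEEDED, T_ATTEMPT_COMMIT_PENDING, T_KILL }

class ToyTask {
    // Per-state events that are accepted as no-op self-transitions.
    private static final Map<ToyTaskState, Set<ToyTaskEvent>> IGNORED =
        new EnumMap<>(ToyTaskState.class);

    static {
        IGNORED.put(ToyTaskState.RUNNING, EnumSet.noneOf(ToyTaskEvent.class));
        // Late or redundant events that arrive after the task already succeeded.
        IGNORED.put(ToyTaskState.SUCCEEDED, EnumSet.of(
            ToyTaskEvent.T_ATTEMPT_SUCCEEDED,
            ToyTaskEvent.T_ATTEMPT_COMMIT_PENDING,
            ToyTaskEvent.T_KILL));
        IGNORED.put(ToyTaskState.KILLED, EnumSet.noneOf(ToyTaskEvent.class));
    }

    private ToyTaskState state = ToyTaskState.RUNNING;

    ToyTaskState getState() {
        return state;
    }

    void handle(ToyTaskEvent event) {
        if (IGNORED.get(state).contains(event)) {
            return; // stay in the current state
        }
        if (state == ToyTaskState.RUNNING) {
            switch (event) {
                case T_ATTEMPT_SUCCEEDED:
                    state = ToyTaskState.SUCCEEDED;
                    return;
                case T_ATTEMPT_COMMIT_PENDING:
                    return; // toy model: keep running while the attempt commits
                case T_KILL:
                    state = ToyTaskState.KILLED;
                    return;
            }
        }
        // Anything else is still treated as an invalid transition.
        throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }
}

class ToyTaskDemo {
    public static void main(String[] args) {
        ToyTask task = new ToyTask();
        task.handle(ToyTaskEvent.T_ATTEMPT_SUCCEEDED); // first attempt to finish wins
        task.handle(ToyTaskEvent.T_ATTEMPT_SUCCEEDED); // speculative twin also succeeds: ignored
        task.handle(ToyTaskEvent.T_KILL);              // late kill: ignored
        System.out.println(task.getState());           // prints SUCCEEDED
    }
}
```

The sketch lists the ignored events explicitly for each state instead of adding a catch-all handler. That way, any other unexpected event still fails loudly.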

I'm a bit worried about the bookkeeping with respect to task.finishedAttempts and task.numberUncompletedAttempts. The current patch matches the bookkeeping behavior for T_ATTEMPT_KILLED or T_ATTEMPT_FAILED when we're effectively ignoring the event. However, I'm wondering whether this could lead to corner cases during KILL_WAIT like those reported in MAPREDUCE-4745.

It looks like a TaskAttempt will report T_ATTEMPT_KILLED after it has succeeded, but only for map tasks. We don't want to double-count in that case. But if killing a TaskAttempt doesn't report that it was killed, it seems we could miss some bookkeeping if we simply skip the bookkeeping when we see an attempt redundantly succeed. Thoughts?
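One possible way to address both concerns is to make the bookkeeping idempotent per attempt. Each attempt is counted at most once, keyed by its attempt ID, however many terminal events it reports. The sketch below is only an illustration of that idea, not what the attached patch does. The class AttemptBookkeeping, the method recordFinished, and the attempt IDs are hypothetical. The fields only loosely mirror task.finishedAttempts and task.numberUncompletedAttempts.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: fields loosely mirror task.finishedAttempts and
// task.numberUncompletedAttempts, but this is NOT Hadoop's TaskImpl code.
class AttemptBookkeeping {
    private final Set<String> finishedAttempts = new HashSet<>();
    private int numberUncompletedAttempts;

    AttemptBookkeeping(int launchedAttempts) {
        this.numberUncompletedAttempts = launchedAttempts;
    }

    // Records that an attempt reached a terminal state (succeeded, failed, or killed).
    // Returns true only the first time a given attempt ID is recorded, so a map attempt
    // that succeeds and is later reported as killed is counted once.
    boolean recordFinished(String attemptId) {
        if (!finishedAttempts.add(attemptId)) {
            return false; // already counted: do not double-count
        }
        numberUncompletedAttempts--;
        return true;
    }

    int finishedCount() {
        return finishedAttempts.size();
    }

    int uncompletedCount() {
        return numberUncompletedAttempts;
    }
}

class AttemptBookkeepingDemo {
    public static void main(String[] args) {
        AttemptBookkeeping book = new AttemptBookkeeping(2); // original + speculative attempt
        book.recordFinished("attempt_0"); // T_ATTEMPT_SUCCEEDED: task moves to SUCCEEDED
        book.recordFinished("attempt_1"); // redundant success is still counted, once
        book.recordFinished("attempt_0"); // later T_ATTEMPT_KILLED for the same attempt: ignored
        System.out.println(book.finishedCount() + " finished, "
            + book.uncompletedCount() + " uncompleted"); // prints "2 finished, 0 uncompleted"
    }
}
```

With this approach, a redundant success still decrements the uncompleted count, so nothing is missed if the kill is never reported. A duplicate kill report for an already-counted attempt, on the other hand, is a no-op.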
                
> Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4748
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4748
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-4748.patch
>
>
> We saw this happen when running a large Pig script.
> {noformat}
> 2012-10-23 22:45:24,986 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event at current state for task_1350837501057_21978_m_040453
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Speculative execution was enabled, and that task did speculate, so it looks like this is an error in the state machine, either between the task attempts or just within that single task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
