hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3031) Job Client goes into infinite loop when we kill AM
Date Mon, 19 Sep 2011 13:31:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107813#comment-13107813 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3031:

This is a bug in the NM: just about any container that is killed this way (by running kill $pid
on the node) will be stuck in the RUNNING state on the RM. I found this on the corresponding NM:

org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:297)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:39)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:439)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:685)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:69)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:356)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:349)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:113)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)

This happens because an exit code of 137 or 143 is treated as a kill request. In hindsight this
turns out to be a bad idea; we should fix it.
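
For context, 137 and 143 follow the common shell convention of 128 plus the signal number
(SIGKILL = 9, SIGTERM = 15), so the exit code alone cannot say who sent the signal. The sketch
below is illustrative only and is not the NodeManager code: the class, method, and enum names
are hypothetical, and only CONTAINER_KILLED_ON_REQUEST is taken from the trace above. It
contrasts classifying by exit code alone with classifying by whether a kill was actually requested.

```java
import java.util.OptionalInt;

public class ExitCodeSketch {
    // Common shell convention: a process terminated by signal N reports 128 + N.
    static final int SIGNAL_BASE = 128;

    // Illustrative event names; only CONTAINER_KILLED_ON_REQUEST appears in the trace.
    public enum Event { EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, CONTAINER_KILLED_ON_REQUEST }

    /** Returns the signal number implied by an exit code, if it is in the 129..255 range. */
    public static OptionalInt signalFromExitCode(int exitCode) {
        if (exitCode > SIGNAL_BASE && exitCode < 256) {
            return OptionalInt.of(exitCode - SIGNAL_BASE);
        }
        return OptionalInt.empty();
    }

    /** The behavior described in the comment: 137/143 always count as a kill request. */
    public static Event classifyByExitCodeOnly(int exitCode) {
        if (exitCode == 137 || exitCode == 143) {
            return Event.CONTAINER_KILLED_ON_REQUEST;
        }
        return exitCode == 0 ? Event.EXITED_WITH_SUCCESS : Event.EXITED_WITH_FAILURE;
    }

    /** One possible alternative (an assumption): trust an explicit flag, not the exit code. */
    public static Event classifyWithRequestFlag(int exitCode, boolean killRequested) {
        if (killRequested) {
            return Event.CONTAINER_KILLED_ON_REQUEST;
        }
        return exitCode == 0 ? Event.EXITED_WITH_SUCCESS : Event.EXITED_WITH_FAILURE;
    }

    public static void main(String[] args) {
        for (int code : new int[] {0, 1, 137, 143}) {
            System.out.println(code
                + " signal=" + signalFromExitCode(code)
                + " byCodeOnly=" + classifyByExitCodeOnly(code)
                + " withFlag(false)=" + classifyWithRequestFlag(code, false));
        }
    }
}
```

With the flag-based variant, a container killed from outside (kill -9 on the node) would be
reported as a failed exit rather than as a kill that nobody requested.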

> Job Client goes into infinite loop when we kill AM
> --------------------------------------------------
>                 Key: MAPREDUCE-3031
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3031
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Karam Singh
>             Fix For: 0.23.0
> Started a cluster. Submitted a sleep job with around 10000 maps and 1000 reduces.
> Killed AM with kill -9, by which time 7000 maps had already completed.
> On the RM web UI, the application is stuck in the Application.RUNNING state, and JobClient goes
> into an infinite loop as RM keeps telling the client that the application is running.


