hadoop-yarn-issues mailing list archives

From "Sumit Mohanty (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally
Date Sat, 01 Nov 2014 19:19:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193379#comment-14193379
] 

Sumit Mohanty commented on YARN-1922:
-------------------------------------

This is critical for long-running processes: if a container consists of multiple processes
and one goes down, the others must go down too. We saw this in a Slider scenario deploying
HBase. A bad configuration caused the agent to go down, but the HBase processes stayed
active. The user fixed the bad config and tried to run again, but the next instance ran
into a port conflict.

> Process group remains alive after container process is killed externally
> ------------------------------------------------------------------------
>
>                 Key: YARN-1922
>                 URL: https://issues.apache.org/jira/browse/YARN-1922
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.0
>         Environment: CentOS 6.4
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>         Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, YARN-1922.4.patch
>
>
> If the main container process is killed externally, ContainerLaunch does not kill the
> rest of the process group. Before sending the event that results in the
> ContainerLaunch.containerCleanup method being called, ContainerLaunch sets the "completed"
> flag to true. Then when cleaning up, it doesn't try to read the pid file if the completed
> flag is true. If it read the pid file, it would proceed to send the container a kill
> signal. In the case of the DefaultContainerExecutor, this would kill the process group.
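
The control flow described in the issue can be illustrated with a small toy model. This is
a sketch in Python, not the NodeManager's Java code: the class name, the
`skip_if_completed` parameter, and the pid value are all hypothetical and exist only to
show why the "completed" flag prevents the kill signal from being sent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContainerLaunchModel:
    """Toy stand-in for the cleanup path described in the issue (hypothetical)."""

    pid_file_contents: Optional[int]  # pid recorded at launch, or None if unreadable
    completed: bool = False           # set once the main container process has exited
    signals_sent: List[int] = field(default_factory=list)

    def mark_completed(self) -> None:
        # Per the description, the flag is set before cleanup is triggered.
        self.completed = True

    def cleanup(self, skip_if_completed: bool) -> None:
        # Reported behaviour: with the flag set, the pid file is never read,
        # so no signal reaches the remaining processes in the group.
        if skip_if_completed and self.completed:
            return
        pid = self.pid_file_contents
        if pid is not None:
            # Stand-in for signalling the container; a real executor
            # would signal the whole process group here.
            self.signals_sent.append(pid)


launch = ContainerLaunchModel(pid_file_contents=4242)  # 4242 is an arbitrary example pid
launch.mark_completed()

launch.cleanup(skip_if_completed=True)   # models the reported behaviour
print(launch.signals_sent)

launch.cleanup(skip_if_completed=False)  # models reading the pid file regardless
print(launch.signals_sent)
```

Reading the pid file even when the flag is set is only one possible remedy; the attached
patches may take a different approach.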



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
