hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-76) killApplication doesn't fully kill application master
Date Wed, 17 Jul 2013 05:54:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710753#comment-13710753 ]

Xuan Gong commented on YARN-76:
-------------------------------

I ran a simple test on Mac. I created a Java class that just runs an infinite loop, and a
shell script that executes this class. After running the command to kill the shell script process,
the Java process kept running. This looks like the cause.
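
The test above can be approximated without a JVM. The following is a minimal bash sketch, not the
script used in the test: "sleep 300" stands in for the looping Java class, and the names
start_wrapper, kill_wrapper_only, WRAPPER_PID and CHILD_PID are hypothetical. It signals only the
wrapper shell and then reports whether the child is still there; in this setup the child can be
expected to outlive the wrapper.

```shell
#!/usr/bin/env bash
# Sketch only: `sleep 300` is a stand-in for the looping Java class.

# Start a wrapper shell that forks a child, and record both pids
# in the (hypothetical) globals WRAPPER_PID and CHILD_PID.
start_wrapper() {
  local pidfile
  pidfile=$(mktemp)
  # The wrapper writes its child's pid to the file passed as $0, then waits.
  bash -c 'sleep 300 & echo $! > "$0"; wait' "$pidfile" >/dev/null 2>&1 &
  WRAPPER_PID=$!
  # Poll until the wrapper has recorded the child's pid.
  while [ ! -s "$pidfile" ]; do sleep 0.1; done
  CHILD_PID=$(cat "$pidfile")
  rm -f "$pidfile"
}

# Signal only the wrapper, as `kill -15 ${pid}` does, then report on the child.
kill_wrapper_only() {
  kill -TERM "$WRAPPER_PID"
  wait "$WRAPPER_PID" 2>/dev/null || true
  if kill -0 "$CHILD_PID" 2>/dev/null; then
    echo "child $CHILD_PID survived"
  else
    echo "child $CHILD_PID is gone"
  fi
}
```

Call start_wrapper and then kill_wrapper_only; clean up afterwards with kill -9 "$CHILD_PID".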

A possible solution is to replace kill -15/-9 ${pid} with one of the following:
1. pkill -15/-9 -P ${pid}. This command signals the direct children of the process. It does not
signal the process itself (so kill ${pid} is still needed), and it does not kill the grandchildren.
2. kill -15/-9 -${pgid}. This command kills all processes that have the same process group id,
including the grandchildren.
We can use "ps -o pgid ${pid} | sed 1d" to get the pgid.
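
Option 2 can be sketched in bash as follows. This is an illustration under stated assumptions, not
the change made in YARN: the function names pgid_of, signal_group and kill_group are hypothetical,
and the one-second default grace period between SIGTERM and SIGKILL is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not code from the YARN patch.

# Print the process group id of a pid ("pgid=" suppresses the ps header).
pgid_of() {
  ps -o pgid= -p "$1" | tr -d ' '
}

# Send a signal to every member of a process group.
# "--" ends option parsing so the negative number is read as a group target.
signal_group() {
  local sig="$1" pgid="$2"
  kill "-$sig" -- "-$pgid"
}

# SIGTERM the group, wait for a grace period, then SIGKILL whatever is left.
kill_group() {
  local pgid="$1" grace="${2:-1}"
  # If the group no longer exists there is nothing to do.
  signal_group TERM "$pgid" 2>/dev/null || return 0
  sleep "$grace"
  signal_group KILL "$pgid" 2>/dev/null || true
}
```

For example, kill_group "$(pgid_of "$pid")" signals the wrapper, its children and its
grandchildren together. This only isolates the container if the wrapper was started in its own
process group; in a bash script, "set -m" makes each background job the leader of a new group.
Otherwise the group may also contain the process that launched it.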
                
> killApplication doesn't fully kill application master
> -----------------------------------------------------
>
>                 Key: YARN-76
>                 URL: https://issues.apache.org/jira/browse/YARN-76
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: Failed on MacOS. OK on Linux
>            Reporter: Bo Wang
>
> When a client sends ClientRMProtocol#killApplication to the RM, the corresponding AM is supposed
> to be killed. However, on Mac OS, the AM stays alive (without any interruption).
> I figured out part of the reason after some debugging. The NM starts an AM with a command like
> "/bin/bash -c /path/to/java SampleAM". This command is executed in a process (say with PID
> 0001), which starts another Java process (say with PID 0002). When the NM kills the AM, it sends
> SIGTERM and then SIGKILL to the bash process (PID 0001). On Linux, the death of the bash process
> (PID 0001) triggers the kill of the Java process (PID 0002). However, on Mac OS, only
> the bash process is killed, and the Java process is left running on its own.
> Note: on Mac OS, DefaultContainerExecutor is used rather than LinuxContainerExecutor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
