hadoop-yarn-issues mailing list archives

From "Dmitry Sivachenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
Date Thu, 15 Jan 2015 21:04:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279287#comment-14279287 ]

Dmitry Sivachenko commented on YARN-3066:
-----------------------------------------

The Windows case is checked separately; see private static boolean isSetsidSupported() in
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java,

for instance:

if (Shell.WINDOWS) {
      return false;
}

On any UNIX-like system I expect it will leave orphaned processes, because when
isSetsidSupported() returns false it uses kill(pid) to kill only the task instead of
kill(pgid) to kill the whole process group.
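
To make the kill(pid) versus kill(pgid) difference concrete, here is a minimal sketch of how
the kill command line could be built in each case. The class and method names are
hypothetical and are not Hadoop's actual API; the only fact relied on is that kill(1)
treats a negative PID argument as a process group ID.

```java
public class KillCommandSketch {

    /**
     * Builds the argv for kill(1). Hypothetical helper for illustration only.
     *
     * @param setsidAvailable true if the task was started as a session
     *                        (and process group) leader via setsid(1)
     * @param signal          signal number, e.g. 9 for SIGKILL
     * @param pid             PID of the task process
     */
    public static String[] killCommand(boolean setsidAvailable, int signal, String pid) {
        // With setsid the task's PID is also its process group ID, so "-pid"
        // signals the task and every child left in that group.
        // Without setsid only the single task process is signalled, and any
        // children it spawned keep running.
        String target = setsidAvailable ? "-" + pid : pid;
        return new String[] {"kill", "-" + signal, target};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", killCommand(true, 9, "1234")));
        System.out.println(String.join(" ", killCommand(false, 9, "1234")));
    }
}
```

In the second form the children of a shell-script task are never signalled, which is the
orphaned-process case described in this issue.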

ssid(1) on FreeBSD is the analog of setsid(1) on Linux: a userland wrapper for the setsid()
system call.

Renaming does not sound like a sane idea, because it is hard to convince everyone to rename
installed binaries by hand.

I propose to treat it as a system-dependent option and act accordingly.
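
As a rough sketch of what "act accordingly" could look like: probe for the wrapper binaries
in order and build the exec prefix from whichever is found, or fail fast if none is. All
names here are hypothetical and this is not the existing Shell.java logic. The default probe
also assumes that ssid accepts a command to run in the same way setsid does.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class SessionWrapperSketch {

    /** Returns the first candidate (e.g. "setsid", then "ssid") that the probe accepts. */
    public static Optional<String> chooseWrapper(List<String> candidates,
                                                 Predicate<String> canRun) {
        return candidates.stream().filter(canRun).findFirst();
    }

    /** Builds the launch prefix; falls back to plain "exec" if no wrapper was found. */
    public static String execPrefix(Optional<String> wrapper) {
        return wrapper.map(w -> "exec " + w).orElse("exec");
    }

    /** Stricter variant: refuse to start instead of silently falling back. */
    public static String requireExecPrefix(Optional<String> wrapper) {
        return "exec " + wrapper.orElseThrow(() -> new IllegalStateException(
            "neither setsid nor ssid found; killed jobs would leave orphaned processes"));
    }

    /** Default probe: try to run "<program> true" and check that it exits with 0. */
    public static boolean canRun(String program) {
        try {
            return new ProcessBuilder(program, "true").start().waitFor() == 0;
        } catch (IOException e) {
            return false; // binary missing or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would use something like
chooseWrapper(Arrays.asList("setsid", "ssid"), SessionWrapperSketch::canRun) once at startup
and cache the result.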

(I suppose other OSes such as Solaris also lack the setsid(1) utility, so they could benefit as well.)

For the ssid source, see http://tools.suckless.org/ssid/

As for backwards compatibility, we can change that in 3.0. It is not fatal: a failure to start
without setsid will just remind users to install setsid(1) or ssid(1) before proceeding further,
and they can then be sure that there will be no side effects like orphaned tasks eating CPU.

> Hadoop leaves orphaned tasks running after job is killed
> --------------------------------------------------------
>
>                 Key: YARN-3066
>                 URL: https://issues.apache.org/jira/browse/YARN-3066
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>         Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
>            Reporter: Dmitry Sivachenko
>
> When spawning user task, node manager checks for setsid(1) utility and spawns task program
> via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
> for instance:
> String exec = Shell.isSetsidAvailable? "exec setsid" : "exec";
> FreeBSD, unlike Linux, does not have setsid(1) utility.  So plain "exec" is used to spawn
> user task.  If that task spawns other external programs (this is common case if a task program
> is a shell script) and user kills job via mapred job -kill <Job>, these child processes
> remain running.
> 1) Why do you silently ignore the absence of setsid(1) and spawn task process via exec:
> this is the guarantee to have orphaned processes when job is prematurely killed.
> 2) FreeBSD has a replacement third-party program called ssid (which does almost the same
> as Linux's setsid).  It would be nice to detect which binary is present during configure stage
> and put @SETSID@ macros into java file to use the correct name.
> I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is
> not found:  at least we will know about the problem at start rather than guess why there are
> orphaned tasks running forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
