hadoop-yarn-issues mailing list archives

From "Dmitry Sivachenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed
Date Wed, 27 May 2015 16:38:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561234#comment-14561234 ]

Dmitry Sivachenko commented on YARN-3066:
-----------------------------------------

Solaris can use the same ssid program (it is just a simple wrapper around the setsid() syscall).
I merely proposed the simplest fix for the problem.
A JNI wrapper sounds like a better approach.

What I want to see in any case is a loud error message when the setsid binary (or the setsid()
syscall, if we go the JNI way) is unavailable.  Right now it pretends to work, and I spent some
time digging out what was going wrong and why I was seeing so many orphans.
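
A minimal sketch of the fail-loud behaviour I have in mind, in Java. The class name, the
failIfMissing flag and the message text are hypothetical and not part of Hadoop; the boolean
argument stands in for whatever Shell.isSetsidAvailable reports.

```java
import java.util.logging.Logger;

// Hypothetical helper, not Hadoop code: picks the shell prefix used to spawn a task
// and makes the missing-setsid case impossible to overlook.
class LoudSetsidCheck {
    private static final Logger LOG = Logger.getLogger(LoudSetsidCheck.class.getName());

    /**
     * @param setsidAvailable result of the availability probe (assumed to be computed elsewhere)
     * @param failIfMissing   if true, refuse to continue when setsid is unavailable
     * @return "exec setsid" when available, otherwise "exec" after logging at SEVERE level
     */
    static String execPrefix(boolean setsidAvailable, boolean failIfMissing) {
        if (setsidAvailable) {
            return "exec setsid";
        }
        String msg = "setsid is unavailable: child processes of a task will not be "
                + "killed together with the task and may be left running as orphans";
        if (failIfMissing) {
            // Strict mode: surface the problem at startup instead of at kill time.
            throw new IllegalStateException(msg);
        }
        // Lenient mode: keep the old fallback, but say so loudly.
        LOG.severe(msg);
        return "exec";
    }
}
```

With failIfMissing set, a caller could refuse to start instead of silently falling back to
plain exec; without it, the fallback at least leaves a SEVERE line in the log.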

> Hadoop leaves orphaned tasks running after job is killed
> --------------------------------------------------------
>
>                 Key: YARN-3066
>                 URL: https://issues.apache.org/jira/browse/YARN-3066
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>         Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
>            Reporter: Dmitry Sivachenko
>
> When spawning user task, node manager checks for setsid(1) utility and spawns task program
> via it. See
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
> for instance:
> String exec = Shell.isSetsidAvailable? "exec setsid" : "exec";
> FreeBSD, unlike Linux, does not have setsid(1) utility.  So plain "exec" is used to spawn
> user task.  If that task spawns other external programs (this is common case if a task program
> is a shell script) and user kills job via mapred job -kill <Job>, these child processes
> remain running.
> 1) Why do you silently ignore the absence of setsid(1) and spawn task process via exec:
> this is the guarantee to have orphaned processes when job is prematurely killed.
> 2) FreeBSD has a replacement third-party program called ssid (which does almost the same
> as Linux's setsid).  It would be nice to detect which binary is present during configure stage
> and put @SETSID@ macros into java file to use the correct name.
> I propose to make Shell.isSetsidAvailable test more strict and fail to start if it is
> not found:  at least we will know about the problem at start rather than guess why there are
> orphaned tasks running forever.
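
One possible runtime alternative to the configure-stage @SETSID@ substitution described above,
as a hedged sketch only: look for the first available wrapper on a PATH-style string. The class
and method names are hypothetical and not part of Hadoop, and the sketch assumes ssid takes the
command to run as its arguments in the same way setsid does, which the description only implies.

```java
import java.io.File;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not Hadoop code: locates a session-creating wrapper binary.
class WrapperLocator {
    /**
     * Searches the directories of a PATH-style string for the candidate names.
     * Candidates are tried in list order, so earlier names take priority.
     *
     * @return absolute path of the first executable match, or empty if none is found
     */
    static Optional<String> find(List<String> candidates, String path) {
        if (path == null || path.isEmpty()) {
            return Optional.empty();
        }
        for (String name : candidates) {
            for (String dir : path.split(File.pathSeparator)) {
                if (dir.isEmpty()) {
                    continue; // skip empty PATH entries
                }
                File f = new File(dir, name);
                if (f.isFile() && f.canExecute()) {
                    return Optional.of(f.getAbsolutePath());
                }
            }
        }
        return Optional.empty();
    }
}
```

A caller might use WrapperLocator.find(Arrays.asList("setsid", "ssid"), System.getenv("PATH"))
and treat an empty result as the error case rather than falling back to plain exec.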



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
