Date: Wed, 27 May 2015 16:38:18 +0000 (UTC)
From: "Dmitry Sivachenko (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed

    [ https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561234#comment-14561234 ]

Dmitry Sivachenko commented on YARN-3066:
-----------------------------------------

Solaris can use the same ssid program (it is just a simple wrapper around the setsid() syscall). I only proposed the simplest fix for the problem; a JNI wrapper sounds like a better approach. What I want to see in any case is a loud error message if the setsid binary (or the setsid() syscall, if we go the JNI way) is unavailable.
Right now it pretends to work, and I spent some time digging out what was going wrong and why I was seeing so many orphans.

> Hadoop leaves orphaned tasks running after job is killed
> --------------------------------------------------------
>
>                 Key: YARN-3066
>                 URL: https://issues.apache.org/jira/browse/YARN-3066
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>         Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
>            Reporter: Dmitry Sivachenko
>
> When spawning a user task, the node manager checks for the setsid(1) utility and spawns the task program via it. See hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java for instance:
> String exec = Shell.isSetsidAvailable? "exec setsid" : "exec";
> FreeBSD, unlike Linux, does not have the setsid(1) utility, so plain "exec" is used to spawn the user task. If that task spawns other external programs (a common case when the task program is a shell script) and the user kills the job via mapred job -kill, these child processes remain running.
> 1) Why do you silently ignore the absence of setsid(1) and spawn the task process via plain exec? This guarantees orphaned processes when a job is killed prematurely.
> 2) FreeBSD has a replacement third-party program called ssid (which does almost the same as Linux's setsid). It would be nice to detect which binary is present during the configure stage and put a @SETSID@ macro into the Java file so that the correct name is used.
> I propose to make the Shell.isSetsidAvailable test stricter and fail to start if the utility is not found: at least we will know about the problem at startup rather than having to guess why there are orphaned tasks running forever.
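To make the "fail loudly" proposal concrete, here is a minimal sketch in Java. It is not Hadoop's actual code: the class SessionWrapperCheck and its methods are hypothetical names made up for illustration, and it assumes that ssid can be invoked the same way as setsid. It looks for either binary on PATH and throws instead of silently falling back to plain "exec".

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: picks a session wrapper or fails loudly.
public class SessionWrapperCheck {

    // Returns the first candidate (in priority order) that exists as an
    // executable regular file in one of searchDirs; throws if none is found.
    static String findSessionWrapper(List<String> candidates, List<Path> searchDirs) {
        for (String name : candidates) {
            for (Path dir : searchDirs) {
                Path p = dir.resolve(name);
                if (Files.isRegularFile(p) && Files.isExecutable(p)) {
                    return name;
                }
            }
        }
        throw new IllegalStateException(
            "No session wrapper found (tried " + candidates + " in " + searchDirs
            + "); killing a job would leave orphaned child processes");
    }

    // Splits the PATH environment variable into directories.
    static List<Path> pathDirs() {
        List<Path> dirs = new ArrayList<>();
        String path = System.getenv("PATH");
        if (path != null) {
            for (String entry : path.split(File.pathSeparator)) {
                if (!entry.isEmpty()) {
                    dirs.add(Paths.get(entry));
                }
            }
        }
        return dirs;
    }

    // Builds the shell prefix; assumes the wrapper is invoked like setsid(1).
    static String execPrefix(String wrapper) {
        return "exec " + wrapper;
    }

    public static void main(String[] args) {
        try {
            String wrapper = findSessionWrapper(Arrays.asList("setsid", "ssid"), pathDirs());
            System.out.println(execPrefix(wrapper));
        } catch (IllegalStateException e) {
            // A real daemon would refuse to start here; the demo only reports it.
            System.err.println("FATAL: " + e.getMessage());
        }
    }
}
```

The candidate list is ordered, so setsid wins when both binaries are installed. Whether the node manager should abort at startup or only log the error is a policy decision this sketch leaves open.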