Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 76099200BD4 for ; Thu, 1 Dec 2016 08:22:03 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 730D0160B0F; Thu, 1 Dec 2016 07:22:03 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BB20D160B0B for ; Thu, 1 Dec 2016 08:22:02 +0100 (CET) Received: (qmail 44149 invoked by uid 500); 1 Dec 2016 07:22:00 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 44118 invoked by uid 99); 1 Dec 2016 07:21:59 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Dec 2016 07:21:59 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id BB2E62C1F56 for ; Thu, 1 Dec 2016 07:21:59 +0000 (UTC) Date: Thu, 1 Dec 2016 07:21:59 +0000 (UTC) From: "Hadoop QA (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 01 Dec 2016 07:22:03 -0000 [ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711156#comment-15711156 ] Hadoop QA commented on YARN-5937: --------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 13s{color} | {color:green} The patch generated 0 new + 116 unchanged - 1 fixed = 116 total (was 117) {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 11s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 41s{color} | {color:green} hadoop-yarn in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5937 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12840675/YARN-5937.01.patch | | Optional Tests | asflicense mvnsite unit shellcheck shelldocs | | uname | Linux 4d1ba7995b5b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f7613b | | shellcheck | v0.4.5 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14142/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn U: hadoop-yarn-project/hadoop-yarn | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14142/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > stop-yarn.sh is not able to gracefully stop node managers > --------------------------------------------------------- > > Key: YARN-5937 > URL: https://issues.apache.org/jira/browse/YARN-5937 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Weiwei Yang > Assignee: Weiwei Yang > Labels: script > Attachments: YARN-5937.01.patch, nm_shutdown.log > > > stop-yarn.sh always gives following output > {code} > ./sbin/stop-yarn.sh > Stopping resourcemanager > Stopping nodemanagers > : WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9 > : ERROR: Unable to kill 18097 > {code} > this was because resource manager is stopped before node managers, when the shutdown hook manager tries to gracefully stop NM services, NM needs to unregister with RM, and it gets timeout as NM could not connect to RM (already stopped). See log (stop RM then run kill ) > {code} > 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM > ... > 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException > java.util.concurrent.TimeoutException > at java.util.concurrent.FutureTask.get(FutureTask.java:205) > at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67) > ... > at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291) > ... > 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully. > {code} > the shutdown hooker has a default of 10s timeout, so if RM is stopped before NMs, they always took more than 10s to stop (in java code). However stop-yarn.sh only gives 5s timeout, so NM is always killed instead of stopped. > It would make sense to stop NMs before RMs in this script, in a graceful way. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: yarn-issues-help@hadoop.apache.org