Date: Mon, 28 Nov 2016 16:48:58 +0000 (UTC)
From: "Weiwei Yang (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers

     [ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-5937:
------------------------------
    Attachment: nm_shutdown.log

> stop-yarn.sh is not able to gracefully stop node managers
> ---------------------------------------------------------
>
>                 Key: YARN-5937
>                 URL: https://issues.apache.org/jira/browse/YARN-5937
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: nm_shutdown.log
>
>
> stop-yarn.sh always gives the following output:
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> : WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
> oracle1.fyre.ibm.com: ERROR: Unable to kill 18097
> {code}
> This happens because the resource manager is stopped before the node managers. When the shutdown hook manager tries to gracefully stop the NM services, the NM needs to unregister with the RM, and it times out because it cannot connect to the RM (already stopped). See the log (stop RM then run kill ):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> 	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully.
> {code}
> The shutdown hook has a default timeout of 10s, so if the RM is stopped before the NMs, the NMs always take more than 10s to stop (in the Java code). However, stop-yarn.sh only allows a 5s timeout, so the NM is always killed instead of stopped.
> It would make sense for this script to stop the NMs before the RM, in a graceful way.
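
The two points raised in the report (stop the node managers before the resource manager, and wait longer than the shutdown hook's timeout before escalating to kill -9) can be sketched in shell roughly as follows. This is only an illustration under stated assumptions, not the code in stop-yarn.sh: the names graceful_stop, stop_yarn, SHUTDOWN_HOOK_TIMEOUT and STOP_TIMEOUT are hypothetical, the 5 second margin is an arbitrary choice, and the sketch works on local PIDs, whereas the real script operates across hosts.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; all names below are hypothetical and this is
# not the logic shipped in stop-yarn.sh.

# The report states the JVM shutdown hook has a 10s default timeout, so the
# script-side wait is assumed here to be somewhat longer (margin is arbitrary).
SHUTDOWN_HOOK_TIMEOUT=10
STOP_TIMEOUT=$((SHUTDOWN_HOOK_TIMEOUT + 5))

# graceful_stop PID TIMEOUT
# Sends SIGTERM, polls once per second for up to TIMEOUT seconds, and falls
# back to SIGKILL only if the process is still alive afterwards.
# Returns 0 on a graceful stop, 1 if SIGKILL was needed.
graceful_stop() {
  local pid=$1 timeout=$2 waited=0
  # Simplification: a failed SIGTERM is treated as "process already gone".
  kill -TERM "$pid" 2>/dev/null || return 0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "WARNING: $pid did not stop gracefully after $timeout seconds, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# stop_yarn NM_PID RM_PID
# Stops the node manager first, while the resource manager is still up, so
# that the node manager can unregister before the resource manager goes away.
stop_yarn() {
  local rc=0
  echo "Stopping nodemanager"
  graceful_stop "$1" "$STOP_TIMEOUT" || rc=1
  echo "Stopping resourcemanager"
  graceful_stop "$2" "$STOP_TIMEOUT" || rc=1
  return "$rc"
}
```

Calling stop_yarn with the node manager PID and then the resource manager PID returns non-zero if either process had to be killed with SIGKILL. Either change alone would likely avoid the forced kill described above; the sketch combines both.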