hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers
Date Wed, 04 Jan 2017 08:39:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797609#comment-15797609 ]

Naganarasimha G R commented on YARN-5937:
-----------------------------------------

Sorry for the delayed reply. Actually, I was checking whether the NM also fails to shut down gracefully in the normal case. I have not tested trunk code lately; let me test whether the problem exists there, and if it does, we can fix both issues together. The existing solution seems fine to me!


> stop-yarn.sh is not able to gracefully stop node managers
> ---------------------------------------------------------
>
>                 Key: YARN-5937
>                 URL: https://issues.apache.org/jira/browse/YARN-5937
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>              Labels: script
>         Attachments: YARN-5937.01.patch, nm_shutdown.log
>
>
> stop-yarn.sh always gives the following output:
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
> <NM_HOST>: ERROR: Unable to kill 18097
> {code}
> This happens because the ResourceManager is stopped before the NodeManagers. When the shutdown hook manager tries to gracefully stop the NM services, the NM needs to unregister with the RM, and this times out because the NM cannot connect to the RM (which is already stopped). See the log (stop the RM, then run kill <nm_pid>):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> 	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully.
> {code}
> The shutdown hook manager has a default timeout of 10s, so if the RM is stopped before the NMs, the NMs always take more than 10s to stop (in the Java code). However, stop-yarn.sh only allows a 5s timeout, so the NM is always killed instead of being stopped gracefully.
> It would make sense for this script to stop the NMs before the RMs, so that they shut down gracefully.
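The timing argument in the issue description can be illustrated with a small, self-contained Java model. It is a sketch under stated assumptions, not the NodeManager's or ShutdownHookManager's actual code: `StopOrderSketch`, `unregisterNm`, and `stopsGracefully` are hypothetical names, the RM round trip is reduced to "returns immediately" (RM up) or "blocks until interrupted" (RM down), and the 10s hook timeout and 5s script wait from the issue are scaled down to 400ms and 200ms so the model runs quickly. The hook runs under a `Future.get` timeout, loosely mirroring the `FutureTask.get` frame in the quoted log.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model (not Hadoop code) of the NM stop race described in YARN-5937.
// Durations are scaled down from the issue's 10s hook timeout and 5s script wait.
public class StopOrderSketch {

    static final long HOOK_TIMEOUT_MS = 400; // stands in for the 10s shutdown-hook timeout
    static final long SCRIPT_WAIT_MS = 200;  // stands in for stop-yarn.sh's 5s wait before kill -9

    /** Runs a hook on another thread, waiting at most timeoutMs; true if it finished in time. */
    static boolean runWithTimeout(Runnable hook, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(hook);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // analogous to the "ShutdownHook ... timeout" warning in the log
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // the hook itself threw
        } finally {
            executor.shutdownNow(); // interrupts a hook that is still blocked
        }
    }

    /** Hypothetical unregister call: returns at once if the RM is up, else blocks until interrupted. */
    static Runnable unregisterNm(boolean rmRunning) {
        return () -> {
            if (rmRunning) {
                return;
            }
            try {
                Thread.sleep(Long.MAX_VALUE); // RM unreachable: no reply ever arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    /** True if the modeled NM shutdown finishes before the script would fall back to kill -9. */
    static boolean stopsGracefully(boolean rmRunning) {
        long start = System.nanoTime();
        runWithTimeout(unregisterNm(rmRunning), HOOK_TIMEOUT_MS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return elapsedMs < SCRIPT_WAIT_MS;
    }

    public static void main(String[] args) {
        System.out.println("RM stopped first: graceful=" + stopsGracefully(false));
        System.out.println("NMs stopped first: graceful=" + stopsGracefully(true));
    }
}
```

Running `main` compares the two orderings: with the RM already stopped, the modeled NM needs at least the hook timeout and therefore exceeds the script's wait, while stopping the NMs first lets unregistration return well within it. In this model only the relationship between the two values matters (hook timeout longer than the script wait), which is why reordering the stops, rather than changing either timeout, is enough to avoid the kill -9.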





