hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6401) terminating signal should be able to specify per application to support graceful-stop
Date Wed, 29 Mar 2017 18:28:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947656#comment-15947656

Jason Lowe commented on YARN-6401:

Ah, sorry.  I was thinking it was ignoring SIGTERM and thus not cleaning up because it would
get killed by the subsequent SIGKILL.  Instead it sounds like it _is_ responding to SIGTERM
but not cleaning up.  Isn't that a bit odd?  The whole point of SIGTERM is to request a shutdown
of the process rather than forcing one.

I'm not an httpd expert, so I started digging into the docs to try to understand why it wouldn't
do something sane with TERM but does with a non-standard signal like WINCH.  Turns out it
does handle TERM, but aggressively: in-progress requests may be interrupted or canceled.
WINCH only advises the child processes to exit, which sounds like active requests can continue
to be processed while the listen port is no longer monitored, so no new requests are accepted.

What worries me here is that we can still end up with a disorderly shutdown even if YARN sent
WINCH instead of TERM.  The default delay between the TERM and KILL signals is relatively short,
which is why the fast shutdown httpd performs for TERM seems the better fit for it.  If a request
could take hundreds of milliseconds to process then the KILL is going to arrive too soon after
the WINCH signal unless the delay between the two signals is widened.  However that delay
is not a per-app setting, and making it one would create a DoS problem, since an app could
ask for an arbitrarily long delay.  Containers are often killed because YARN needs the container
to leave in a timely manner (e.g.: container running beyond limits, preemption, etc.).
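
To make the timing concern concrete, here is a minimal sketch of a signal-then-KILL stop sequence.
It is not the NodeManager's code; the child script, the {{stop_with_delay}} helper and the delay
values are all made up for illustration, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical child: on SIGTERM it needs sys.argv[1] seconds to finish in-flight work.
CHILD = """
import signal, sys, time
def on_term(signum, frame):
    time.sleep(float(sys.argv[1]))  # simulate finishing an in-progress request
    sys.exit(0)
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.05)
"""

def stop_with_delay(cleanup_s, delay_s, sig=signal.SIGTERM):
    """Send `sig`, wait up to `delay_s` seconds, then fall back to SIGKILL."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(cleanup_s)],
        stdout=subprocess.PIPE,
    )
    proc.stdout.readline()  # block until the child has installed its handler
    proc.send_signal(sig)
    try:
        proc.wait(timeout=delay_s)
        outcome = "clean exit"
    except subprocess.TimeoutExpired:
        proc.kill()  # the process did not leave in time, so force it
        proc.wait()
        outcome = "killed"
    proc.stdout.close()
    return outcome, proc.returncode
```

A child whose cleanup fits inside the delay exits cleanly; one that needs longer is killed
mid-cleanup no matter which signal was sent first.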

So I still think this is something better handled by the application framework (in this case
Slider) rather than YARN.  MapReduce has a similar example.  MapReduce jobs can be killed
via YARN, but it's harsh and things are often lost when this occurs.  That's why the {{mapred
job -kill}} command first tries to kill the job by contacting the AM and requesting it to
do an orderly shutdown outside of YARN, and only falls back on YARN to terminate the containers
if the job is unresponsive to the kill request.  I think the same thing applies here.  If
we really want an orderly shutdown of httpd so we won't kill outstanding requests (even if
they can take a while) then Slider (or some layer on top of Slider) should support sending
the WINCH signals to the containers for the app and then the app can terminate when all containers
have completed their shutdown.  Then the application can implement an arbitrary, application-specific
shutdown sequence and timing.  If YARN needs to do the killing directly then we cannot wait
an arbitrary amount of time for the app to clean up and shut down gracefully.
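
The ask-first, kill-later pattern could look roughly like the sketch below.  The function and its
three callbacks are hypothetical stand-ins (for "contact the AM", "check whether the app finished",
and "have YARN kill the containers"); none of them are real Slider, MapReduce or YARN APIs.

```python
import time

def stop_application(request_stop, is_finished, force_kill, timeout_s, poll_s=0.01):
    """Ask the app to stop itself; fall back to a forced kill if it does not."""
    try:
        request_stop()  # e.g. tell the AM to run its own shutdown sequence
    except Exception:
        force_kill()  # the app could not even be reached
        return "forced"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_finished():
            return "graceful"
        time.sleep(poll_s)
    force_kill()  # unresponsive within the app-chosen timeout
    return "forced"
```

The timeout here belongs to the framework doing the stopping, so it can be as long as the
application needs without changing how quickly YARN itself reclaims a container.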

I think YARN will still need some support to send the WINCH signal in either case.  Currently
containers can be sent signals after YARN-1897, but it's only a restricted subset that can
be translated cross-platform.  That would need to be extended to support more arbitrary signals
like WINCH.
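
As a rough illustration of what "restricted subset" means, the sketch below maps a few abstract
commands to POSIX signals and shows where an extra entry would have to go.  The command names are
modeled on the signal-container commands as I understand them, and {{HTTPD_GRACEFUL_STOP}} and
{{resolve_signal}} are invented for this example; treat the whole table as an assumption rather than
YARN's actual mapping.

```python
import signal

# Assumed portable commands and their POSIX equivalents (illustrative only).
PORTABLE_COMMANDS = {
    "OUTPUT_THREAD_DUMP": signal.SIGQUIT,
    "GRACEFUL_SHUTDOWN": signal.SIGTERM,
    "FORCEFUL_SHUTDOWN": signal.SIGKILL,
}

def resolve_signal(command, extra=None):
    """Look up the signal for a command, optionally with platform-specific extras."""
    table = dict(PORTABLE_COMMANDS)
    table.update(extra or {})
    if command not in table:
        raise ValueError(f"unsupported command: {command}")
    return table[command]

# A hypothetical extension entry; WINCH has no obvious cross-platform meaning.
POSIX_EXTRAS = {"HTTPD_GRACEFUL_STOP": signal.SIGWINCH}
```

Without the extra table a WINCH-style request is simply rejected, which is the gap that
would need to be closed.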

> terminating signal should be able to specify per application to support graceful-stop
> -------------------------------------------------------------------------------------
>                 Key: YARN-6401
>                 URL: https://issues.apache.org/jira/browse/YARN-6401
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: kyungwan nam
> When stopping a container, YARN first sends SIGTERM to the process.
> After a while, it sends SIGKILL if the process is still alive.
> This sequence is always the same for any application.
> But to stop gracefully, sometimes another signal needs to be sent instead of SIGTERM.
> For instance, if apache httpd is running on Slider, SIGWINCH should be sent to stop it gracefully.
> The way to stop gracefully depends on the application.
> It would be good if we could define the terminating signal per application.

