hadoop-common-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8353) hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop
Date Sat, 12 May 2012 14:04:48 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274004#comment-13274004 ]

Hudson commented on HADOOP-8353:

Integrated in Hadoop-Mapreduce-trunk #1077 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1077/])
    HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

     Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh

> hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop
> -------------------------------------------------------------
>                 Key: HADOOP-8353
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8353
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.23.1
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 2.0.0
>         Attachments: HADOOP-8353-2.patch.txt, HADOOP-8353.patch.txt
> The stop action is implemented as a simple SIGTERM sent to the JVM. There's
a time delay between when the action is called and when the process actually exits. This
can be misleading to the callers of the *-daemon.sh scripts, since they expect the stop action
to return only once the process has actually stopped.
> I suggest we augment the stop action with a time-delay check for the process status and
a SIGKILL once the delay has expired.
> I understand that sending SIGKILL is a measure of last resort and is generally frowned
upon among init.d script writers, but the excuse we have for Hadoop is that it is engineered
to be a fault-tolerant system, and thus there's no danger of putting the system into an inconsistent
state by a violent SIGKILL. Of course, the time delay will be long enough to make a SIGKILL
event a rare condition.
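
The proposal above (SIGTERM, a bounded wait, then SIGKILL) can be sketched roughly as follows. This is only an illustration and not the committed patch: the function name stop_with_timeout, the 5-second default, and the log message are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the patch):
#   stop_with_timeout PID [TIMEOUT_SECONDS]
# Sends SIGTERM, polls once per second, and falls back to SIGKILL
# only if the process is still alive when the timeout expires.
stop_with_timeout() {
  pid=$1
  timeout=${2:-5}   # assumed default for this sketch

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0

  # Default signal is SIGTERM; if sending fails, the process just exited.
  kill "$pid" 2>/dev/null || return 0

  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # exited on its own
    sleep 1
    waited=$((waited + 1))
  done

  # Last resort: the process survived SIGTERM for the whole delay.
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid still running after ${timeout}s, sending SIGKILL" >&2
    kill -9 "$pid" 2>/dev/null || true
  fi
  return 0
}
```

With a helper like this, a stop action would call something like stop_with_timeout "$(cat "$pidfile")" 5, where $pidfile is whatever pid file the script already maintains, so the caller regains control only after the process is gone or has been killed.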
> Finally, there's always the option of an exponential back-off type of solution if we decide
that the SIGKILL timeout is too short.
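
One possible reading of the back-off idea is to poll the process at doubling intervals up to an overall cap, so that fast exits return quickly while slow ones still get a generous grace period before SIGKILL. The sketch below assumes that reading; the name stop_with_backoff and the 30-second cap are hypothetical and do not come from the issue or the patch.

```shell
#!/bin/sh
# Hypothetical helper (one interpretation of "exponential back-off"):
#   stop_with_backoff PID [MAX_WAIT_SECONDS]
# Sends SIGTERM, then checks the process after 1s, 2s, 4s, ... until
# MAX_WAIT_SECONDS have elapsed in total, and only then sends SIGKILL.
stop_with_backoff() {
  pid=$1
  max_wait=${2:-30}   # assumed overall cap for this sketch

  kill -0 "$pid" 2>/dev/null || return 0   # already gone
  kill "$pid" 2>/dev/null || return 0      # SIGTERM; failure means it exited

  delay=1
  waited=0
  while [ "$waited" -lt "$max_wait" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # exited on its own

    # Never sleep past the overall cap.
    remaining=$((max_wait - waited))
    if [ "$delay" -gt "$remaining" ]; then
      delay=$remaining
    fi

    sleep "$delay"
    waited=$((waited + delay))
    delay=$((delay * 2))
  done

  # Still alive after the full grace period: fall back to SIGKILL.
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid" 2>/dev/null || true
  fi
  return 0
}
```

The trade-off is that later checks are spaced further apart, so a process that exits late in the window may be noticed a few seconds after it actually stopped.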


