hadoop-yarn-issues mailing list archives

From "Andrey Klochkov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-445) Ability to signal containers
Date Fri, 04 Oct 2013 20:17:44 GMT

    [ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786572#comment-13786572 ]

Andrey Klochkov commented on YARN-445:

Steve, the current implementation will send the signal to the java process started by bin/hbase,
because it sends the signal to all processes in the job object, i.e. all processes of the main
container process. It can be replaced with sending the signal to all processes in the group instead,
and I think the behavior will be the same.

BTW I don't know how to do the opposite on Windows, i.e. how to avoid sending the signal to all
processes of the container (so the behavior on Linux is different, as there "bin/hbase" will receive
the signal). I think this is fine as long as this difference is documented. In the case of hbase,
the shell script can install a custom hook for SIGTERM and do whatever is needed in that case
(e.g. send SIGTERM to the java process it started).
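
To illustrate the kind of hook meant here, below is a minimal sketch of a launcher that forwards SIGTERM to the process it started. It is written in Python rather than shell, and the names make_forwarder and run_with_forwarding are hypothetical; nothing like this exists in bin/hbase or in the patch.

```python
import signal
import subprocess


def make_forwarder(child):
    """Return a signal handler that relays the received signal to `child`.

    `child` is expected to behave like subprocess.Popen (poll, send_signal).
    """
    def handler(signum, frame):
        # Forward only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)
    return handler


def run_with_forwarding(cmd):
    """Start `cmd`, forward SIGTERM to it, and return its exit code.

    Hypothetical helper: must be called from the main thread, because
    signal.signal() only works there.
    """
    child = subprocess.Popen(cmd)
    signal.signal(signal.SIGTERM, make_forwarder(child))
    return child.wait()
```

A launcher would call run_with_forwarding with the java command line it would otherwise have started directly, so that a SIGTERM delivered to the launcher also reaches the java process.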

There is one caveat in ctrl+break handling in case of a batch file starting a java process:
1. the batch file starts the java process
2. user sends ctrl+break to all processes in the group (or job object). java process prints
thread dump. batch file doesn't react yet.
3. the java process completes successfully
4. the batch file will not exit, it will print "Terminate batch job? (Y/N)" as it received
the ctrl+break signal earlier.

The only way I see to overcome this problem with batch file processes is to identify
them somehow (by executable name?) when walking through the processes in the job object, and
not send them the signal. Sending ctrl+break to batch file processes doesn't make sense
anyway as in newer Windows there's no way to disable or customize ctrl+break handling in batch
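
A rough sketch of the "identify by executable name" idea follows, in Python for brevity. It assumes that batch files are run by cmd.exe and that the caller already has (pid, executable path) pairs for the processes in the job object; BATCH_INTERPRETERS, is_batch_interpreter and signal_targets are hypothetical names and are not part of the patch.

```python
import ntpath

# Assumption: batch files are executed by the Windows command interpreter.
BATCH_INTERPRETERS = {"cmd.exe"}


def is_batch_interpreter(executable_path):
    """Return True if the executable name matches a known batch interpreter."""
    # ntpath handles Windows-style paths regardless of the host platform.
    name = ntpath.basename(executable_path).lower()
    return name in BATCH_INTERPRETERS


def signal_targets(processes):
    """Return the pids that should receive ctrl+break.

    `processes` is an iterable of (pid, executable_path) pairs, e.g. collected
    while walking the job object. Batch interpreter processes are skipped.
    """
    return [pid for pid, exe in processes if not is_batch_interpreter(exe)]
```

With this filter, a job object containing a cmd.exe process and the java.exe it started would yield only the java pid. Matching on the name alone would also skip a cmd.exe that is not running a batch file, which is one reason this is only a heuristic.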

> Ability to signal containers
> ----------------------------
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>         Attachments: YARN-445--n2.patch, YARN-445.patch
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT,
SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by
MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a
container.  For that specific feature we could implement it as an additional field in the
StopContainerRequest.  However that would not address other potential features like the ability
for an AM to trigger jstacks on arbitrary tasks *without* killing them.  The latter feature
would be a very useful debugging tool for users who do not have shell access to the nodes.

This message was sent by Atlassian JIRA
