hadoop-yarn-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-445) Ability to signal containers
Date Mon, 15 Apr 2013 18:54:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632028#comment-13632028 ]

Chris Nauroth commented on YARN-445:
------------------------------------

Unfortunately, I don't believe the Unix signal concept maps cleanly to Windows.  Some of the
signal-related functions are defined on Windows, but with behavior quite different from the
Unix equivalent.

http://msdn.microsoft.com/en-us/library/xdkz3x12(v=vs.71).aspx

For example, there are differences in exit codes seen by the signalled process, and some signal
handling scenarios cause the process to start a new thread to handle it instead of interrupting
an existing thread.

Another alternative on Windows is console control handlers:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

I have seen projects that attempt to define a higher-level interface of "externally triggered
command", using method names like gracefulShutdown, kill, and outputDebugInfo.  On Unix,
the implementation can map these to signal/kill.  On Windows, the implementation can map these
to SetConsoleCtrlHandler/GenerateConsoleCtrlEvent.  The problem is that this is a least common
denominator approach that may not cover all possible use cases.
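
To make the "externally triggered command" idea concrete, here is a rough Java sketch of the
Unix side only.  The names {{ExternalCommandSketch}}, {{ExternalCommand}} and {{unixArgv}} are
invented for illustration and are not part of any YARN API, and the pairing of the three
commands with SIGTERM, SIGKILL and SIGQUIT is my assumption.  A Windows implementation would
need native code around GenerateConsoleCtrlEvent and is not shown.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: these names are illustrative, not an existing YARN API. */
public class ExternalCommandSketch {

    /** Higher-level commands, each paired with the Unix signal it is assumed to map to. */
    enum ExternalCommand {
        GRACEFUL_SHUTDOWN("TERM"),
        KILL("KILL"),
        OUTPUT_DEBUG_INFO("QUIT"); // SIGQUIT makes a HotSpot JVM print a thread dump

        final String unixSignal;

        ExternalCommand(String unixSignal) {
            this.unixSignal = unixSignal;
        }
    }

    /** Builds the argument vector for the Unix implementation: kill -SIGNAL pid. */
    static List<String> unixArgv(ExternalCommand command, long pid) {
        return Arrays.asList("kill", "-" + command.unixSignal, Long.toString(pid));
    }

    public static void main(String[] args) {
        // Print the Unix command line each higher-level command would translate to.
        for (ExternalCommand c : ExternalCommand.values()) {
            System.out.println(c + " -> " + String.join(" ", unixArgv(c, 12345L)));
        }
    }
}
```

The caller only ever names the command; which signal (or console control event) gets delivered
stays an implementation detail of the platform-specific side.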

Considering all of that, I can think of 3 different approaches to this feature:

# Give up on creating a general-purpose signaling mechanism and just stay focused on
triggering JVM features.  (This is identical to Jason's #1.)
# Use the Windows APIs I mentioned above to implement least-common-denominator signaling support.
# Add YARN API support for ContainerLaunchContext to accept a mapping of externally-triggered
command names to code (e.g. {{ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")}}).
 Then, during execution, the AM could send a message to the NM saying "gracefulShutdown container_X".
 When the NM receives the message, it could look up "gracefulShutdown" in the map of external
commands and trigger the kill.  For highly custom message handling scenarios (Windows console
control events/named pipes/whatever else), the AM could ship a binary as a localized resource
that contains the implementation, and the external command can be mapped to call that binary.
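
A minimal sketch of the lookup described in the third approach, assuming the NM keeps the
mapping as plain strings and substitutes the container's PID itself.  {{ExternalCommandRegistry}}
and {{resolve}} are hypothetical names; {{setExternalCommand}} is the proposed method from the
example above, not something that exists in ContainerLaunchContext today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the proposed name-to-command mapping; not an existing YARN class. */
public class ExternalCommandRegistry {

    private final Map<String, String> commands = new HashMap<>();

    /** Mirrors the proposed ctx.setExternalCommand(name, code). */
    public void setExternalCommand(String name, String commandLine) {
        commands.put(name, commandLine);
    }

    /**
     * What the NM might do on receiving "name container_X": look up the
     * command line and expand the $CONTAINER_PID placeholder.
     */
    public String resolve(String name, long containerPid) {
        String template = commands.get(name);
        if (template == null) {
            throw new IllegalArgumentException("Unknown external command: " + name);
        }
        // String.replace is a literal replacement, so the '$' needs no escaping.
        return template.replace("$CONTAINER_PID", Long.toString(containerPid));
    }

    public static void main(String[] args) {
        ExternalCommandRegistry ctx = new ExternalCommandRegistry();
        ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID");
        System.out.println(ctx.resolve("gracefulShutdown", 4321L)); // prints "kill -TERM 4321"
    }
}
```

Rejecting unknown names keeps the AM from triggering anything that was not declared at
container launch time.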

Each of these approaches gets progressively more general-purpose, but also progressively more
complex.  The last one in particular gives maximum flexibility, but makes the API challenging
for AM writers.

A side note on the last option: another variant is to add one more level of indirection in
the API to support different container launch configuration per platform.  This would make
it easier to support heterogeneous clusters (mix of Unix and Windows nodes).  This would let
the AM say things like "use kill on Unix, but use something else on Windows" but without needing
to know if specific nodes are running Unix or Windows.
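
The extra level of indirection could look roughly like this, assuming the AM registers one
command line per platform and the NM picks the entry matching its own OS.  All names here
({{PerPlatformCommands}}, {{Platform}}, {{lookup}}, {{detect}}) are hypothetical, and
{{ctrl-helper.exe}} is a made-up placeholder for a binary the AM would ship as a localized
resource.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of per-platform external commands; not an existing YARN API. */
public class PerPlatformCommands {

    enum Platform { UNIX, WINDOWS }

    private final Map<String, Map<Platform, String>> commands = new HashMap<>();

    /** The AM registers one command line per platform under the same logical name. */
    public void setExternalCommand(String name, Platform platform, String commandLine) {
        commands.computeIfAbsent(name, k -> new EnumMap<Platform, String>(Platform.class))
                .put(platform, commandLine);
    }

    /** The NM selects the variant for its own platform; returns null if none was registered. */
    public String lookup(String name, Platform nodePlatform) {
        Map<Platform, String> variants = commands.get(name);
        return variants == null ? null : variants.get(nodePlatform);
    }

    /** Classifies an os.name value; anything not starting with "windows" is treated as Unix. */
    static Platform detect(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows")
                ? Platform.WINDOWS : Platform.UNIX;
    }

    public static void main(String[] args) {
        PerPlatformCommands ctx = new PerPlatformCommands();
        ctx.setExternalCommand("gracefulShutdown", Platform.UNIX, "kill -TERM $CONTAINER_PID");
        // ctrl-helper.exe is a placeholder name for an AM-supplied binary.
        ctx.setExternalCommand("gracefulShutdown", Platform.WINDOWS, "ctrl-helper.exe $CONTAINER_PID");
        Platform here = detect(System.getProperty("os.name"));
        System.out.println(here + ": " + ctx.lookup("gracefulShutdown", here));
    }
}
```

The AM sends only "gracefulShutdown container_X"; the platform decision is made on the node
that runs the container.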

                
> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.0.5-beta
>            Reporter: Jason Lowe
>
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT,
> SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by
> MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a
> container.  For that specific feature we could implement it as an additional field in the
> StopContainerRequest.  However, that would not address other potential features like the ability
> for an AM to trigger jstacks on arbitrary tasks *without* killing them.  The latter feature
> would be a very useful debugging tool for users who do not have shell access to the nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
