hadoop-yarn-issues mailing list archives

From "Andrey Klochkov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-445) Ability to signal containers
Date Tue, 15 Oct 2013 22:50:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795753#comment-13795753 ]

Andrey Klochkov commented on YARN-445:

Accepting a mapping of arbitrary commands is indeed the most powerful approach. However, it
would also require lots of changes in YARN, as well as additional complexity for app writers.
At the same time, are we sure this flexibility is needed, and that it won't be over-engineering
and possibly an abstraction leak in the YARN framework? By the latter I mean that we would
give app writers the ability to run arbitrary commands on any node at any point in time, but
is that YARN's responsibility? I'm not a YARN expert, so I'm just asking.

Anyway, the scope of what I have proposed in the patch is much smaller, and it solves the problem
stated in the initial description of this JIRA: troubleshooting timed-out containers by dumping
a jstack. This would be useful for many YARN use cases, so I thought it might make sense to
implement it this way now and extend it in the future if there is demand. I agree that the way
it is exposed in the API could be changed to a signal value in the stopContainers request instead
of a separate call, which is indeed a bit confusing.
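To make the "signal value in the stopContainers request" option concrete, here is a minimal Java sketch. It is not the YARN API: StopWithSignalSketch, ContainerSignal, StopContainerRequest, plan() and the container IDs are all hypothetical names chosen for illustration. It only shows the shape of the idea: a stop request optionally carries a signal (e.g. SIGQUIT, so the JVM dumps its threads) that is delivered before the container is stopped.

```java
import java.util.Optional;

/**
 * Hypothetical sketch, NOT the real YARN API: a stop request that carries an
 * optional signal to deliver before the container is stopped.
 */
public class StopWithSignalSketch {

    /** Hypothetical, deliberately small set of signals an AM may request. */
    enum ContainerSignal { SIGQUIT, SIGTERM, SIGKILL }

    /** Hypothetical request: a container id plus an optional signal to send first. */
    record StopContainerRequest(String containerId, Optional<ContainerSignal> signal) {
        StopContainerRequest {
            if (containerId == null || containerId.isEmpty()) {
                throw new IllegalArgumentException("containerId is required");
            }
            // Treat a null signal as "no signal" so callers can't trip over it later.
            signal = (signal == null) ? Optional.empty() : signal;
        }
    }

    /** Describes what a NodeManager would do: optionally signal, then stop. */
    static String plan(StopContainerRequest req) {
        StringBuilder sb = new StringBuilder();
        req.signal().ifPresent(s -> sb.append("send ").append(s).append("; "));
        sb.append("stop ").append(req.containerId());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder container IDs, not real YARN container identifiers.
        var timedOut = new StopContainerRequest("container_01", Optional.of(ContainerSignal.SIGQUIT));
        System.out.println(plan(timedOut)); // send SIGQUIT; stop container_01
        var plain = new StopContainerRequest("container_02", Optional.empty());
        System.out.println(plan(plain));    // stop container_02
    }
}
```

Folding the signal into the stop request covers the timed-out-task case with a single call, but by construction it always ends in a stop, so it cannot express "dump threads and keep running".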

> Ability to signal containers
> ----------------------------
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT,
> SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by
> MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a
> container. For that specific feature we could implement it as an additional field in the
> StopContainerRequest. However, that would not address other potential features like the ability
> for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature
> would be a very useful debugging tool for users who do not have shell access to the nodes.
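The "jstack without killing" use case relies on the fact that a HotSpot JVM (run without -Xrs) prints a thread dump to stdout when it receives SIGQUIT and then keeps running. A minimal Java sketch of the NodeManager side follows, assuming the NM already knows the container's pid and shells out to kill(1). SignalCommandSketch, ContainerSignal, killCommand() and the pid value are hypothetical and not taken from the attached patches.

```java
import java.util.List;

/**
 * Hypothetical NodeManager-side sketch, NOT actual YARN code: translate an
 * AM-requested signal into a kill(1) command line for the container process.
 */
public class SignalCommandSketch {

    enum ContainerSignal {
        // On a HotSpot JVM, SIGQUIT prints a thread dump and the JVM keeps running.
        SIGQUIT("QUIT", false),
        SIGTERM("TERM", true),
        SIGKILL("KILL", true);

        final String killName;         // name accepted by "kill -<NAME>"
        final boolean stopsContainer;  // whether the NM should expect the process to exit

        ContainerSignal(String killName, boolean stopsContainer) {
            this.killName = killName;
            this.stopsContainer = stopsContainer;
        }
    }

    /** Builds the kill(1) argument list; pid lookup is assumed to happen elsewhere. */
    static List<String> killCommand(ContainerSignal signal, long pid) {
        // pid 0 or a negative pid would signal a whole process group, so reject it.
        if (pid <= 0) {
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        return List.of("kill", "-" + signal.killName, Long.toString(pid));
    }

    public static void main(String[] args) {
        long pid = 12345L; // placeholder; a real NM would read the container's pid file
        for (ContainerSignal s : ContainerSignal.values()) {
            System.out.println(String.join(" ", killCommand(s, pid))
                    + (s.stopsContainer ? "   (expected to exit)" : "   (keeps running)"));
        }
        // Delivering it would be e.g.:
        // new ProcessBuilder(killCommand(ContainerSignal.SIGQUIT, pid)).inheritIO().start();
    }
}
```

Keeping the signals in a closed enum, rather than accepting arbitrary commands, is one way to give AMs the debugging hook described above without letting them run anything they like on the node.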

This message was sent by Atlassian JIRA
