hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-445) Ability to signal containers
Date Fri, 07 Mar 2014 20:58:50 GMT

     [ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-445:

    Attachment: MRTasks.png

Thanks Xuan. Let us collaborate on this.

1. Isn't KillContainer just a special case of SignalContainer?
2. I have the general signalContainer approach working on an older version of Hadoop.
3. Porting it to trunk will require some effort.

I will create these subtasks.

a) Signal container support within the NM's ContainerManager and Launcher. That includes defining
a new container launcher event type (ContainersLauncherEventType) and processing that event,
regardless of where the event comes from.

b) Signal container request delivery from YarnClientImpl -> RM -> NM. This includes
the relevant RPC changes as well as changes in the RM and NM so that the request is delivered
successfully from the client to the NM's ContainerManager.

c) YARN web UI update so that people can get a thread dump directly from the web UI.

d) YARN CLI update so that people can send a signal to a specific container.

e) MAPREDUCE web UI update so that people can get a thread dump directly from the web UI.

f) MAPREDUCE CLI update so that people can send a signal to a specific task.
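
Subtask (a) could be sketched roughly as follows. This is a minimal, self-contained Java model, not the actual NodeManager code: LauncherEventType, LauncherEvent, and Launcher are hypothetical stand-ins, and SIGNAL_CONTAINER is an assumed name for the new event type. The only point it illustrates is that the handler acts on the event itself and never inspects who produced it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of subtask (a); these are NOT the real Hadoop classes.
final class SignalEventSketch {

    // Stand-in for the launcher event types; SIGNAL_CONTAINER is the assumed addition.
    enum LauncherEventType { LAUNCH_CONTAINER, CLEANUP_CONTAINER, SIGNAL_CONTAINER }

    // An event names its target container; the signal number is only read for SIGNAL_CONTAINER.
    static final class LauncherEvent {
        final LauncherEventType type;
        final String containerId;
        final int signalNumber;

        LauncherEvent(LauncherEventType type, String containerId, int signalNumber) {
            this.type = type;
            this.containerId = containerId;
            this.signalNumber = signalNumber;
        }
    }

    // Records the action it would take; a real launcher would call into the container executor.
    static final class Launcher {
        final List<String> actions = new ArrayList<>();

        void handle(LauncherEvent event) {
            switch (event.type) {
                case LAUNCH_CONTAINER:
                    actions.add("launch " + event.containerId);
                    break;
                case CLEANUP_CONTAINER:
                    actions.add("cleanup " + event.containerId);
                    break;
                case SIGNAL_CONTAINER:
                    // Same code path no matter which component raised the event.
                    actions.add("signal " + event.signalNumber + " to " + event.containerId);
                    break;
                default:
                    throw new IllegalArgumentException("unknown event type: " + event.type);
            }
        }
    }

    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        // "container_example" is a placeholder, not a real container ID.
        launcher.handle(new LauncherEvent(LauncherEventType.SIGNAL_CONTAINER, "container_example", 3));
        System.out.println(launcher.actions);
    }
}
```

Keeping the event free of any sender information is what lets the client -> RM -> NM path in (b) reuse the same handler unchanged.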

I have attached some web UI screenshots.

From the CLI, people can use the yarn command as well as the hadoop command.

yarn application
-kill <Application ID [reason]> Kills the application.
-list Lists all the Applications from RM.
-signal <container ID [signal number]> Signal the container. Default signal number is
-status <arg> Prints the status of the application.

hadoop job
Usage: CLI <command> <args>
[-submit <job-file>]
[-status <job-id>]
[-counter <job-id> <group-name> <counter-name>]
[-kill <job-id>]
[-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH
[-events <job-id> <from-event-#> <#-of-events>]
[-history <jobHistoryFile>]
[-list [all]]
[-list-attempt-ids <job-id> <task-type> <task-state>]. Valid values for <task-type> are MAP REDUCE JOB_SETUP JOB_CLEANUP TASK_CLEANUP. Valid values for <task-state> are running, completed
[-kill-task <task-attempt-id> [reason]]
[-fail-task <task-attempt-id> [reason]]
[-signal-task <task-attempt-id> [signal number]]
[-logs <job-id> <task-attempt-id>]
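
Both -signal and -signal-task take an ID followed by an optional signal number. A minimal sketch of parsing that shape is shown below. SignalArgsSketch is a hypothetical helper and is not part of the yarn or hadoop CLI code. It leaves the signal number empty when none is given, so the caller supplies whatever default the CLI defines, and rejecting negative numbers is an assumption of this sketch.

```java
import java.util.OptionalInt;

// Hypothetical parser for the values that follow -signal or -signal-task: "<id> [signal number]".
final class SignalArgsSketch {
    final String targetId;
    // Empty when no number was given; the default is deliberately not chosen here.
    final OptionalInt signalNumber;

    private SignalArgsSketch(String targetId, OptionalInt signalNumber) {
        this.targetId = targetId;
        this.signalNumber = signalNumber;
    }

    static SignalArgsSketch parse(String... values) {
        if (values.length < 1 || values.length > 2) {
            throw new IllegalArgumentException("expected: <id> [signal number]");
        }
        OptionalInt signal = OptionalInt.empty();
        if (values.length == 2) {
            int number;
            try {
                number = Integer.parseInt(values[1]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("signal number must be an integer: " + values[1], e);
            }
            if (number < 0) {
                throw new IllegalArgumentException("signal number must not be negative: " + number);
            }
            signal = OptionalInt.of(number);
        }
        return new SignalArgsSketch(values[0], signal);
    }

    public static void main(String[] args) {
        // "container_example" is a placeholder, not a real container ID.
        SignalArgsSketch parsed = parse("container_example", "3");
        System.out.println(parsed.targetId + " " + parsed.signalNumber);
    }
}
```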


> Ability to signal containers
> ----------------------------
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch, YARN-445--n3.patch,
> YARN-445--n4.patch, YARN-445.patch, YARNContainers.png
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT,
> SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by
> MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a
> container.  For that specific feature we could implement it as an additional field in the
> StopContainerRequest.  However, that would not address other potential features like the ability
> for an AM to trigger jstacks on arbitrary tasks *without* killing them.  The latter feature
> would be a very useful debugging tool for users who do not have shell access to the nodes.
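
The mechanism described above, sending SIGQUIT so that a JVM prints a thread dump and keeps running, can be illustrated with a small sketch. It assumes a POSIX host with a kill binary on the PATH and a known process ID. SignalSenderSketch and its methods are hypothetical names, and this is not how the NodeManager delivers signals.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: sends a named signal to a process by shelling out to kill.
final class SignalSenderSketch {

    // Builds the argument list for: kill -<signalName> <pid>
    static List<String> killCommand(String signalName, long pid) {
        if (pid <= 0) {
            // Non-positive values address process groups in kill, so this sketch refuses them.
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        return Arrays.asList("kill", "-" + signalName, Long.toString(pid));
    }

    // Runs the command and reports whether kill exited with status 0.
    static boolean send(String signalName, long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(killCommand(signalName, pid)).inheritIO().start();
        return process.waitFor() == 0;
    }

    public static void main(String[] args) {
        // 12345 is a placeholder PID; only the command is printed, nothing is signalled.
        System.out.println(killCommand("QUIT", 12345L));
    }
}
```

Calling send("QUIT", pid) on a running HotSpot JVM would make it write its thread dump to standard output, which is why the container's stdout log is where the dump would show up.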

This message was sent by Atlassian JIRA
