hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-445) Ability to signal containers
Date Sat, 15 Feb 2014 02:16:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902278#comment-13902278
] 

Ming Ma commented on YARN-445:
------------------------------

[Gera Shegalov|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jira.shegalov]
and I discussed the idea of providing such signal functionality at the YARN layer without involving the AM.
I have a basic prototype working and would like to get feedback from others.

The benefit of this approach is that other YARN applications such as Spark don't need to write
any code to get this feature. If we later decide to extend the interface to support
jmap by allowing users to run an arbitrary processing script against the container,
all YARN Java applications will get it for free. Here is how it works.

1. Client is able to ask RM to signal a specific container as long as it passes authorization.
{code:title=SignalContainerRequest.java|borderStyle=solid}
public interface SignalContainerRequest {
  /**
   * Get the <code>ContainerId</code> of the container to signal.
   * @return <code>ContainerId</code> of the container to signal.
   */
  @Public
  @Stable
  public abstract ContainerId getContainerId();
  
  @Private
  @Stable
  public abstract void setContainerId(ContainerId containerId);

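  /**
   * Get the signal number to deliver to the container.
   * @return the signal number to deliver to the container.
   */
  @Public
  @Stable
  public abstract int getSignal();

  @Private
  @Stable
  public abstract void setSignal(int signal);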

}
{code}


{code:title=ClientRMProtocol.java|borderStyle=solid}


  /**
   * Signal a running container.
   *
   * @param request the container to signal.
   * @return an empty response.
   * @throws YarnRemoteException
   */
  public SignalContainerResponse signalContainer(
          SignalContainerRequest request)
          throws YarnRemoteException;

{code}

2. RM will provide the container id to the corresponding NM in the next heartbeat. The HeartbeatResponse
interface is modified to carry this information.
3. AM isn't involved.
4. From the customer's point of view, on the CLI, customers use "bin/yarn application -signal $containerid
3" to capture a jstack (signal 3 is SIGQUIT on Linux, which makes the JVM print a thread dump). On the web UI,
customers can click links on the container web page as well as on the MR job page.
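
To make the flow in steps 1 to 3 concrete, here is a minimal, self-contained sketch of the relay idea. It is not the prototype code and uses no YARN classes: SignalRelaySketch, PendingSignal, registerContainer and nodeHeartbeat are hypothetical names, container and node ids are plain strings, and authorization is left out. It only shows the RM queuing a signal request and handing it to the right NM on that node's next heartbeat, with no AM in the path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the RM side of the proposal; not YARN code. */
class SignalRelaySketch {

    /** One queued signal for one container (illustrative type). */
    static final class PendingSignal {
        final String containerId;
        final int signal;

        PendingSignal(String containerId, int signal) {
            this.containerId = containerId;
            this.signal = signal;
        }
    }

    // containerId -> id of the node that hosts it
    private final Map<String, String> containerToNode = new HashMap<>();
    // nodeId -> signals waiting for that node's next heartbeat
    private final Map<String, List<PendingSignal>> pendingByNode = new HashMap<>();

    /** Record which node runs a container (assumed to be known to the RM). */
    synchronized void registerContainer(String containerId, String nodeId) {
        containerToNode.put(containerId, nodeId);
    }

    /**
     * Step 1: a client asks the RM to signal a container.
     * Authorization is omitted here. Returns false for an unknown container.
     */
    synchronized boolean signalContainer(String containerId, int signal) {
        String nodeId = containerToNode.get(containerId);
        if (nodeId == null) {
            return false;
        }
        pendingByNode.computeIfAbsent(nodeId, k -> new ArrayList<>())
                     .add(new PendingSignal(containerId, signal));
        return true;
    }

    /**
     * Step 2: the NM heartbeats; the response carries the queued signals
     * for that node and the queue is cleared so each is delivered once.
     */
    synchronized List<PendingSignal> nodeHeartbeat(String nodeId) {
        List<PendingSignal> out = pendingByNode.remove(nodeId);
        return out == null ? new ArrayList<>() : out;
    }
}
```

In this sketch the NM would then deliver each returned signal to the local container process; how that is done per OS platform is outside its scope.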

Of course, this is orthogonal to general signal support across different OS platforms.




> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT,
SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by
MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a
container.  For that specific feature we could implement it as an additional field in the
StopContainerRequest.  However that would not address other potential features like the ability
for an AM to trigger jstacks on arbitrary tasks *without* killing them.  The latter feature
would be a very useful debugging tool for users who do not have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
