hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
Date Sat, 15 Feb 2014 02:51:22 GMT

    [ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902291#comment-13902291 ]

Ming Ma commented on YARN-221:

[Chris Trezzo|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ctrezzo], [Gera
Shegalov|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jira.shegalov] and I
discussed this further. We would like to share an update and get feedback from others. Similar
to what Robert originally suggested, we need to provide a way for the AM to update the log
aggregation policy when it stops the container.

One likely log aggregation policy for MRAppMaster is to aggregate logs of all failed tasks
and sample logs of some successful tasks. What we found is that the container exit code isn't
a reliable indication of whether an MR task finished successfully. That is because MRAppMaster
calls stopContainer while the YarnChild JVM is exiting on its own; depending on the timing,
a successful task can still produce a non-zero exit code. So specifying the log aggregation
policy up front in the ContainerLaunchContext isn't enough.
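To make the sampling idea concrete, here is a minimal, self-contained sketch of such a policy decision. The class and method names are hypothetical illustrations, not the proposed YARN API; the key point is that the decision is driven by the AM's own knowledge of task success, not by the (unreliable) container exit code.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Illustrative sketch only: always aggregate logs of failed tasks, and
 * sample a configurable fraction of successful ones.
 */
class SampledLogAggregationPolicy {
  /** Fraction of successful tasks whose logs are kept, e.g. 0.1 for ~10%. */
  private final double successSampleRate;

  SampledLogAggregationPolicy(double successSampleRate) {
    this.successSampleRate = successSampleRate;
  }

  /**
   * Decided at stopContainer time, using the AM's own notion of task
   * success rather than the container exit code.
   */
  boolean shouldAggregate(boolean taskSucceeded) {
    if (!taskSucceeded) {
      return true; // keep logs of every failed task
    }
    return ThreadLocalRandom.current().nextDouble() < successSampleRate;
  }
}
```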

The mechanism for AM to pass log aggregation policy to YARN needs to address different scenarios.

1. Containers exit by themselves. DistributedShell belongs to this category.
2. AM has to explicitly stop the containers. MR belongs to this category.
3. AM might want to ask the NM to do on-demand log aggregation without stopping the container.
This might be useful for some long-running applications.

To support #1, we have to specify the log aggregation policy as part of startContainer call.
Chris' patch handles that.

To support #2, the AM has to indicate to the NM whether log aggregation is needed during the
stopContainer call. The AM can use different kinds of policies, such as sampling of successful
tasks. For that, the AM will specify the log aggregation policy as part of StopContainerRequest.



  /**
   * Get the <code>ContainerLogAggregationPolicy</code> for the container.
   * @return The <code>ContainerLogAggregationPolicy</code> for the container.
   */
  public ContainerLogAggregationPolicy getLogAggregationPolicy();

  /**
   * Set the <code>ContainerLogAggregationPolicy</code> for the container.
   * @param policy The <code>ContainerLogAggregationPolicy</code> for the container.
   */
  public void setLogAggregationPolicy(ContainerLogAggregationPolicy policy);

Alternatively, we can define a new interface called ContainerStopContext to capture the log
aggregation policy along with other information we may want to include later.


  public abstract ContainerStopContext getContainerStopContext();

  public abstract void setContainerStopContext(ContainerStopContext context);
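As a rough illustration of this alternative, the context could be a small extensible value class so that stopContainer itself never needs to grow new parameters. The shape below is hypothetical; the field set, the simplified policy enum, and the factory method are assumptions for the sketch, not a committed design.

```java
/**
 * Hypothetical sketch of the proposed ContainerStopContext: a holder for
 * per-stop information, starting with the log aggregation policy.
 */
class ContainerStopContext {
  /** Simplified stand-in for the proposed ContainerLogAggregationPolicy. */
  enum ContainerLogAggregationPolicy { AGGREGATE, DO_NOT_AGGREGATE }

  private final ContainerLogAggregationPolicy logAggregationPolicy;

  private ContainerStopContext(ContainerLogAggregationPolicy policy) {
    this.logAggregationPolicy = policy;
  }

  ContainerLogAggregationPolicy getLogAggregationPolicy() {
    return logAggregationPolicy;
  }

  /** New fields can be added here later without changing stopContainer's signature. */
  static ContainerStopContext newInstance(ContainerLogAggregationPolicy policy) {
    return new ContainerStopContext(policy);
  }
}
```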


To support #3, we need a new API such as updateContainer so that the AM can ask the NM to
roll container logs and update the log aggregation policy.

> NM should provide a way for AM to tell it not to aggregate logs.
> ----------------------------------------------------------------
>                 Key: YARN-221
>                 URL: https://issues.apache.org/jira/browse/YARN-221
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Robert Joseph Evans
>            Assignee: Chris Trezzo
>         Attachments: YARN-221-trunk-v1.patch
> The NodeManager should provide a way for an AM to tell it that either the logs should
> not be aggregated, that they should be aggregated with a high priority, or that they should
> be aggregated but with a lower priority. The AM should be able to set this in the
> ContainerLaunchContext to provide a default value, but should also be able to update the
> value when the container is released.
> This would allow the NM to not aggregate logs in some cases, and avoid connecting to
> the NN at all.
