hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-499) On container failure, include last n lines of logs in diagnostics
Date Tue, 09 Apr 2013 01:15:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626058#comment-13626058 ]

Sandy Ryza commented on YARN-499:

Thanks for the feedback, Vinod.

The issue I am aiming to solve is the last one you mention: the AM crashing before registering
with the RM.  A few JIRAs have been filed around this problem with little progress, so I wanted
to put forth a concrete proposal.  I also wanted to bring back the output that gets printed
on the console for failing tasks and, under my proposal, thought that this would fall
under the same issue of surfacing failed container logs.  If we decide on an approach that
only handles the former, I'll file a separate JIRA for the latter.

bq. The stuff that gets printed on console is the client pulling logs directly.
You're right, I misunderstood the code and thought it was coming through the task diagnostics.
In light of that, I agree that my proposal isn't the right path.  Is there a reason we have
avoided pulling the logs directly in YARN as well?  If not, should we do this for both the
AM and task containers?

> On container failure, include last n lines of logs in diagnostics
> -----------------------------------------------------------------
>                 Key: YARN-499
>                 URL: https://issues.apache.org/jira/browse/YARN-499
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-499.patch
> When a container fails, the only way to diagnose it is to look at the logs.  ContainerStatuses
include a diagnostic string that is reported back to the resource manager by the node manager.
> Currently in MR2 I believe whatever is sent to the task's standard out is added to the
diagnostics string, but for MR standard out is redirected to a file called stdout.  In MR1,
this string was populated with the last few lines of the task's stdout file, and got printed
to the console, allowing for easy debugging.
> Handling this would help to soothe the infuriating problem of an AM dying for a mysterious
reason before setting a tracking URL (MAPREDUCE-3688).
> This could be done in one of two ways.
> * Use tee to send MR's standard out to both the stdout file and standard out.  This requires
modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to be storing the
entire task log in NM memory.
> * Read the task's log files.  This would require standardizing the container log files or
making them configurable.  Right now the log files are determined in userland and all that
YARN is aware of is the log directory.
> Does this present any issues I'm not considering?  If so, this might only be needed
for AMs?
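
As a rough illustration of the "roll what it reads in" idea from the first option above, here is a minimal Java sketch. It assumes a hypothetical TailBuffer helper; the class and its methods are invented for illustration and are not part of YARN or of the shell executor mentioned in the description. It keeps only the last n lines it is fed, so memory stays bounded however long the container's output is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not an existing YARN class): retains only the
 * most recent maxLines lines that are added to it.
 */
class TailBuffer {
    private final int maxLines;
    private final Deque<String> lines = new ArrayDeque<>();

    TailBuffer(int maxLines) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        this.maxLines = maxLines;
    }

    /** Adds one line, evicting the oldest line once the buffer is full. */
    void add(String line) {
        if (lines.size() == maxLines) {
            lines.removeFirst();
        }
        lines.addLast(line);
    }

    /** Drains a reader line by line; at most maxLines lines are ever held. */
    void readAll(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            add(line);
        }
    }

    /** Returns the retained lines, oldest first, joined with newlines. */
    String tail() {
        return String.join("\n", lines);
    }
}
```

The same buffer could in principle be fed from a process's output stream (the tee option) or from a reader over a log file (the second option); either way, only the retained tail would be appended to the diagnostics string.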

