hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
Date Mon, 07 Dec 2015 22:22:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045879#comment-15045879
] 

Ivan Mitic commented on YARN-4309:
----------------------------------

Thanks [~vvasudev], the proposal and the patch look good overall. I have a few mainly minor
comments below, mostly related to Windows side changes.

1. {code}
    public abstract void copyDebugInformation(Path src, Path dst)
        throws IOException;
    public abstract void listDebugInformation(Path output) throws IOException;
{code}
I think it would be helpful to document what the methods are supposed to do.

2. {code}
    @Override
    public void copyDebugInformation(Path src, Path dest)
        throws IOException {
      // no need to worry about permissions - in secure mode
      // WindowsSecureContainerExecutor will set permissions
      // to allow NM to read the file
      lineWithLenCheck(String.format("cp \"%s\" \"%s\"", src.toString(),
          dest.toString()));
      errorCheck();
    }
{code}
Do we want to remove the error check above, to be consistent with Linux, and to avoid failing
due to a "logging" failure?
Also, {{cp}} command does not exist on Windows. Please use {{copy}} instead. 

3. Why do you have both "dir" and "dir /AL /S" on Windows? Can you please include an inline
comment with rationale.

4. In copyDebugInformation() you are also doing a chmod() internally. Wondering this this
command should be injected by the call site given that only the caller has context on what
the destination is and whether special permission handling is needed. It might be possible
to change the method to only accept src and copy the file to the current folder, in which
case it might be fine to use chmod() given that there is an assumption on what the current
folder is. Just a thought you make the call.

> Add debug information to application logs when a container fails
> ----------------------------------------------------------------
>
>                 Key: YARN-4309
>                 URL: https://issues.apache.org/jira/browse/YARN-4309
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: YARN-4309.001.patch, YARN-4309.002.patch, YARN-4309.003.patch, YARN-4309.004.patch,
YARN-4309.005.patch, YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it failed.
> My proposal is that if a container fails, we collect information about the container
local dir and dump it into the container log dir. Ideally, I'd like to tar up the directory
entirely, but I'm not sure of the security and space implications of such a approach. At the
very least, we can list all the files in the container local dir, and dump the contents of
launch_container.sh(into the container log dir).
> When log aggregation occurs, all this information will automatically get collected and
make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message