hadoop-yarn-issues mailing list archives

From "Sidharta Seethana (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
Date Wed, 02 Dec 2015 02:08:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035122#comment-15035122 ]

Sidharta Seethana commented on YARN-4309:
-----------------------------------------

[~vvasudev], I took a look at the patch. A couple of comments:

* Could you clarify why the debug information gathering in DockerContainerExecutor.writeLaunchEnv
is not guarded by a config check? The new test you added uses DefaultContainerExecutor, so
it looks like this was missed.
* There are some minor line-spacing inconsistencies in the new test function in TestContainerLaunch.java.


Apart from these, and assuming it is safe to list user directory contents (as already discussed
on this JIRA), the patch looks good to me. Thanks for this patch. I expect the launch_container.sh
copy to be particularly useful for debugging.

> Add debug information to application logs when a container fails
> ----------------------------------------------------------------
>
>                 Key: YARN-4309
>                 URL: https://issues.apache.org/jira/browse/YARN-4309
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: YARN-4309.001.patch, YARN-4309.002.patch, YARN-4309.003.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it failed.
> My proposal is that if a container fails, we collect information about the container
> local dir and dump it into the container log dir. Ideally, I'd like to tar up the directory
> entirely, but I'm not sure of the security and space implications of such an approach. At the
> very least, we can list all the files in the container local dir and dump the contents of
> launch_container.sh into the container log dir.
> When log aggregation occurs, all this information will automatically get collected, making
> such failures much easier to debug.
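The minimal proposal (list the container local dir, copy launch_container.sh into the log dir) comes down to two file operations. The Java sketch below shows one way they might look, gated by a boolean flag in the spirit of the config check requested in the review. It is not the code from the YARN-4309 patches: the class and method names (`ContainerDebugInfo`, `collect`) and the `directory.info` output file name are hypothetical, and it uses plain `java.nio.file` rather than any NodeManager or ContainerExecutor API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the YARN-4309 implementation.
class ContainerDebugInfo {

    // Hypothetical name for the directory-listing file (assumption, not from the patch).
    static final String LISTING_FILE = "directory.info";
    // Script name taken from the issue description.
    static final String LAUNCH_SCRIPT = "launch_container.sh";

    /**
     * Writes the relative path of every entry under localDir to logDir/directory.info
     * and copies localDir/launch_container.sh (if present) into logDir.
     * Does nothing when debugEnabled is false, mirroring a config-check guard.
     * Returns the files written to logDir.
     */
    static List<Path> collect(boolean debugEnabled, Path localDir, Path logDir)
            throws IOException {
        List<Path> written = new ArrayList<>();
        if (!debugEnabled) {
            return written; // guard: no debug output unless explicitly enabled
        }
        Files.createDirectories(logDir);

        // Sorted relative paths of everything under the container local dir.
        List<String> listing;
        try (Stream<Path> walk = Files.walk(localDir)) {
            listing = walk.filter(p -> !p.equals(localDir))
                    .map(p -> localDir.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        Path listingFile = logDir.resolve(LISTING_FILE);
        Files.write(listingFile, listing);
        written.add(listingFile);

        // Copy the launch script into the log dir, which log aggregation collects.
        Path script = localDir.resolve(LAUNCH_SCRIPT);
        if (Files.isRegularFile(script)) {
            Path copy = logDir.resolve(LAUNCH_SCRIPT);
            Files.copy(script, copy, StandardCopyOption.REPLACE_EXISTING);
            written.add(copy);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path local = Files.createTempDirectory("container-local");
        Path logs = Files.createTempDirectory("container-logs");
        Files.writeString(local.resolve(LAUNCH_SCRIPT), "#!/bin/bash\necho hello\n");
        Files.createDirectories(local.resolve("tmp"));

        List<Path> written = collect(true, local, logs);
        System.out.println(written.size()); // prints 2
        System.out.println(Files.readAllLines(logs.resolve(LISTING_FILE))); // prints [launch_container.sh, tmp]
    }
}
```

Listing names only is the conservative choice here; recording sizes or permissions, or tarring the whole directory, would raise the security and space questions the proposal mentions.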



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
