hadoop-yarn-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-162) nodemanager log aggregation has scaling issues with namenode
Date Mon, 19 Nov 2012 16:37:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500352#comment-13500352 ]

Siddharth Seth commented on YARN-162:

Thanks for the review.

bq. With Jenkins' +1 I am OK with the change, but it is a large enough change that I am a
bit nervous about pulling this into 0.23.5. If you are OK with this, I will pull in a modified
YARN-219 that addresses your comments, and then we can pull this into trunk, branch-2, and
branch-0.23 (0.23.6).
Fair enough. I'll address the review comments and post another patch.

bq. The other two seem to be related to one another. If you feel strongly that we should not
fail an application because log aggregation will not work, then please file a separate JIRA
for that, otherwise the TODOs should just be comments and not TODOs.
Without this patch, I believe log aggregation ignores errors in aggregating logs for individual
containers. It passes as long as the app directory can be created. The patch changes things
so that directory creation is allowed to fail as well. If a user asks for log aggregation and
any part of it fails, should the app fail? In any case, I will create another JIRA.
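
The three failure policies touched on above can be sketched roughly as follows. This is a minimal
illustration only, not NodeManager code: `Policy`, `AggregationOutcome`, and `app_fails` are
hypothetical names, and the mapping of each policy to its behaviour is an assumption based on the
description in this comment.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Policy(Enum):
    # Hypothetical labels, introduced only for this sketch.
    DIR_REQUIRED = "dir_required"  # described as the behaviour without the patch
    TOLERANT = "tolerant"          # described as the behaviour with the patch
    STRICT = "strict"              # the open question: any failure fails the app


@dataclass
class AggregationOutcome:
    """What happened when log aggregation was attempted for one application."""
    app_dir_created: bool
    failed_containers: List[str] = field(default_factory=list)


def app_fails(outcome: AggregationOutcome, policy: Policy) -> bool:
    """Return True if the application should be failed under the given policy."""
    if policy is Policy.TOLERANT:
        # Neither a failed directory creation nor a per-container error fails the app.
        return False
    if policy is Policy.DIR_REQUIRED:
        # Per-container errors are ignored; only a missing app directory matters.
        return not outcome.app_dir_created
    # STRICT: any part of log aggregation failing fails the app.
    return (not outcome.app_dir_created) or bool(outcome.failed_containers)
```

For example, an outcome with the app directory created but one container's logs lost would fail
the app only under `Policy.STRICT`; that case is the one the follow-up JIRA would need to settle.
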
> nodemanager log aggregation has scaling issues with namenode
> ------------------------------------------------------------
>                 Key: YARN-162
>                 URL: https://issues.apache.org/jira/browse/YARN-162
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Nathan Roberts
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: YARN-162.txt, YARN-162_WIP.txt
> Log aggregation causes fd explosion on the namenode. On large clusters this can exhaust
> FDs to the point where datanodes can't check in.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira