hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-162) nodemanager log aggregation has scaling issues with namenode
Date Mon, 19 Nov 2012 15:18:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500300#comment-13500300 ]

Robert Joseph Evans commented on YARN-162:
------------------------------------------

Sid, I like the patch.  I have a few minor comments:

# There are a few TODOs added to the code:
{code}// TODO This is broken. Container ID for the AM may not be 1.{code},
{code}// TODO Should the app really fail if log aggregation fails?{code} and
{code}// TODO Send out an event to the app. Currently since aggregation failure{code}.
I could not find an existing JIRA for the first one, so please file one for it.  The other
two seem to be related to one another.  If you feel strongly that we should not fail an application
because log aggregation does not work, then please file a separate JIRA for that; otherwise
the TODOs should just be plain comments and not TODOs.
# I don't really like the name of the new config that was added.  It exposes the internal
implementation of how we throttle the applications.  I would prefer to have it called something
like yarn.nodemanager.log-aggregation.max-concurrent-apps.  But this is very minor.
# The new config was not added to yarn-default.xml.
# This is also very minor. Inside LogAggregationService.stopApp we are wrapping a Void callable
inside another Void callable.  I would prefer it if we returned the original value instead
of returning null.
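
To illustrate the last point, here is a minimal sketch of a wrapper that hands back the wrapped callable's own result rather than null. It is not taken from the patch: the class name, the withCleanup helper, and the cleanup step are all hypothetical, and the example uses a String result only so that the propagated value is visible.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

class WrappedCallableSketch {

    // Hypothetical helper, not from the patch: runs the delegate, then a
    // cleanup step, and returns whatever the delegate itself returned.
    static <T> Callable<T> withCleanup(final Callable<T> delegate, final Runnable cleanup) {
        return new Callable<T>() {
            @Override
            public T call() throws Exception {
                try {
                    return delegate.call();   // propagate the original value
                } finally {
                    cleanup.run();            // runs even if the delegate throws
                }
            }
        };
    }

    // Small demonstration; the "aggregated" value is made up for illustration.
    static String demo() {
        final AtomicBoolean cleaned = new AtomicBoolean(false);
        Callable<String> inner = new Callable<String>() {
            @Override
            public String call() {
                return "aggregated";
            }
        };
        Callable<String> wrapped = withCleanup(inner, new Runnable() {
            @Override
            public void run() {
                cleaned.set(true);
            }
        });
        try {
            return wrapped.call() + ":" + cleaned.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a Void callable the propagated value is still null at runtime, but returning the delegate's result keeps the wrapper correct if the inner type ever changes.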

With Jenkins' +1 I am OK with the change, but it is a large enough change that I am a bit
nervous about pulling this into 0.23.5.  If you are OK with this, I will pull in a modified
YARN-219 that addresses your comments, and then we can pull this into trunk, branch-2, and
branch-0.23 (0.23.6).
                
> nodemanager log aggregation has scaling issues with namenode
> ------------------------------------------------------------
>
>                 Key: YARN-162
>                 URL: https://issues.apache.org/jira/browse/YARN-162
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Nathan Roberts
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: YARN-162.txt, YARN-162_WIP.txt
>
>
> Log aggregation causes fd explosion on the namenode. On large clusters this can exhaust FDs to the point where datanodes can't check-in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
