hadoop-yarn-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
Date Sun, 06 Aug 2017 08:41:03 GMT

     [ https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Konstantin Shvachko updated YARN-6728:
    Fix Version/s:     (was: 2.7.4)
                       (was: 2.9.0)

> Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.
> ------------------------------------------------------------------------------------------------
>                 Key: YARN-6728
>                 URL: https://issues.apache.org/jira/browse/YARN-6728
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 2.7.1
>         Environment: CentOS 7.1 hadoop-2.7.1
>            Reporter: zhengchenyu
>         Attachments: YARN-6728.patch.00_branch-2.7
>   Original Estimate: 1m
>  Remaining Estimate: 1m
> In our cluster, I found that many map task containers stayed in the "NEW" state for several minutes. Here is the container log:
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java:304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_000011 to application application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java:1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_000011 transitioned from NEW to LOCALIZING
> {code}
> I then searched the log between 18:21:23.068 and 18:23:08.715 and found that some events handled by the AsyncDispatcher ran slowly because they access the defaultFs. Our cluster has grown to 4k nodes, so the load on the defaultFs has increased. (Note: log aggregation is enabled.)
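The two quoted log lines are about 1 minute 45.647 seconds apart. That is how long the container waited between being added to the application and leaving NEW. Both timestamps are ISO-8601 values with a +08:00 offset, so `java.time` can check the arithmetic directly. This is a minimal sketch; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.OffsetDateTime;

public class ContainerStallGap {
    // Parses two ISO-8601 timestamps that carry a UTC offset (the format used in
    // the NodeManager log lines above) and returns the time elapsed between them.
    static Duration gap(String start, String end) {
        return Duration.between(OffsetDateTime.parse(start), OffsetDateTime.parse(end));
    }

    public static void main(String[] args) {
        // "Adding container ..." line vs. "transitioned from NEW to LOCALIZING" line
        Duration stall = gap("2017-06-13T18:21:23.068+08:00",
                             "2017-06-13T18:23:08.715+08:00");
        System.out.println(stall);                    // PT1M45.647S
        System.out.println(stall.toMillis() + " ms"); // 105647 ms
    }
}
```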
> When a container runs on a NodeManager, initApp() is invoked. It calls verifyAndCreateRemoteLogDir and then creates (mkdir) the remote application log directory. Both operations access the defaultFs, so whenever the defaultFs is slow the container gets stuck at this point, and the whole application runs slowly as a result.
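Both quoted log lines were written by the thread named "AsyncDispatcher event handler". As that single thread name suggests, events dispatched there are handled one at a time. A slow defaultFs call inside one handler therefore delays every event queued behind it, including the container's NEW to LOCALIZING transition.

The sketch below models this with a plain single-threaded `ExecutorService`, not YARN's actual AsyncDispatcher. `slowRemoteMkdir`, `transitionDelayMs` and the `offload` flag are hypothetical names, and `Thread.sleep` stands in for a slow remote mkdir.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class DispatcherStallDemo {
    // Hypothetical stand-in for a slow defaultFs call such as the remote log dir mkdir.
    static void slowRemoteMkdir(long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues an "init application" event, then a "container NEW -> LOCALIZING"
    // event, on one dispatcher thread and returns how long (ms) the second event
    // waited. With offload=true the slow call is handed to a separate pool instead.
    static long transitionDelayMs(long fsLatencyMs, boolean offload) throws InterruptedException {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor(); // one event-handler thread
        ExecutorService fsPool = Executors.newSingleThreadExecutor();     // hypothetical side pool
        CountDownLatch transitioned = new CountDownLatch(1);
        AtomicLong delayNanos = new AtomicLong();
        long queuedAt = System.nanoTime();

        // Event 1: application init, which touches the remote filesystem.
        dispatcher.execute(() -> {
            if (offload) {
                fsPool.execute(() -> slowRemoteMkdir(fsLatencyMs));
            } else {
                slowRemoteMkdir(fsLatencyMs); // blocks the dispatcher thread
            }
        });
        // Event 2: the container transition, queued behind event 1.
        dispatcher.execute(() -> {
            delayNanos.set(System.nanoTime() - queuedAt);
            transitioned.countDown();
        });

        transitioned.await();
        dispatcher.shutdown();
        fsPool.shutdown();
        fsPool.awaitTermination(1, TimeUnit.MINUTES); // let any offloaded call finish
        return TimeUnit.NANOSECONDS.toMillis(delayNanos.get());
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inline:    " + transitionDelayMs(500, false) + " ms");
        System.out.println("offloaded: " + transitionDelayMs(500, true) + " ms");
    }
}
```

Running `main` should show two things:
- In the inline case, the transition waits roughly as long as the simulated filesystem latency.
- In the offloaded case, the transition is handled almost immediately.

Offloading is only one possible mitigation. It is not necessarily what the attached YARN-6728.patch.00_branch-2.7 does. A real change would also have to decide what happens if the asynchronous mkdir fails, for example before log aggregation starts for that application.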

This message was sent by Atlassian JIRA

