hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.
Date Fri, 17 Apr 2015 18:41:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500405#comment-14500405 ]

zhihai xu commented on YARN-3491:

I uploaded a new patch, YARN-3491.001.patch, for review.
After thinking about this a bit more, I realized the old patch may cause a large delay if multiple containers are
submitted at the same time.
For example, the following log shows 4 containers submitted within 7 ms of each other:
2015-04-07 21:42:22,071 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110648_01_078264 transitioned from NEW to LOCALIZING
2015-04-07 21:42:22,074 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110652_01_093777 transitioned from NEW to LOCALIZING
2015-04-07 21:42:22,076 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110668_01_049049 transitioned from NEW to LOCALIZING
2015-04-07 21:42:22,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110668_01_085183 transitioned from NEW to LOCALIZING
The new patch can overlap the delay with the public localization from the previous container, which
is a little better and more consistent with the behavior of the old code.
It is also better for a container that has only private resources and no public resources:
in that case, no delay is added to the Dispatcher thread.
Finally, the change in the new patch is slightly smaller than in the first patch.

> PublicLocalizer#addResource is too slow.
> ----------------------------------------
>                 Key: YARN-3491
>                 URL: https://issues.apache.org/jira/browse/YARN-3491
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: YARN-3491.000.patch, YARN-3491.001.patch
> Based on profiling, the bottleneck in PublicLocalizer#addResource is getInitializedLocalDirs,
> which calls checkLocalDir.
> checkLocalDir is very slow, taking 10+ ms per call.
> The total delay is therefore approximately (number of local dirs) * 10+ ms.
> This delay is added to each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully utilized.
> Instead of running in parallel (multithreaded), public resource
> localization is serialized most of the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread,
> so the Dispatcher thread is blocked by PublicLocalizer#addResource for a long time.
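
To make the arithmetic in the description above concrete, here is a rough Java sketch of the delay model. It is not NodeManager code: the class name, the method names, and the example figures in main (12 local dirs, 20 public resources, a pool of 4 threads, 100 ms per download) are all hypothetical. Only the roughly 10 ms per checkLocalDir call comes from the profiling.

```java
// Back-of-the-envelope model of the delay described above.
// Hypothetical names and inputs; this does not call any Hadoop API.
final class LocalizationDelayEstimate {

    /** Time the Dispatcher thread spends in directory checks: dirs * checkMs for every public resource. */
    static long dispatcherDelayMs(int numLocalDirs, long checkMsPerDir, int numPublicResources) {
        return (long) numLocalDirs * checkMsPerDir * numPublicResources;
    }

    /**
     * Upper bound on downloads in flight when one thread submits a download
     * every submitIntervalMs and each download takes downloadMs.
     */
    static int maxConcurrentDownloads(long submitIntervalMs, long downloadMs, int poolSize) {
        if (submitIntervalMs <= 0) {
            return poolSize; // no submission delay: the pool size is the only limit
        }
        long overlapping = (downloadMs + submitIntervalMs - 1) / submitIntervalMs; // ceil(download / interval)
        return (int) Math.min(poolSize, Math.max(1, overlapping));
    }

    public static void main(String[] args) {
        int dirs = 12;          // assumed number of local dirs
        long checkMs = 10;      // per-dir check cost, from the profiling (10+ ms)
        int resources = 20;     // assumed number of public resources
        int poolSize = 4;       // assumed localizer thread pool size
        long downloadMs = 100;  // assumed time to fetch one resource

        long perResource = dispatcherDelayMs(dirs, checkMs, 1);
        System.out.println("Delay per public resource (ms): " + perResource);
        System.out.println("Total Dispatcher delay (ms): " + dispatcherDelayMs(dirs, checkMs, resources));
        System.out.println("Max downloads in flight: "
                + maxConcurrentDownloads(perResource, downloadMs, poolSize));
    }
}
```

Under these assumed numbers the submission interval is longer than a single download, so at most one pool thread is busy at a time, which matches the "serialized most of the time" observation.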

