hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).
Date Wed, 15 Apr 2015 18:59:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496702#comment-14496702
] 

zhihai xu commented on YARN-3491:
---------------------------------

The following logs show that public resource localization is serialized.
The first log shows two private localization requests and many public localization requests
from container_e30_1426628374875_110892_01_000475:
{code}
2015-04-07 22:49:56,750 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_e30_1426628374875_110892_01_000475 transitioned from NEW to LOCALIZING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.xml transitioned
from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.jar transitioned
from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar transitioned
from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar transitioned
from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp1631960573/service-local-search-sdk.jar
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp-1521315530/ace-geo.jar transitioned from
INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp1347512155/cortex-server.jar transitioned
from INIT to DOWNLOADING
{code}

The following log shows how the public resource localizations are processed.
{code}
2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Created localizer for container_e30_1426628374875_110892_01_000475

2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar,
1428446867531, FILE, null }

2015-04-07 22:49:56,882 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar,
1428446864128, FILE, null }

2015-04-07 22:49:56,902 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar(->/data2/yarn/nm/filecache/4877652/reflections.jar)
transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,127 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp1631960573/service-local-search-sdk.jar,
1428446858408, FILE, null }

2015-04-07 22:49:57,145 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar(->/data11/yarn/nm/filecache/4877653/service-media-sdk.jar)
transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,251 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1521315530/ace-geo.jar,
1428446862857, FILE, null }

2015-04-07 22:49:57,270 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://nameservice1/tmp/temp1444482237/tmp1631960573/service-local-search-sdk.jar(->/data1/yarn/nm/filecache/4877654/service-local-search-sdk.jar)
transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp1347512155/cortex-server.jar,
1428446857069, FILE, null }
{code}

Based on the log, you can see that the thread pool is not fully used: only one thread is used,
although the default thread pool size is 4.
"Downloading public rsrc" is printed from the Dispatcher thread.
"transitioned from DOWNLOADING to LOCALIZED" is printed from the PublicLocalizer thread.
You can see that these two messages are interleaved:
"Downloading public rsrc"
"transitioned from DOWNLOADING to LOCALIZED"
"Downloading public rsrc"
"transitioned from DOWNLOADING to LOCALIZED"
"Downloading public rsrc"
"transitioned from DOWNLOADING to LOCALIZED"

Also, when you compare the time the Dispatcher thread takes to process a localization event
for a public resource with the time it takes for a private resource,
there is a huge difference.
The time to process two localization events for private resources in the Dispatcher thread is less
than one millisecond,
based on the following log:
{code}
2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Created localizer for container_e30_1426628374875_110892_01_000475
2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar,
1428446867531, FILE, null }
{code}

The time to process one localization event for a public resource in the Dispatcher thread is 124
milliseconds,
based on the following log:
{code}
2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-1316042064/reflections.jar,
1428446867531, FILE, null }
2015-04-07 22:49:56,882 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{ hdfs://nameservice1/tmp/temp1444482237/tmp-327542609/service-media-sdk.jar,
1428446864128, FILE, null }
{code}
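
The 124 millisecond figure can be checked directly from the log timestamps. The following is a minimal sketch, not part of the NodeManager code: the class name LogGap and its methods are hypothetical, and the timestamps are the five "Downloading public rsrc" lines quoted earlier in this comment.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class LogGap {
    // Timestamp layout of the log lines quoted in this comment.
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long millisBetween(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime b = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(a, b).toMillis();
    }

    // Gaps between consecutive timestamps.
    static long[] gaps(String[] times) {
        long[] result = new long[times.length - 1];
        for (int i = 1; i < times.length; i++) {
            result[i - 1] = millisBetween(times[i - 1], times[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // The five "Downloading public rsrc" timestamps from the log.
        String[] downloading = {
            "2015-04-07 22:49:56,758",
            "2015-04-07 22:49:56,882",
            "2015-04-07 22:49:57,127",
            "2015-04-07 22:49:57,251",
            "2015-04-07 22:49:57,383",
        };
        System.out.println(Arrays.toString(gaps(downloading)));
        // prints [124, 245, 124, 132]
    }
}
```

Every gap between consecutive public requests is above 100 milliseconds, which matches the first measurement.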

The following is the code that processes localization events in the Dispatcher thread:
{code}
    public void handle(LocalizerEvent event) {
      String locId = event.getLocalizerId();
      switch (event.getType()) {
      case REQUEST_RESOURCE_LOCALIZATION:
        // 0) find running localizer or start new thread
        LocalizerResourceRequestEvent req =
          (LocalizerResourceRequestEvent)event;
        switch (req.getVisibility()) {
        case PUBLIC:
          publicLocalizer.addResource(req);
          break;
        case PRIVATE:
        case APPLICATION:
          synchronized (privLocalizers) {
            LocalizerRunner localizer = privLocalizers.get(locId);
            if (null == localizer) {
              LOG.info("Created localizer for " + locId);
              localizer = new LocalizerRunner(req.getContext(), locId);
              privLocalizers.put(locId, localizer);
              localizer.start();
            }
            // 1) propagate event
            localizer.addResource(req);
          }
          break;
        }
        break;
      }
    }
{code}
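
Because the PUBLIC branch calls publicLocalizer.addResource(req) on the Dispatcher thread, a slow addResource limits how fast downloads can reach the thread pool. The following is a rough model of that effect, not NodeManager code: the class SerializedSubmission and its method are hypothetical, the 124 ms submission cost is taken from the log above, and the 20 ms download time is an assumed figure rather than a measurement.

```java
import java.util.PriorityQueue;

// Hypothetical model, for illustration only.
class SerializedSubmission {
    // Peak number of workers busy at the same time when a single thread
    // spends submitCostMs on each submission before handing the task to
    // a pool of poolSize workers, and each task runs for downloadMs.
    static int peakBusyWorkers(int poolSize, long submitCostMs,
                               long downloadMs, int tasks) {
        // Time at which each worker becomes free, earliest first.
        PriorityQueue<Long> workerFreeAt = new PriorityQueue<>();
        for (int i = 0; i < poolSize; i++) {
            workerFreeAt.add(0L);
        }
        long[] start = new long[tasks];
        long[] end = new long[tasks];
        long clock = 0;
        for (int i = 0; i < tasks; i++) {
            clock += submitCostMs;            // submitting thread is busy
            long free = workerFreeAt.poll();  // earliest available worker
            start[i] = Math.max(clock, free);
            end[i] = start[i] + downloadMs;
            workerFreeAt.add(end[i]);
        }
        // Count the tasks running at each start instant and keep the maximum.
        int peak = 0;
        for (int i = 0; i < tasks; i++) {
            int busy = 0;
            for (int j = 0; j < tasks; j++) {
                if (start[j] <= start[i] && start[i] < end[j]) {
                    busy++;
                }
            }
            peak = Math.max(peak, busy);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Slow submission: each download finishes before the next one starts.
        System.out.println(peakBusyWorkers(4, 124, 20, 5)); // prints 1
        // Cheap submission: all four workers are busy at once.
        System.out.println(peakBusyWorkers(4, 1, 20, 5));   // prints 4
    }
}
```

Under these assumptions a pool of 4 never has more than one busy worker, which is consistent with the interleaved messages in the log.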



> Improve the public resource localization to do both FSDownload submission to the thread
pool and completed localization handling in one thread (PublicLocalizer).
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3491
>                 URL: https://issues.apache.org/jira/browse/YARN-3491
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>
> Improve the public resource localization to do both FSDownload submission to the thread
pool and completed localization handling in one thread (PublicLocalizer).
> Currently FSDownload submission to the thread pool is done in PublicLocalizer#addResource
which is running in Dispatcher thread and completed localization handling is done in PublicLocalizer#run
which is running in PublicLocalizer thread.
> Because FSDownload submission to the thread pool at the following code is time consuming,
the thread pool can't be fully utilized. Instead of doing public resource localization in
parallel(multithreading), public resource localization is serialized most of the time.
> {code}
>             synchronized (pending) {
>               pending.put(queue.submit(new FSDownload(lfs, null, conf,
>                   publicDirDestPath, resource, request.getContext().getStatCache())),
>                   request);
>             }
> {code}
> Also there are two more benefits with this change:
> 1. The Dispatcher thread won't be blocked by above FSDownload submission. Dispatcher
thread handles most of time critical events at Node manager.
> 2. don't need synchronization on HashMap (pending).
> Because pending will be only accessed in PublicLocalizer thread.
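
The proposal in the description can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual patch: the class PublicLocalizerSketch, the String requests, and the "/filecache/" result are all hypothetical stand-ins for LocalizerResourceRequestEvent, FSDownload and the real destination path. It only shows the threading shape: addResource just enqueues, and one thread both submits to the pool and handles completions, so pending is a plain HashMap without synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, for illustration only.
class PublicLocalizerSketch {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final CompletionService<String> downloads =
        new ExecutorCompletionService<>(pool);
    // Accessed only by the localizer thread, so no lock is needed.
    private final Map<Future<String>, String> pending = new HashMap<>();
    private final List<String> localized = new ArrayList<>();

    // Called from the dispatcher thread: only an enqueue, no slow work.
    void addResource(String resource) {
        requests.add(resource);
    }

    // Body of the single localizer thread.
    List<String> runUntil(int expected)
        throws InterruptedException, ExecutionException {
        try {
            while (localized.size() < expected) {
                // 1) submit every request queued so far
                String next;
                while ((next = requests.poll()) != null) {
                    final String req = next;
                    // Stand-in for FSDownload: returns a made-up local path.
                    pending.put(downloads.submit(() -> "/filecache/" + req), req);
                }
                // 2) handle one completed download, if there is one
                Future<String> completed =
                    downloads.poll(10, TimeUnit.MILLISECONDS);
                if (completed != null) {
                    String done = pending.remove(completed);
                    localized.add(done + " -> " + completed.get());
                }
            }
        } finally {
            pool.shutdown();
        }
        return localized;
    }

    // Runs a dispatcher thread against the localizer loop; sorted result.
    static List<String> localizeAll(List<String> resources) {
        PublicLocalizerSketch sketch = new PublicLocalizerSketch();
        Thread dispatcher =
            new Thread(() -> resources.forEach(sketch::addResource));
        dispatcher.start();
        try {
            List<String> result =
                new ArrayList<>(sketch.runUntil(resources.size()));
            dispatcher.join();
            Collections.sort(result);
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(localizeAll(Arrays.asList("b.jar", "a.jar")));
        // prints [a.jar -> /filecache/a.jar, b.jar -> /filecache/b.jar]
    }
}
```

The sketch polls with a short timeout so that the same thread can alternate between new requests and completed downloads. A real implementation would need to decide how to wait on both without busy polling.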



