hadoop-yarn-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7261) Add debug message in class FSDownload for better download latency monitoring
Date Fri, 20 Oct 2017 03:12:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212095#comment-16212095 ]

Xiao Chen commented on YARN-7261:
---------------------------------

Thanks [~yufeigu] for creating the jira and providing a patch.

For context, Yufei and I have seen an intermittent issue where localization took a very
long time. We suspect that the copy from HDFS was slow, but the HDFS metrics and logs don't
show any smoking gun. We'd like to use this jira to add more debugging information.

The log we collected currently looks like:
{noformat}
2017-09-15 10:55:50,738 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e70_1505214525894_75227_01_000014
2017-09-15 10:55:50,738 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/cached/pub/deviceDetailsQuery_1505472717000.xml, 1505472808731, FILE, null }
...
2017-09-15 10:58:38,760 DEBUG org.apache.hadoop.yarn.util.FSDownload: Changing permissions for path file:/var/hdfs/5/yarn/nm/filecache/7363_tmp/deviceDetailsQuery_1505472717000.xml to perm r-xr-xr-x
2017-09-15 10:58:38,775 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e70_1505214525894_75227_01_000014 transitioned from LOCALIZING to LOCALIZED
{noformat}
But there are no details on what happened during the nearly 3 minutes between 10:55:50 and 10:58:38.
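
To put a number on that gap, the two timestamps from the log can simply be subtracted. This is only a small illustrative sketch: the class name {{LogGap}} and the method {{gapMillis}} are made up for this example and are not part of Hadoop. It assumes the timestamp layout seen in the log above ({{yyyy-MM-dd HH:mm:ss,SSS}}).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not Hadoop code: measures the time between two log timestamps.
public class LogGap {
    // Matches the timestamp prefix seen in the NodeManager log lines above.
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the elapsed milliseconds from the start timestamp to the end timestamp. */
    static long gapMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TS);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TS);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // "Downloading public rsrc" line vs. the FSDownload "Changing permissions" line.
        long ms = gapMillis("2017-09-15 10:55:50,738", "2017-09-15 10:58:38,760");
        System.out.println(ms + " ms");
    }
}
```

For the two lines above this comes to 168022 ms, i.e. about 2 minutes 48 seconds with no log output in between.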

The patch LGTM. One question:
Do you think it would be helpful to add a debug message to {{ResourceLocalizationService#addResource}}
that indicates when either of conditions 1 and 2 below is false?
{code}
      /*
       * Here multiple containers may request the same resource. So we need
       * to start downloading only when
       * 1) ResourceState == DOWNLOADING
       * 2) We are able to acquire non blocking semaphore lock.
       * If not we will skip this resource as either it is getting downloaded
       * or it FAILED / LOCALIZED.
       */
{code}
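
To make the question concrete, here is a rough, standalone sketch of the kind of message I have in mind. It is not the actual {{ResourceLocalizationService}} code: the class {{AddResourceDebugSketch}}, the {{State}} enum, and the {{skipReason}} method are all hypothetical names used only for illustration, and a plain {{java.util.concurrent.Semaphore}} stands in for the per-resource lock.

```java
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; names and structure are assumptions, not Hadoop code.
public class AddResourceDebugSketch {
    enum State { INIT, DOWNLOADING, LOCALIZED, FAILED }

    /**
     * Returns null when the download should start (the caller then holds the permit),
     * otherwise a debug message explaining why the resource was skipped.
     */
    static String skipReason(String path, State state, Semaphore lock) {
        // Condition 1: the resource must be in the DOWNLOADING state.
        if (state != State.DOWNLOADING) {
            return "Skipping " + path + ": state is " + state + ", not DOWNLOADING";
        }
        // Condition 2: the non-blocking lock attempt must succeed.
        if (!lock.tryAcquire()) {
            return "Skipping " + path + ": download lock is held by another request";
        }
        return null;
    }

    public static void main(String[] args) {
        Semaphore lock = new Semaphore(1);
        // Condition 1 fails: the resource is already LOCALIZED.
        System.out.println(skipReason("a.xml", State.LOCALIZED, lock));
        // Both conditions hold: null means "go ahead and download".
        System.out.println(skipReason("a.xml", State.DOWNLOADING, lock));
        // Condition 2 fails: the permit was taken by the previous call.
        System.out.println(skipReason("a.xml", State.DOWNLOADING, lock));
    }
}
```

In the real method the message would presumably go through the existing logger at debug level; the point is only that each skipped branch would say which of the two conditions failed.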

> Add debug message in class FSDownload for better download latency monitoring
> ----------------------------------------------------------------------------
>
>                 Key: YARN-7261
>                 URL: https://issues.apache.org/jira/browse/YARN-7261
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-7261.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

