hadoop-yarn-issues mailing list archives

From "Gera Shegalov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
Date Tue, 24 Dec 2013 19:57:51 GMT

    [ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856439#comment-13856439 ]

Gera Shegalov commented on YARN-1529:

bq. Also, is there an MR jira for the per job stats? 

I linked MAPREDUCE-5696 to this JIRA. 

bq. Furthermore, shouldn't the per application implementation be such that all applications
on YARN can leverage it as compared to just an MR specific implementation.

Ideally, yes. As stated in the previous comment, I am open to suggestions. As of now there
seem to be no common application metrics. In MAPREDUCE-5696 I expose localization cost to
containers as an environment variable (LOCALIZATION_COUNTERS). MR containers add them as
TaskCounter. We can also include it in MRAppMetrics. Other applications can use this variable
in some other way.
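
A minimal sketch of how a non-MR container might read this variable. Only the variable name LOCALIZATION_COUNTERS comes from the comment above. The "name=value,name=value" encoding, the class name and the parse helper are assumptions made for illustration, and the actual format in the MAPREDUCE-5696 patch may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for the LOCALIZATION_COUNTERS environment variable. */
final class LocalizationCountersEnv {
    // Variable name taken from the comment; everything else is assumed.
    static final String ENV_NAME = "LOCALIZATION_COUNTERS";

    private LocalizationCountersEnv() {}

    /**
     * Parses an ASSUMED "name=value,name=value" encoding into a map.
     * Malformed pairs are skipped rather than failing the container.
     */
    static Map<String, Long> parse(String raw) {
        Map<String, Long> counters = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return counters; // variable not set: no counters available
        }
        for (String pair : raw.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                continue; // not a name=value pair
            }
            try {
                counters.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
            } catch (NumberFormatException e) {
                // value is not a number: ignore this pair
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Inside a container the NM would have set the variable; here it may be absent.
        Map<String, Long> counters = parse(System.getenv(ENV_NAME));
        counters.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

An application could then publish the parsed values through whatever metrics or counter mechanism it already has, in the same way MR containers map them to TaskCounter.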

bq. Is there any comment/doc that describes the overall plan/approach that you are trying
to implement?

The background is in YARN-1492.

bq.  I am not sure how these metrics translate into any actionable insights for a cluster
admin to act upon.

Users will see how localization overhead (shipping computation to data) compares to their
container execution times. This should help them reconsider build/packaging strategies,
encourage better use of DistributedCache, etc. Admins will be able to better dissect network
utilization in the cluster.  Our particular goal is to clearly demonstrate the usefulness of
YARN-1492.

> Add Localization overhead metrics to NM
> ---------------------------------------
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: YARN-1529.v01.patch
>
> Users are often unaware of localization cost that their jobs incur. To measure effectiveness
> of localization caches it is necessary to expose the overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be fetched from
> a central location, typically on HDFS, that results in a number of download requests for the
> files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses.
> LocalizedFilesCached: total localization requests that were served from local caches.
> Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served
> out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from
> ResourceRequestTransition to LocalizedTransition
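
The ratio formula quoted above can be sketched as follows. This is an illustration only, not code from YARN-1529.v01.patch: the class and method names are hypothetical, the sample numbers are made up, and returning 0 when nothing has been localized yet is an assumption, since the issue does not say how an empty denominator is handled.

```java
/** Hypothetical helper showing the Localized(Files|Bytes)CachedRatio computation. */
final class LocalizationCacheRatio {
    private LocalizationCacheRatio() {}

    /**
     * ratio = 100 * caches / (caches + misses), as stated in the issue description.
     * Returns 0 when there have been no localization requests (assumed behavior).
     */
    static double cachedRatio(long cached, long missed) {
        long total = cached + missed;
        if (total == 0) {
            return 0.0; // avoid division by zero before any localization happened
        }
        return 100.0 * cached / total;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the four counters in the description.
        long localizedFilesCached = 30;
        long localizedFilesMissed = 10;
        long localizedBytesCached = 750_000L;
        long localizedBytesMissed = 250_000L;

        System.out.println("LocalizedFilesCachedRatio = "
            + cachedRatio(localizedFilesCached, localizedFilesMissed));
        System.out.println("LocalizedBytesCachedRatio = "
            + cachedRatio(localizedBytesCached, localizedBytesMissed));
    }
}
```

Floating-point division is used so that small hit counts do not truncate to zero, which integer arithmetic would do for any ratio below one percent.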

This message was sent by Atlassian JIRA