hadoop-yarn-issues mailing list archives

From "Chris Trezzo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
Date Thu, 18 Aug 2016 21:51:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427222#comment-15427222 ]

Chris Trezzo commented on YARN-1529:

Thanks [~jlowe] for the rebased patch! I agree that it would be nice not to tie these localization
metrics to ATS, so that more people can use them sooner.

One comment: we are adding a new API, albeit a small one, for YARN application
developers. This API is the serialized data we put into the environment variable (LOCALIZATION_COUNTERS)
to communicate the localization statistics to the application-level container. Currently,
a YARN developer who wants to use these metrics has to figure out how the information
is serialized into this env var and hope the format doesn't change. What do you think about adding
a small class/method that defines the format a little more formally and contains the deserialization
logic? That way, if another application, say Tez, wants to use this data, it can
just call the new deserialize method.

If you think this is a good idea, I can post another patch with the added class. Thanks!
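
To make the proposal concrete, here is a rough sketch of what such a helper class might look like. Everything in it is an assumption for illustration: the class name, the field order, and the wire format (five comma-separated longs) are hypothetical and are not taken from any of the attached patches. Only the env var name LOCALIZATION_COUNTERS comes from the discussion above.

```java
import java.util.Map;

// Hypothetical helper. The wire format used here (five comma-separated
// longs) is an assumption for illustration, not the format in the patch.
final class LocalizationCounters {
    // Env var name as mentioned in the comment above.
    static final String ENV_NAME = "LOCALIZATION_COUNTERS";

    final long bytesMissed;
    final long bytesCached;
    final long filesMissed;
    final long filesCached;
    final long downloadNanos;

    LocalizationCounters(long bytesMissed, long bytesCached,
                         long filesMissed, long filesCached,
                         long downloadNanos) {
        this.bytesMissed = bytesMissed;
        this.bytesCached = bytesCached;
        this.filesMissed = filesMissed;
        this.filesCached = filesCached;
        this.downloadNanos = downloadNanos;
    }

    // Encode the counters in the assumed field order.
    String serialize() {
        return bytesMissed + "," + bytesCached + "," + filesMissed + ","
            + filesCached + "," + downloadNanos;
    }

    // Decode a value produced by serialize(); rejects malformed input.
    static LocalizationCounters deserialize(String value) {
        if (value == null) {
            throw new IllegalArgumentException(ENV_NAME + " is not set");
        }
        String[] parts = value.trim().split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException(
                "expected 5 fields, got " + parts.length);
        }
        long[] v = new long[5];
        for (int i = 0; i < 5; i++) {
            // NumberFormatException is a subclass of IllegalArgumentException.
            v[i] = Long.parseLong(parts[i].trim());
        }
        return new LocalizationCounters(v[0], v[1], v[2], v[3], v[4]);
    }

    // Convenience entry point for code running inside the container.
    static LocalizationCounters fromEnvironment(Map<String, String> env) {
        return deserialize(env.get(ENV_NAME));
    }
}
```

With something like this, an application such as Tez could call `LocalizationCounters.fromEnvironment(System.getenv())` inside the container and would not need to know how the value is encoded.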

> Add Localization overhead metrics to NM
> ---------------------------------------
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Chris Trezzo
>         Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, YARN-1529.v04.patch
> Users are often unaware of localization cost that their jobs incur. To measure effectiveness
> of localization caches it is necessary to expose the overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be fetched from
> a central location, typically on HDFS, that results in a number of download requests for the
> files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses.
> LocalizedFilesCached: total localization requests that were served from local caches.
> Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served
> out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from
> ResourceRequestTransition to LocalizedTransition
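
As a small illustration of the CachedRatio formula quoted above, the following sketch computes the percentage from a hit count and a miss count. The class and method names are hypothetical, and returning 0 when nothing has been localized yet is an assumption; the issue description does not say how that case should be reported.

```java
// Hypothetical helper illustrating ratio = 100 * caches / (caches + misses).
final class LocalizationRatios {
    private LocalizationRatios() {
    }

    // Percentage of requests (or bytes) served out of the local cache.
    static double cachedRatio(long cached, long missed) {
        if (cached < 0 || missed < 0) {
            throw new IllegalArgumentException("counters must be non-negative");
        }
        long total = cached + missed;
        if (total == 0) {
            // Assumption: report 0 when there have been no localization requests.
            return 0.0;
        }
        // Multiply by 100.0 first so the division is done in floating point.
        return 100.0 * cached / total;
    }
}
```

For example, 3 cache hits and 1 miss give a files ratio of 75 percent; the same method applies to the byte counters.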
