hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
Date Thu, 11 Aug 2016 21:48:20 GMT

     [ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1529:
    Attachment: YARN-1529.v04.patch

I've attached version 4 of the patch, upmerged to trunk. This is the version we're using internally.
It's heavily derived from Gera's patch.

I agree that writing the metrics to ATS would be interesting and useful, but I'm not sure
we should tie NM-level localization metrics and container-level metrics together in one JIRA.
We've found the node-level aggregated metrics very useful on their own. As such, I'm thinking
we might want to proceed in this JIRA with the aggregated container localization metrics in
the NM and move the per-container metrics in ATS to a separate JIRA. That way we can get
some of the benefits sooner (and on clusters that don't have ATS configured or aren't prepared
to handle the extra load).

> Add Localization overhead metrics to NM
> ---------------------------------------
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Chris Trezzo
>         Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, YARN-1529.v04.patch
> Users are often unaware of localization cost that their jobs incur. To measure effectiveness
> of localization caches it is necessary to expose the overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be fetched from
> a central location, typically on HDFS, that results in a number of download requests for the
> files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses.
> LocalizedFilesCached: total localization requests that were served from local caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served
> out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from
> ResourceRequestTransition to LocalizedTransition
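
To make the arithmetic behind the proposed counters concrete, here is a minimal sketch of how the hit/miss counts and the derived ratios relate. It is not taken from any of the attached patches and does not use the Hadoop metrics2 API; the class name LocalizationCounters and all of its methods are hypothetical, chosen only to mirror the metric names in the description. Returning 0 when nothing has been localized yet is also an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: plain counters mirroring the metric names
// from the issue description, not the NodeManagerMetrics implementation.
class LocalizationCounters {
    private final AtomicLong filesMissed = new AtomicLong();   // LocalizedFilesMissed
    private final AtomicLong filesCached = new AtomicLong();   // LocalizedFilesCached
    private final AtomicLong bytesMissed = new AtomicLong();   // LocalizedBytesMissed
    private final AtomicLong bytesCached = new AtomicLong();   // LocalizedBytesCached
    private final AtomicLong downloadNanos = new AtomicLong(); // LocalizationDownloadNanos

    // A request served from a local cache (cache hit).
    void recordCacheHit(long bytes) {
        filesCached.incrementAndGet();
        bytesCached.addAndGet(bytes);
    }

    // A request that had to be downloaded from DFS (cache miss).
    void recordCacheMiss(long bytes) {
        filesMissed.incrementAndGet();
        bytesMissed.addAndGet(bytes);
    }

    // Accumulate the elapsed localization time of one container.
    void addDownloadNanos(long nanos) {
        downloadNanos.addAndGet(nanos);
    }

    // ratio = 100 * cached / (cached + missed); 0 if nothing was recorded (assumption).
    static double cachedRatio(long cached, long missed) {
        long total = cached + missed;
        return total == 0 ? 0.0 : 100.0 * cached / total;
    }

    double filesCachedRatio() {
        return cachedRatio(filesCached.get(), filesMissed.get());
    }

    double bytesCachedRatio() {
        return cachedRatio(bytesCached.get(), bytesMissed.get());
    }

    long totalDownloadNanos() {
        return downloadNanos.get();
    }
}
```

With three cache hits and one miss recorded, filesCachedRatio() comes out at 75, independently of the byte ratio, which depends on the sizes of the files involved.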
