hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
Date Thu, 18 Aug 2016 22:38:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427301#comment-15427301

Jason Lowe commented on YARN-1529:

bq. One comment that I have is we are adding a new API, albeit a small one, for YARN application

That's a great point, and actually I'd be perfectly happy if this JIRA simply added the NM-level
metric source and skipped the container API part for now.  If we're moving towards doing this
via the ATS anyway, we may not want/need the env variable API.  It might be worth splitting
the patch so the less controversial NM-level metrics can go in earlier and we can discuss
the per-container metrics API in a separate JIRA.  If the consensus is that this patch should include
the per-container metrics API via the container env as well, then I'm OK with that too.  I
also agree that hiding the implementation details of that API would be important, whether
that's in this JIRA or another.

Either way, the patch needs an update; please feel free to post one.

> Add Localization overhead metrics to NM
> ---------------------------------------
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Chris Trezzo
>         Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, YARN-1529.v04.patch
> Users are often unaware of the localization cost that their jobs incur. To measure the
> effectiveness of localization caches, it is necessary to expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be fetched from
> a central location, typically on HDFS, which results in a number of download requests for the
> files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses.
> LocalizedFilesCached: total localization requests that were served from local caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served
> out of cache: ratio = 100 * cached / (cached + missed)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from
> ResourceRequestTransition to LocalizedTransition
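
For illustration only, the cached-ratio arithmetic in the quoted description could be sketched as follows. This is not code from any of the attached patches: the class name LocalizationStatsSketch, its methods, and the choice to report 0 when nothing has been localized yet are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not the NodeManagerMetrics code from the YARN-1529 patch. */
final class LocalizationStatsSketch {
    private final AtomicLong filesMissed = new AtomicLong();
    private final AtomicLong filesCached = new AtomicLong();
    private final AtomicLong bytesMissed = new AtomicLong();
    private final AtomicLong bytesCached = new AtomicLong();

    /** Record a resource that had to be downloaded (cache miss). */
    void recordMiss(long bytes) {
        filesMissed.incrementAndGet();
        bytesMissed.addAndGet(bytes);
    }

    /** Record a resource that was served from a local cache (cache hit). */
    void recordHit(long bytes) {
        filesCached.incrementAndGet();
        bytesCached.addAndGet(bytes);
    }

    /** ratio = 100 * cached / (cached + missed), as in the issue description. */
    static double cachedRatio(long cached, long missed) {
        long total = cached + missed;
        // Assumption: report 0 when no localization requests have been seen.
        return total == 0 ? 0.0 : 100.0 * cached / total;
    }

    double filesCachedRatio() {
        return cachedRatio(filesCached.get(), filesMissed.get());
    }

    double bytesCachedRatio() {
        return cachedRatio(bytesCached.get(), bytesMissed.get());
    }

    public static void main(String[] args) {
        LocalizationStatsSketch stats = new LocalizationStatsSketch();
        stats.recordHit(1_000);
        stats.recordHit(3_000);
        stats.recordHit(2_000);
        stats.recordMiss(4_000);
        // 3 hits and 1 miss; 6000 cached bytes and 4000 downloaded bytes.
        System.out.println("files cached ratio: " + stats.filesCachedRatio());
        System.out.println("bytes cached ratio: " + stats.bytesCachedRatio());
    }
}
```

The file ratio and the byte ratio are kept separate because a few large cache misses can dominate the bytes downloaded even when most requests are cache hits.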
