hadoop-yarn-issues mailing list archives

From "Hitesh Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
Date Tue, 24 Dec 2013 18:48:50 GMT

    [ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856421#comment-13856421 ]

Hitesh Shah commented on YARN-1529:
-----------------------------------

bq. I am preparing a patch that exposes this information as MR counters for MRv2. Is there a
better way to achieve this in an application-agnostic manner, such that it is visible in the
web UI?
Also, is there an MR JIRA for the per-job stats? Furthermore, shouldn't the per-application
implementation be one that all applications on YARN can leverage, rather than an
MR-specific one?

bq. Currently all resource types are lumped together. We can discuss whether it would be
helpful to expose a finer breakdown at the NM level or at the app level.
Is there a comment or doc that describes the overall plan/approach you are trying to implement?
I am not sure how these metrics translate into actionable insights that a cluster admin
could act upon.



> Add Localization overhead metrics to NM
> ---------------------------------------
>
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: YARN-1529.v01.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches, it is necessary to expose this overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, which results in a number of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses.
> LocalizedFilesCached: total localization requests that were served from local caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * cached / (cached + missed)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition.
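A minimal Python sketch of how the proposed hit/miss counters and the cached ratio relate to each other. It assumes a plain in-memory set as the local cache and a caller-supplied fetch function standing in for the DFS download. The class LocalizationMetrics, the localize() helper and the fetch callback are hypothetical illustrations, not the NodeManagerMetrics code in the attached patch. Timing the whole localize() call only approximates the ResourceRequestTransition-to-LocalizedTransition interval.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Set, Tuple


@dataclass
class LocalizationMetrics:
    """Hypothetical stand-in for the proposed NodeManagerMetrics counters."""
    files_missed: int = 0    # LocalizedFilesMissed
    files_cached: int = 0    # LocalizedFilesCached
    bytes_missed: int = 0    # LocalizedBytesMissed
    bytes_cached: int = 0    # LocalizedBytesCached
    download_nanos: int = 0  # LocalizationDownloadNanos

    def record(self, size: int, cache_hit: bool) -> None:
        """Count one localization request of `size` bytes as a cache hit or miss."""
        if cache_hit:
            self.files_cached += 1
            self.bytes_cached += size
        else:
            self.files_missed += 1
            self.bytes_missed += size

    @staticmethod
    def _ratio(cached: int, missed: int) -> float:
        # ratio = 100 * cached / (cached + missed); 0.0 before anything is localized
        total = cached + missed
        return 100 * cached / total if total else 0.0

    def files_cached_ratio(self) -> float:
        return self._ratio(self.files_cached, self.files_missed)

    def bytes_cached_ratio(self) -> float:
        return self._ratio(self.bytes_cached, self.bytes_missed)


def localize(metrics: LocalizationMetrics,
             cache: Set[str],
             resources: Iterable[Tuple[str, int]],
             fetch: Callable[[str], None]) -> None:
    """Localize one container's (name, size) resources, updating the metrics.

    Assumption: the whole call stands in for the interval from
    ResourceRequestTransition to LocalizedTransition.
    """
    start = time.monotonic_ns()
    for name, size in resources:
        hit = name in cache
        metrics.record(size, hit)
        if not hit:
            fetch(name)      # hypothetical DFS download
            cache.add(name)  # later requests for this file become cache hits
    metrics.download_nanos += time.monotonic_ns() - start
```

Localizing the same two resources for two containers yields two misses followed by two hits, so both ratios come out at 50.0.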



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
