hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
Date Sat, 16 Aug 2014 20:48:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099792#comment-14099792 ]

Eric Payne commented on YARN-415:

[~jianhe] and [~kkambatl]
Thank you both for your comments.

[~jianhe] wrote:
Because of this, for consistency, I think we better use getCurrentAttempt to charge finished
containers against current attempt also for work-preserving AM restart?

If I understand correctly, is the suggestion that all finished containers be charged against
the current attempt? That would be tricky, since even in a normal use case, an attempt can
be in the complete state before all of its containers are finished. Also, if the first attempt
dies after some of its containers are finished, would the metrics for those finished containers
need to be transferred to the new attempt? I think that, since the metrics are reported at
the app level, charging the running containers to the current app until the containers finish
will be seamless to the end user. One thing that could be done is to have RMAppAttemptMetrics#getRMAppMetrics
get a copy of the liveContainers and report only on the ones applicable to that attempt.
That seems like more overhead that may not be necessary.
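
To illustrate why the attribution is invisible at the app level, here is a minimal sketch in Python. It is not taken from the patch: the container ids, attempt ids, charges, and the app_total helper are all hypothetical. It only shows that summing per-attempt totals gives the same app-level figure whichever attempt a long-lived container is charged to.

```python
# Hypothetical per-container charges in MB-seconds, keyed by container id.
container_mb_seconds = {"c1": 2048 * 30, "c2": 1024 * 90, "c3": 4096 * 10}


def app_total(attribution):
    """Sum per-attempt totals; attribution maps container id -> attempt id."""
    per_attempt = {}
    for cid, attempt in attribution.items():
        per_attempt[attempt] = per_attempt.get(attempt, 0) + container_mb_seconds[cid]
    # The app-level metric is the sum over all attempts.
    return sum(per_attempt.values())


# Container c2 outlives attempt 1. It can be charged to the attempt that
# launched it or to the current attempt; only the per-attempt split changes.
charged_to_original = {"c1": 1, "c2": 1, "c3": 2}
charged_to_current = {"c1": 1, "c2": 2, "c3": 2}
```

Both attributions yield the same app-level total, so the choice only matters if per-attempt numbers are ever reported separately.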

[~kkambatl] wrote:
Just took a look at the patch. The major concern I have is the use of RMStateStore to store
app resource usage information. If we add more resources and more other statistics, storing
all of them to the RM state store could be placing too much overhead on the store, particularly
if it is ZKRMStateStore. Would it make more sense to store this information in the History/Timeline

Can you please help me to understand in more detail how this would be accomplished?

> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch,
> YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch,
> YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt,
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt,
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt,
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.patch
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to get the memory
> utilization of an application.  The unit should be MB-seconds or something similar and, from
> a chargeback perspective, the memory amount should be the memory reserved for the application,
> as even if the app didn't use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime
> of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear
> on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.
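
The formula in the description above can be sketched as follows. This is an illustration only, not the code in the attached patches: the Container class, its field names, and the sample values are assumptions made up for the example. Memory is the amount reserved for each container, in MB, and lifetime is finish time minus start time, in seconds.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical record of one container's reservation and lifetime."""
    reserved_mb: int   # memory reserved for the container, in MB
    start_s: float     # start time, in seconds
    finish_s: float    # finish time, in seconds


def memory_mb_seconds(containers):
    """Sum of (reserved memory * lifetime) over all containers of an app."""
    return sum(c.reserved_mb * (c.finish_s - c.start_s) for c in containers)


# Two made-up containers: 1024 MB held for 60 s and 2048 MB held for 30 s.
example = [Container(1024, 0, 60), Container(2048, 10, 40)]
total = memory_mb_seconds(example)
```

Because the charge is based on reserved rather than used memory, a container that sits idle for its whole lifetime contributes as much as one that uses its full reservation.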

This message was sent by Atlassian JIRA
