hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
Date Mon, 18 Aug 2014 21:29:22 GMT

    [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101341#comment-14101341 ]

Eric Payne commented on YARN-415:

[~kkambatl], thank you for taking the time to review this patch.

I would like to see if [~kthrapp] could comment on your use case questions, but here are my
initial thoughts:

1. Is the chargeback simply to track the usage and maybe financially charge the users? Or
is it to influence future scheduling decisions? I agree that the RM should facilitate collecting
this information, but should the collected info be available to the RM for future use? If
not, do we want the RM to serve this info?
Potential goals could be: 
# report (and charge for) grid usage
# eventually limit job submission based on a user's budget
2. Do we want to charge the app only for the resources used to do meaningful work or do we
also want to include failed/preempted containers? If we don't charge the app for failed containers,
who are they charged to? Are we okay with letting some resources go uncharged?
This implementation does charge the app for failed containers. This was debated somewhat previously
in this JIRA, because if the failure was due to preemption or a bug that wasn't the app's
"fault," it may be unfair to charge the app for those. However, it is very unclear how one
could programmatically determine whose "fault" the failure is.

3. How soon do we want this usage information? It might make sense to collect/expose this
once the app is finished for certain kinds of applications. What is our story for long-running
applications?
There is a specific use case for determining the usage at runtime. Again, I would hope that
[~kthrapp] could elaborate on this.

> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch,
> YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch,
> YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt,
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt,
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt,
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt,
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to get the memory
> utilization of an application.  The unit should be MB-seconds or something similar and, from
> a chargeback perspective, the memory amount should be the memory reserved for the application,
> as even if the app didn't use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime
> of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear
> on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.
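
As a rough illustration of the MB-seconds formula quoted above, here is a minimal Python sketch. It is not taken from any of the attached patches: ContainerUsage and app_memory_mb_seconds are hypothetical names, and the timestamps are assumed to be plain seconds. It sums reserved memory multiplied by lifetime over every container of one application, and it counts every container no matter how it ended, which matches the behavior described in point 2 of the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerUsage:
    """Hypothetical record of one container's reservation (not a YARN class)."""
    reserved_mb: int      # memory reserved for the container, in MB
    start_seconds: float  # time the container was allocated
    end_seconds: float    # time it finished, failed, or was preempted

    @property
    def lifetime_seconds(self) -> float:
        # Clamp at zero so a bad timestamp pair never yields a negative charge.
        return max(0.0, self.end_seconds - self.start_seconds)


def app_memory_mb_seconds(containers):
    """Sum (reserved MB * lifetime in seconds) over all containers of one app."""
    return sum(c.reserved_mb * c.lifetime_seconds for c in containers)


# Example data (made up for illustration only).
containers = [
    ContainerUsage(reserved_mb=1024, start_seconds=0, end_seconds=60),
    ContainerUsage(reserved_mb=2048, start_seconds=10, end_seconds=40),
    ContainerUsage(reserved_mb=512, start_seconds=0, end_seconds=5),  # e.g. failed early
]
total = app_memory_mb_seconds(containers)
print(total)
```

Charging on reserved rather than actually used memory follows the reasoning in the description: memory held by a container is unavailable to anyone else for the container's whole lifetime.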

This message was sent by Atlassian JIRA