hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
Date Thu, 17 Oct 2013 21:16:50 GMT

    [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798432#comment-13798432 ]

Jason Lowe commented on YARN-415:
---------------------------------

Thanks for the update, Andrey. This change should resolve my concerns about running-container
leaks, but a few points about logging in cleanupRunningContainers still need to be addressed:

* If an application with many containers in flight simply unregisters and exits, expecting
the RM to clean up the mess, or simply crashes, we're going to log a message for every one
of those containers. The RM already kills all of an application's current containers, so for
a sane RM the charge will be off by only a few milliseconds. I think this should be logged at
INFO rather than WARN. We probably also want to log a single message per application stating
how many containers were affected, rather than listing specific ones, since we don't currently
expose container-specific metrics anyway.
* There's a "new memSec" log message that appears to be a debugging artifact left in the
patch.
* There's a "new memSec" log message that appears to be a debugging artifact that was left
in the patch
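The per-application summary logging suggested in the first bullet could look roughly like the following Java sketch. Everything here is an assumption for illustration: AppMemoryTracker, containerStarted and containerFinished are hypothetical names, java.util.logging stands in for whatever logging framework the RM actually uses, and only the method name cleanupRunningContainers comes from the comment above. It is not the YARN-415 patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical sketch, not the YARN-415 patch: tracks memory reserved by an
 * application's running containers and, on cleanup, charges the leftovers and
 * emits one INFO summary line instead of one WARN per container.
 */
public class AppMemoryTracker {
    private static final Logger LOG = Logger.getLogger(AppMemoryTracker.class.getName());

    private final String appId;
    // containerId -> {reserved MB, start time in ms}
    private final Map<String, long[]> running = new HashMap<>();
    // Accumulated in MB-milliseconds to avoid rounding each container separately.
    private long mbMillis = 0;

    public AppMemoryTracker(String appId) {
        this.appId = appId;
    }

    public void containerStarted(String containerId, long reservedMb, long startMs) {
        running.put(containerId, new long[] {reservedMb, startMs});
    }

    public void containerFinished(String containerId, long endMs) {
        long[] c = running.remove(containerId);
        if (c != null) {
            mbMillis += c[0] * (endMs - c[1]);
        }
    }

    /** Charges every still-running container up to nowMs; returns how many there were. */
    public int cleanupRunningContainers(long nowMs) {
        int leftover = running.size();
        for (long[] c : running.values()) {
            mbMillis += c[0] * (nowMs - c[1]);
        }
        running.clear();
        if (leftover > 0) {
            // A single INFO message per application, with a count rather than container IDs.
            LOG.info(appId + ": charged " + leftover + " container(s) still running at cleanup");
        }
        return leftover;
    }

    /** Total reserved memory in MB-seconds (truncated to whole seconds). */
    public long getMemorySeconds() {
        return mbMillis / 1000;
    }
}
```

Accumulating MB-milliseconds and dividing once when the total is read keeps the result from losing a fraction of a second per container.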

> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to get the memory
> utilization of an application.  The unit should be MB-seconds or something similar and, from
> a chargeback perspective, the memory amount should be the memory reserved for the application,
> as even if the app didn't use all that memory, no one else was able to use it.
> (reserved RAM for container 1 * lifetime of container 1) + (reserved RAM for
> container 2 * lifetime of container 2) + ... + (reserved RAM for container n * lifetime
> of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear
> on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.
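The chargeback formula in the description can be written compactly as below. The notation is assumed for illustration: $M_i$ is the memory reserved for container $i$ in MB and $T_i$ is that container's lifetime in seconds. The numbers on the second line are purely hypothetical.

```latex
\text{MB-seconds} = \sum_{i=1}^{n} M_i \, T_i
% Hypothetical example: two containers, 1024 MB for 600 s and 2048 MB for 300 s
1024 \times 600 + 2048 \times 300 = 614400 + 614400 = 1228800 \ \text{MB-seconds}
```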



--
This message was sent by Atlassian JIRA
(v6.1#6144)
