hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
Date Thu, 24 Jul 2014 02:03:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072696#comment-14072696 ]

Wangda Tan commented on YARN-415:
---------------------------------

Hi Eric,
Thanks for updating your patch. I don't have any major comments now.

*Following are some minor comments:*
1) RMAppAttemptImpl.java
1.1 There are some unrelated line changes in RMAppAttemptImpl; could you please revert them?
For example:
{code}
           RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition())
-          
+
{code}

1.2 getResourceUtilization:
{code}
+    if (rmApps != null) {
+      RMApp app = rmApps.get(attemptId.getApplicationId());
+      if (app != null) {
{code}
I don't think either of these can be null here, so we don't need the null checks to guard against a potential bug.

{code}
+          ApplicationResourceUsageReport appResUsageRpt =
{code}
It's better to name it appResUsageReport, since rpt is not a common abbreviation of report.

2) RMContainerImpl.java
2.1 updateAttemptMetrics:
{code}
      if (rmApps != null) {
        RMApp rmApp = 
            rmApps.get(container.getApplicationAttemptId().getApplicationId());
        if (rmApp != null) {
{code}
Again, I think these two null checks are unnecessary.

3) SchedulerApplicationAttempt.java
3.1 Some rename suggestions (please let me know if you have a better idea):
CACHE_MILLI -> MEMORY_UTILIZATION_CACHE_MILLISECONDS
lastTime -> lastMemoryUtilizationUpdateTime
cachedMemorySeconds -> lastMemorySeconds
The same applies to cachedVCore ...

4) AppBlock.java
Should we rename "Resource Seconds:" to "Resource Utilization" or something?

5) Test
5.1 I'm wondering if we need to add an end-to-end test, since we changed RMAppAttempt/RMContainerImpl/SchedulerApplicationAttempt.
It could consist of submitting an application, launching several containers, and finishing the application.
It would be better for the launched application to contain several application attempts.
While the application is running, there would be multiple containers running and multiple containers
finished. We can then check whether the total resource utilization is as expected.

*To your comments:*
1) 
bq. One thing I did notice when these values are cached is that there is a race where containers
can get counted twice:
I think this cannot be avoided. It should be a transient state; Jian He and I discussed
this a long time ago.
But apparently the 3 sec cache makes it more than a transient state. I suggest making "lastTime"
in SchedulerApplicationAttempt protected. Then, in FiCaSchedulerApp/FSSchedulerApp, when removing a
container from liveContainers (in the completedContainer method), you can set lastTime to a negative
value like -1, so that the next attempt to get the accumulated resource utilization
recomputes all container utilization.
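
A minimal sketch of the invalidation idea above. It is not taken from the patch: the class CachedUsageSketch, the method names, and the recompute supplier are hypothetical stand-ins, and only "lastTime" and the 3 sec cache window come from this discussion.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the caching in SchedulerApplicationAttempt.
// Every name except lastTime is an illustrative assumption.
class CachedUsageSketch {
    static final long CACHE_MILLIS = 3000;  // the 3 sec cache discussed above

    protected long lastTime = -1;           // negative means "cache is invalid"
    private long cachedMemorySeconds = 0;
    private final LongSupplier recompute;   // real code would walk the live containers

    CachedUsageSketch(LongSupplier recompute) {
        this.recompute = recompute;
    }

    // Returns the cached value unless it is invalid or older than the window.
    synchronized long getMemorySeconds(long nowMillis) {
        if (lastTime < 0 || nowMillis - lastTime > CACHE_MILLIS) {
            cachedMemorySeconds = recompute.getAsLong();
            lastTime = nowMillis;
        }
        return cachedMemorySeconds;
    }

    // Would be called when a container is removed from the live set.
    synchronized void containerCompleted() {
        lastTime = -1;
    }

    public static void main(String[] args) {
        long[] usage = {100};
        CachedUsageSketch c = new CachedUsageSketch(() -> usage[0]);
        System.out.println(c.getMemorySeconds(1000)); // first call recomputes
        usage[0] = 60;
        System.out.println(c.getMemorySeconds(2000)); // still inside the cache window
        c.containerCompleted();                       // invalidate
        System.out.println(c.getMemorySeconds(2500)); // forced recompute
    }
}
```

With the invalidation in place, a stale cached value survives only until the next container completion rather than for the full cache window.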

2)
bq. I am a little reluctant to modify the type of SchedulerApplicationAttempt#liveContainers
as part of this JIRA. That seems like something that could be done separately.
I think that will be fine :), because getRunningResourceUtilization is currently called by getResourceUsageReport.
And since getResourceUsageReport is synchronized, we cannot solve the locking problem whether or not
we change liveContainers to a concurrent map.
I agree to improving it in a separate JIRA in the future.
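
A small sketch of the locking point above, under stated assumptions: UsageReportSketch is a hypothetical class, not YARN code, and the map's value type (reserved MB per container id) is invented for illustration. It shows that when the outer method is synchronized, the read of the map happens while the object monitor is held, even if the map itself is concurrent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in; only the method and field names echo the discussion.
class UsageReportSketch {
    // Concurrent map of container id -> reserved MB (illustrative value type).
    private final Map<String, Long> liveContainers = new ConcurrentHashMap<>();
    boolean monitorHeldDuringRead;

    void addContainer(String id, long reservedMb) {
        liveContainers.put(id, reservedMb);
    }

    // Synchronized: every caller must take this object's monitor first.
    synchronized long getResourceUsageReport() {
        return getRunningResourceUtilization();
    }

    private long getRunningResourceUtilization() {
        // Record whether the caller's monitor is held while we read the map.
        monitorHeldDuringRead = Thread.holdsLock(this);
        long total = 0;
        for (long mb : liveContainers.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        UsageReportSketch s = new UsageReportSketch();
        s.addContainer("c1", 1024);
        s.addContainer("c2", 2048);
        System.out.println(s.getResourceUsageReport());
        System.out.println(s.monitorHeldDuringRead);
    }
}
```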

Thanks,
Wangda


> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch,
> YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch,
> YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt,
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt,
> YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to get the memory
> utilization of an application.  The unit should be MB-seconds or something similar and, from
> a chargeback perspective, the memory amount should be the memory reserved for the application,
> as even if the app didn't use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime
> of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear
> on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.
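
The sum in the issue description above can be illustrated with a short sketch. This is an assumption-laden example, not the patch's implementation: MemorySecondsSketch, ContainerUsage, and their fields are hypothetical names, and the sample containers are made-up inputs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the MB-seconds sum; not YARN code.
class MemorySecondsSketch {
    // One container: reserved memory in MB plus start and finish times in ms.
    static final class ContainerUsage {
        final long reservedMb;
        final long startMillis;
        final long finishMillis;

        ContainerUsage(long reservedMb, long startMillis, long finishMillis) {
            this.reservedMb = reservedMb;
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }
    }

    // Sum of (reserved MB * lifetime) over all containers, in MB-seconds.
    static long memorySeconds(List<ContainerUsage> containers) {
        long mbMillis = 0;
        for (ContainerUsage c : containers) {
            mbMillis += c.reservedMb * (c.finishMillis - c.startMillis);
        }
        return mbMillis / 1000; // convert MB-milliseconds to MB-seconds
    }

    public static void main(String[] args) {
        List<ContainerUsage> containers = Arrays.asList(
            new ContainerUsage(1024, 0, 60_000),   // 1024 MB reserved for 60 s
            new ContainerUsage(2048, 0, 30_000));  // 2048 MB reserved for 30 s
        System.out.println(memorySeconds(containers)); // prints 122880
    }
}
```

The calculation uses reserved memory rather than memory actually used, matching the chargeback reasoning in the description.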



--
This message was sent by Atlassian JIRA
(v6.2#6252)
