hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
Date Fri, 18 Jul 2014 07:56:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066145#comment-14066145 ]

Wangda Tan commented on YARN-415:

Hi [~eepayne],
I've spent some time reviewing and thinking about this JIRA. I have a few suggestions:

1. Revert the changes to SchedulerAppReport. We have already changed ApplicationResourceUsageReport,
and memory utilization should be part of the resource usage report.

2. Remove getMemory(VCore)Seconds from RMAppAttempt, and modify RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds
to return completed+running resource utilization.

3. Move
            String.format("%d MB-seconds, %d vcore-seconds", 
                app.getMemorySeconds(), app.getVcoreSeconds()))
from "Application Overview" to "Application Metrics", and rename it to "Resource Seconds".
It should be treated as part of the application metrics rather than the overview.

4. Change finishedMemory/VCoreSeconds in RMAppAttemptMetrics to AtomicLong so that they can
be accessed efficiently from multiple threads.

5. I think it's better to add a new method to SchedulerApplicationAttempt, such as getMemoryUtilization,
which returns only memory/CPU seconds. This avoids locking the scheduling thread
when application metrics are shown on the web UI.
getMemoryUtilization would be used by RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to
return completed+running resource utilization, and by SchedulerApplicationAttempt#getResourceUsageReport
as well.

The MemoryUtilization class may contain two fields: runningContainerMemory(VCore)Seconds.

6. Computing the resource utilization of running containers is not O(1), because we need to scan
all containers under an application. I think it's better to cache the previous result and
recompute it only after several seconds have elapsed (1-3 seconds should probably be enough).

You can also change SchedulerApplicationAttempt#liveContainers to a ConcurrentHashMap. Together
with #6, getting memory utilization to show metrics on the web UI will not lock the scheduling thread at all.
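
To make #4, #5 and #6 concrete, here is a minimal sketch of how the pieces could fit together.
It is not taken from any YARN-415 patch: AttemptResourceSeconds, LiveContainer, Snapshot and all
method names are hypothetical stand-ins, and the clock is passed in as a parameter only to keep
the example self-contained. Finished totals live in AtomicLong fields, live containers in a
ConcurrentHashMap, and the running total is cached for a configurable number of milliseconds.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names are the real YARN classes.
final class AttemptResourceSeconds {

    // Stand-in for a live container: reserved resources plus start time.
    static final class LiveContainer {
        final long memoryMb;
        final long vcores;
        final long startMillis;

        LiveContainer(long memoryMb, long vcores, long startMillis) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.startMillis = startMillis;
        }
    }

    // Immutable cached result of the last scan over running containers.
    private static final class Snapshot {
        final long computedAtMillis;
        final long memorySeconds;
        final long vcoreSeconds;

        Snapshot(long computedAtMillis, long memorySeconds, long vcoreSeconds) {
            this.computedAtMillis = computedAtMillis;
            this.memorySeconds = memorySeconds;
            this.vcoreSeconds = vcoreSeconds;
        }
    }

    // Totals for completed containers, readable without any lock (#4).
    private final AtomicLong finishedMemorySeconds = new AtomicLong();
    private final AtomicLong finishedVcoreSeconds = new AtomicLong();

    // Live containers; iteration does not require an external lock.
    private final ConcurrentHashMap<String, LiveContainer> liveContainers =
        new ConcurrentHashMap<>();

    private final long cacheTtlMillis;
    private volatile Snapshot cached; // null means "recompute on next read"

    AttemptResourceSeconds(long cacheTtlMillis) {
        this.cacheTtlMillis = cacheTtlMillis;
    }

    void containerStarted(String id, LiveContainer container) {
        liveContainers.put(id, container);
    }

    void containerFinished(String id, long nowMillis) {
        LiveContainer c = liveContainers.remove(id);
        if (c == null) {
            return;
        }
        long seconds = (nowMillis - c.startMillis) / 1000;
        finishedMemorySeconds.addAndGet(c.memoryMb * seconds);
        finishedVcoreSeconds.addAndGet(c.vcores * seconds);
        cached = null; // avoid counting the container as both finished and running
    }

    // Running totals, rescanned only when the cached snapshot is too old (#6).
    private Snapshot running(long nowMillis) {
        Snapshot s = cached;
        if (s != null && nowMillis - s.computedAtMillis < cacheTtlMillis) {
            return s;
        }
        long mem = 0;
        long vcores = 0;
        for (LiveContainer c : liveContainers.values()) {
            long seconds = (nowMillis - c.startMillis) / 1000;
            mem += c.memoryMb * seconds;
            vcores += c.vcores * seconds;
        }
        s = new Snapshot(nowMillis, mem, vcores);
        cached = s;
        return s;
    }

    // Completed + running utilization, as described in #2 and #5.
    long getMemorySeconds(long nowMillis) {
        return finishedMemorySeconds.get() + running(nowMillis).memorySeconds;
    }

    long getVcoreSeconds(long nowMillis) {
        return finishedVcoreSeconds.get() + running(nowMillis).vcoreSeconds;
    }
}
```

In this sketch a reader can still see a briefly stale or slightly inconsistent total (for example
between the map removal and the counter update), which seems acceptable for a web UI metric but
would need more care if the value were used for exact billing.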

Please let me know if you have any comments.


> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch,
>                      YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch,
>                      YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt,
>                      YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt,
>                      YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.patch
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to get the memory
> utilization of an application.  The unit should be MB-seconds or something similar and, from
> a chargeback perspective, the memory amount should be the memory reserved for the application,
> as even if the app didn't use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime
> of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear
> on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.
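
As a worked illustration of the formula in the issue description, the sum of reserved memory
times container lifetime can be computed as below. This is only a sketch: MemorySecondsExample
and memoryMbSeconds are hypothetical names, and the {reservedMemoryMb, lifetimeSeconds} row
layout is an assumption made for the example.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class MemorySecondsExample {

    // Each row is {reservedMemoryMb, lifetimeSeconds} for one container.
    static long memoryMbSeconds(long[][] containers) {
        long total = 0;
        for (long[] c : containers) {
            total += c[0] * c[1]; // reserved RAM * lifetime for this container
        }
        return total;
    }
}
```

For two containers, one reserving 1024 MB for 60 seconds and one reserving 2048 MB for 30 seconds,
this gives 1024 * 60 + 2048 * 30 = 122880 MB-seconds.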

This message was sent by Atlassian JIRA
