hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
Date Mon, 07 Jul 2014 01:11:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053278#comment-14053278 ]

Wangda Tan commented on YARN-415:

Hi Eric,
Thanks for the patch; I think it's very important for YARN. A patch of mine, YARN-2181, has
a similar purpose (tracking preemption info of app attempts) but takes a different approach. I
just looked at your patch; some comments:

YARN-415 needs to get two parts of info: one from AppSchedulingInfo, for running containers;
the other from SchedulerApplicationAttempt/RMAppAttempt, for completed containers.

YARN-2181 doesn't need running-container info, and it puts most of its logic in RMAppAttempt.
To make YARN-2181 and YARN-415 more consistent, we have two choices here. One is to move the major logic
of YARN-2181 to SchedulerApplicationAttempt and copy the results back to RMAppAttempt as the payload
of RMAppAttemptAppFinishedEvent, as YARN-415 does. The other is to move the completed-container
resource usage calculation to RMAppAttempt, as YARN-2181 does.

Personally, I prefer the latter because:
1) We don't need to store completed-container resource usage info in two places (SchedulerApplication
and RMAppAttempt).
2) We don't need to add extra complexity to SchedulerApplicationAttempt/AppSchedulingInfo, which
should focus on scheduling.
3) Putting the resource usage calculation for completed containers in RMAppAttempt#ContainerFinishedTransition
should be straightforward, and it doesn't need to access any fields in AppSchedulingInfo/SchedulerAppAttempt.
Accessing scheduler fields from a different thread will block the scheduler thread, which doesn't seem
good to me. We shouldn't let a UI-based requirement block the scheduler from making decisions.
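
To illustrate point 3, here is a minimal sketch, not code from either patch. The names
AttemptUsageSketch, FinishedContainer and onContainerFinished are hypothetical stand-ins for
RMAppAttempt and its ContainerFinishedTransition. It only assumes that the finished container
carries its own resource size and start/finish times, so the totals can be accumulated without
reading any scheduler field:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the data available when a container finishes. */
final class FinishedContainer {
    final long memoryMb;
    final int vcores;
    final long startTimeMillis;
    final long finishTimeMillis;

    FinishedContainer(long memoryMb, int vcores, long startTimeMillis, long finishTimeMillis) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.startTimeMillis = startTimeMillis;
        this.finishTimeMillis = finishTimeMillis;
    }
}

/** Hypothetical attempt-side accumulator; this is NOT the real RMAppAttempt. */
final class AttemptUsageSketch {
    // Atomic counters so a UI thread can read totals without taking a scheduler lock.
    private final AtomicLong memorySeconds = new AtomicLong();
    private final AtomicLong vcoreSeconds = new AtomicLong();

    /** Would be called from the container-finished transition of the attempt. */
    void onContainerFinished(FinishedContainer c) {
        // Clamp to zero in case of clock skew between start and finish timestamps.
        long lifetimeSeconds = Math.max(0L, c.finishTimeMillis - c.startTimeMillis) / 1000L;
        memorySeconds.addAndGet(c.memoryMb * lifetimeSeconds);
        vcoreSeconds.addAndGet(c.vcores * lifetimeSeconds);
    }

    long getMemorySeconds() {
        return memorySeconds.get();
    }

    long getVcoreSeconds() {
        return vcoreSeconds.get();
    }
}
```

Whether the real transition has all of these values at hand is an assumption of the sketch.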

As for showing metrics on the UI, YARN-415 uses the route SchedulerAppReport->ApplicationReport->AppInfo.
I found this in AppInfo.java:
+        ApplicationReport report = 
+            app.createAndGetApplicationReport(null, hasAccess);
+        ApplicationResourceUsageReport usageReport = 
+            report.getApplicationResourceUsageReport();
+        this.memorySeconds = usageReport.getMemorySeconds();
+        this.vcoreSeconds = usageReport.getVcoreSeconds();
You can see that AppInfo gets a whole ApplicationReport from RMApp and uses only two fields, which
is a waste.
I would suggest adding resource usage fields to AppMetrics/AppAttemptMetrics, so that AppInfo
can use them to render the UI, as YARN-2181 does.
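
A rough sketch of that suggestion, with hypothetical names throughout (UsageMetricsSketch,
AppWithMetrics and AppInfoSketch are not classes from YARN or from either patch). The idea is
only that the web layer reads a small metrics snapshot instead of building a full report:

```java
/** Hypothetical immutable snapshot, standing in for the AppMetrics/AppAttemptMetrics idea. */
final class UsageMetricsSketch {
    private final long memorySeconds;
    private final long vcoreSeconds;

    UsageMetricsSketch(long memorySeconds, long vcoreSeconds) {
        this.memorySeconds = memorySeconds;
        this.vcoreSeconds = vcoreSeconds;
    }

    long getMemorySeconds() {
        return memorySeconds;
    }

    long getVcoreSeconds() {
        return vcoreSeconds;
    }
}

/** Hypothetical view of an app that can hand out the cheap snapshot. */
interface AppWithMetrics {
    UsageMetricsSketch getUsageMetrics();
}

/** Hypothetical web-layer object; it copies only the two fields the UI renders. */
final class AppInfoSketch {
    final long memorySeconds;
    final long vcoreSeconds;

    AppInfoSketch(AppWithMetrics app) {
        // One small snapshot instead of a whole application report.
        UsageMetricsSketch metrics = app.getUsageMetrics();
        this.memorySeconds = metrics.getMemorySeconds();
        this.vcoreSeconds = metrics.getVcoreSeconds();
    }
}
```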

Some other review comments on YARN-415:
1) It's better to add the changes to SchedulerApplication instead of AppSchedulingInfo.
AppSchedulingInfo is a critical class for scheduler decision making, so we should be careful about
adding any fields to it for purposes other than scheduling.
2) We don't need to create a ResourceUsage class to track container resource/startTime; RMContainer
already has the fields that the patch's new class holds:
+  private static class ResourceUsage {
+    private final Resource resource;
+    private final long startTimeMillis;
3) An additional lock or a concurrent data structure should be used when we need to access scheduler
fields (like those in AppSchedulingInfo/SchedulerApplication) from another thread. There are some
thread-safety problems, like using a HashMap to store running containers' resource usage in AppSchedulingInfo.java:
+  private final Map<ContainerId, ResourceUsage> runningContainersUsage = 
+      new HashMap<ContainerId, ResourceUsage>();
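
One possible way to address point 3 is sketched below. It is an assumption-laden illustration,
not a proposed patch: RunningUsageSketch is a hypothetical class, and plain String IDs stand in
for ContainerId to keep the example self-contained. It swaps the HashMap for a
java.util.concurrent.ConcurrentHashMap so that a reader on another thread can iterate while the
scheduler thread adds and removes containers:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker for running containers; String IDs stand in for ContainerId. */
final class RunningUsageSketch {

    /** Immutable record of what a running container reserved and when it started. */
    private static final class Usage {
        final long memoryMb;
        final long startTimeMillis;

        Usage(long memoryMb, long startTimeMillis) {
            this.memoryMb = memoryMb;
            this.startTimeMillis = startTimeMillis;
        }
    }

    // ConcurrentHashMap permits concurrent put/remove and weakly consistent iteration.
    private final Map<String, Usage> running = new ConcurrentHashMap<>();

    /** Called by the (hypothetical) scheduler thread when a container starts. */
    void containerStarted(String containerId, long memoryMb, long startTimeMillis) {
        running.put(containerId, new Usage(memoryMb, startTimeMillis));
    }

    /** Called by the (hypothetical) scheduler thread when a container finishes. */
    void containerFinished(String containerId) {
        running.remove(containerId);
    }

    /** May be called from another thread; iteration does not throw ConcurrentModificationException. */
    long runningMemorySeconds(long nowMillis) {
        long total = 0L;
        for (Usage u : running.values()) {
            long lifetimeSeconds = Math.max(0L, nowMillis - u.startTimeMillis) / 1000L;
            total += u.memoryMb * lifetimeSeconds;
        }
        return total;
    }
}
```

The result is a best-effort snapshot: a container that starts or finishes during the loop may or
may not be counted, which is usually acceptable for a UI metric.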

Do these comments make sense to you? Please feel free to let me know if you have any comments.


> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch,
>                      YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch,
>                      YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt,
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to get the memory
> utilization of an application.  The unit should be MB-seconds or something similar and, from
> a chargeback perspective, the memory amount should be the memory reserved for the application,
> as even if the app didn't use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime
> of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear
> on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.
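
As a worked example of the formula in the description above, here is a small sketch. The class
and method names are hypothetical, and the container sizes and lifetimes are made-up inputs
chosen only to show the arithmetic:

```java
/** Hypothetical helper that evaluates the reserved-memory * lifetime sum. */
final class ChargebackExample {

    /** Sums reservedMb[i] * lifetimeSeconds[i] over all containers, giving MB-seconds. */
    static long memoryMbSeconds(long[] reservedMb, long[] lifetimeSeconds) {
        if (reservedMb.length != lifetimeSeconds.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long total = 0L;
        for (int i = 0; i < reservedMb.length; i++) {
            total += reservedMb[i] * lifetimeSeconds[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Container 1: 1024 MB for 600 s; container 2: 2048 MB for 300 s.
        long[] reservedMb = {1024L, 2048L};
        long[] lifetimeSeconds = {600L, 300L};
        // 1024 * 600 + 2048 * 300 = 614400 + 614400
        System.out.println(memoryMbSeconds(reservedMb, lifetimeSeconds)); // prints 1228800
    }
}
```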

This message was sent by Atlassian JIRA
