Date: Wed, 23 Jul 2014 23:12:41 +0000 (UTC)
From: "Eric Payne (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback

     [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-415:
----------------------------
    Attachment: YARN-415.201407232237.txt

[~leftnoteasy], Thank you for your reply. I have implemented the following changes in the current patch.

{quote}
1. Revert changes of SchedulerAppReport, we already have changed ApplicationResourceUsageReport, and memory utilization should be a part of resource usage report.
{quote}
Changes to SchedulerAppReport have been reverted.

{quote}
2. Remove getMemory(VCore)Seconds from RMAppAttempt, modify RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization.
{quote}
I have removed the getters and setters from RMAppAttempt and added RMAppAttemptMetrics#getResourceUtilization, which returns a single ResourceUtilization instance containing both memorySeconds and vcoreSeconds for the appAttempt. These include both finished and running statistics IF the appAttempt is ALSO the current attempt. If not, only the finished statistics are included.

{quote}
3. put
{code}
._("Resources:", String.format("%d MB-seconds, %d vcore-seconds", app.getMemorySeconds(), app.getVcoreSeconds()))
{code}
from "Application Overview" to "Application Metrics", and rename it to "Resource Seconds". It should be considered as a part of application metrics instead of overview.
{quote}
Changes completed.

{quote}
4. Change finishedMemory/VCoreSeconds to AtomicLong in RMAppAttemptMetrics to make it can be efficiently accessed by multi-thread.
{quote}
Changes completed.
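For illustration only (this is not the code from the attached patch), the pattern described in items 2 and 4 can be sketched as follows; the class, field, and method names here are hypothetical:

{code}
// Sketch of AtomicLong-backed finished counters plus a combined
// finished+running result object. Names are illustrative, not the
// actual YARN-415 patch code.
import java.util.concurrent.atomic.AtomicLong;

public class RMAppAttemptMetricsSketch {
  // AtomicLong lets the container-completion path update the totals while
  // the web UI reads them, without holding the scheduler lock.
  private final AtomicLong finishedMemorySeconds = new AtomicLong(0);
  private final AtomicLong finishedVcoreSeconds = new AtomicLong(0);

  // Called when a container finishes: fold its usage into the finished totals.
  public void addFinishedUsage(long memorySeconds, long vcoreSeconds) {
    finishedMemorySeconds.addAndGet(memorySeconds);
    finishedVcoreSeconds.addAndGet(vcoreSeconds);
  }

  // Combine finished usage with the running usage reported by the scheduler
  // (running values are only non-zero for the current attempt).
  public ResourceUtilizationSketch getResourceUtilization(
      long runningMemorySeconds, long runningVcoreSeconds) {
    return new ResourceUtilizationSketch(
        finishedMemorySeconds.get() + runningMemorySeconds,
        finishedVcoreSeconds.get() + runningVcoreSeconds);
  }

  // Simple immutable holder, analogous in spirit to the ResourceUtilization
  // object mentioned above.
  public static class ResourceUtilizationSketch {
    private final long memorySeconds;
    private final long vcoreSeconds;

    public ResourceUtilizationSketch(long memorySeconds, long vcoreSeconds) {
      this.memorySeconds = memorySeconds;
      this.vcoreSeconds = vcoreSeconds;
    }

    public long getMemorySeconds() { return memorySeconds; }
    public long getVcoreSeconds() { return vcoreSeconds; }
  }
}
{code}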
{quote}
5. I think it's better to add a new method in SchedulerApplicationAttempt like getMemoryUtilization, which will only return memory/cpu seconds. We do this to prevent locking scheduling thread when showing application metrics on web UI. getMemoryUtilization will be used by RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization. And used by SchedulerApplicationAttempt#getResourceUsageReport as well. The MemoryUtilization class may contain two fields: runningContainerMemory(VCore)Seconds.
{quote}
I added ResourceUtilization (instead of MemoryUtilization), but did not make the other changes, per this comment: https://issues.apache.org/jira/browse/YARN-415?focusedCommentId=14071181&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14071181

{quote}
6. Since compute running container resource utilization is not O(1), we need scan all containers under an application. I think it's better to cache a previous compute result, and it will be recomputed after several seconds (maybe 1-3 seconds should be enough) elapsed.
{quote}
I added cached values in SchedulerApplicationAttempt for memorySeconds and vcoreSeconds that are updated when 1) a request is received to calculate these metrics, AND 2) it has been more than 3 seconds since the last request (see the sketch at the end of this comment).

One thing I did notice when these values are cached is that there is a race where containers can get counted twice:
- RMAppAttemptMetrics#getResourceUtilization sends a request to calculate running containers, and container X is almost finished. RMAppAttemptMetrics#getResourceUtilization adds the finished values to the running values and returns ResourceUtilization.
- Container X completes and its memorySeconds and vcoreSeconds are added to the finished values for the appAttempt.
- RMAppAttemptMetrics#getResourceUtilization makes another request before the 3-second interval has elapsed, and the cached values are added to the finished values for the appAttempt.

Since both the cached values and the finished values contain metrics for container X, it is double counted until 3 seconds elapse and the next RMAppAttemptMetrics#getResourceUtilization request recomputes the running values.

{quote}
And you can modify SchedulerApplicationAttempt#liveContainers to be a ConcurrentHashMap. With #6, get memory utilization to show metrics on web UI will not lock scheduling thread at all.
{quote}
I am a little reluctant to modify the type of SchedulerApplicationAttempt#liveContainers as part of this JIRA. That seems like something that could be done separately.
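To make the time-based cache from item 6 concrete, here is a minimal, self-contained sketch of the pattern; the names and the container bookkeeping are assumptions for illustration and do not reflect the actual SchedulerApplicationAttempt code:

{code}
// Sketch: cache the O(#containers) running-usage scan and refresh it at most
// once every 3 seconds. All names here are illustrative.
import java.util.HashMap;
import java.util.Map;

public class RunningUsageCacheSketch {
  private static final long RECOMPUTE_INTERVAL_MS = 3000;

  // Placeholder for a live container: reserved memory (MB), vcores, start time.
  public static class ContainerInfo {
    final long memoryMB;
    final long vcores;
    final long startTimeMs;

    ContainerInfo(long memoryMB, long vcores, long startTimeMs) {
      this.memoryMB = memoryMB;
      this.vcores = vcores;
      this.startTimeMs = startTimeMs;
    }
  }

  private final Map<String, ContainerInfo> liveContainers =
      new HashMap<String, ContainerInfo>();

  // Cached running totals and the time they were last recomputed.
  private long cachedMemorySeconds = 0;
  private long cachedVcoreSeconds = 0;
  private long lastRecomputeTimeMs = 0;

  // Returns {memorySeconds, vcoreSeconds} for running containers, recomputing
  // only if more than RECOMPUTE_INTERVAL_MS has passed since the last request.
  public synchronized long[] getRunningResourceSeconds() {
    long now = System.currentTimeMillis();
    if (now - lastRecomputeTimeMs > RECOMPUTE_INTERVAL_MS) {
      long memSec = 0;
      long vcoreSec = 0;
      for (ContainerInfo c : liveContainers.values()) {
        long elapsedSec = (now - c.startTimeMs) / 1000;
        memSec += c.memoryMB * elapsedSec;
        vcoreSec += c.vcores * elapsedSec;
      }
      cachedMemorySeconds = memSec;
      cachedVcoreSeconds = vcoreSec;
      lastRecomputeTimeMs = now;
    }
    return new long[] { cachedMemorySeconds, cachedVcoreSeconds };
  }
}
{code}

Because completed containers are folded into the finished totals while the cached running totals are only refreshed every 3 seconds, a container that finishes between refreshes is counted in both places until the next recompute, which is the double-counting race described above.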
> Capture memory utilization at the app-level for chargeback
> ----------------------------------------------------------
>
>                 Key: YARN-415
>                 URL: https://issues.apache.org/jira/browse/YARN-415
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 0.23.6
>            Reporter: Kendall Thrapp
>            Assignee: Andrey Klochkov
>         Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.patch
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web Services REST API.

--
This message was sent by Atlassian JIRA
(v6.2#6252)