Return-Path: X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8B04B11DEA for ; Mon, 18 Aug 2014 03:50:21 +0000 (UTC) Received: (qmail 61562 invoked by uid 500); 18 Aug 2014 03:50:21 -0000 Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org Received: (qmail 61518 invoked by uid 500); 18 Aug 2014 03:50:21 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: yarn-issues@hadoop.apache.org Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 61506 invoked by uid 99); 18 Aug 2014 03:50:21 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Aug 2014 03:50:21 +0000 Date: Mon, 18 Aug 2014 03:50:21 +0000 (UTC) From: "Jian He (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100253#comment-14100253 ] Jian He commented on YARN-415: ------------------------------ bq. an attempt can be in the complete state before all of its containers are finished CapacityScheduler#doneApplicationAttempt(FairScheduler#removeApplicationAttempt) are synchronously finishing all the live containers. so I think all containers should be guaranteed to finish before attempt finishes. bq. charging the running containers to the current app until the containers finish will be seemless to the end user. Particularly in work-preserving AM restart, current AM is actually the one who's managing previous running containers. Running containers in scheduler are already transferred to the current AM. So running containers metrics are transferred as well. I think it'll be confusing if finished containers are still charged back against the previous dead attempt. Btw, YARN-1809 will add the attempt web page where we could show attempt-specific metrics also. Regarding the problem of metrics persistency. Agree that it doesn't solve the problem for running apps in general. Maybe we can have the state store changes in a separate jira and discuss more there, so that we can get this in first. > Capture memory utilization at the app-level for chargeback > ---------------------------------------------------------- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager > Affects Versions: 0.23.6 > Reporter: Kendall Thrapp > Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)