Date: Wed, 8 Apr 2015 16:56:13 +0000 (UTC)
From: "Zhijie Shen (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485527#comment-14485527 ]

Zhijie Shen commented on YARN-3044:
-----------------------------------

Before going through the patch details, I have some high-level comments:

bq. IIUC you meant we will have RMContainerEntity having type as YARN_RM_CONTAINER and NMContainerEntity having type as YARN_NM_CONTAINER right ?

Can we use a single ContainerEntity instead? The events from the RM are RM_XXXX_EVENT, and those from the NM are NM_XXXX_EVENT.

bq.
I'm very much concerned about the volume of writes that the RM collector would need to do,

bq. I fully understand the concern from Sangjin Lee that RM may not afford tens of thousands containers in large size cluster.

I also think publishing all container lifecycle events from NM is likely to be a big cost in total, but I'd like to offer another point of view. Say we have a big cluster that can support 5,000 concurrent containers. The RM has to maintain the lifecycle of these 5K containers, and I don't think a less powerful server can manage that, right? Assuming we have a powerful enough server to run the RM of a big cluster, will publishing lifecycle events be a big deal for it? I'm not sure, but I can provide some hints. Currently each container writes 2 events per lifecycle; in the future we may want to record every state transition, which would result in ~10 events per lifecycle. That gives 10 * 5K = 50K lifecycle events, and they won't all be written at the same moment because containers' lifecycles are usually asynchronous. If we assume each container runs for 1h and lifecycle events are uniformly distributed, there will be only around 14 writes per second (50K / 3,600 s), which is a small load for a powerful server.

I think we may be overestimating the performance impact of writing NM lifecycle events. A more reasonable performance metric may be {{cost of writing lifecycle events per container / cost of managing lifecycle per container * 100%}}. For example, if it is 2%, I guess it would probably be acceptable.

bq. all configs will not be set as part of this so was there more planned for this from the framework side or each application needs to take care of this on their own to populate configuration information ?

bq. In that sense, how about letting frameworks (namely AMs) write the configuration instead of RM?

I'm not sure I understand this part correctly, but I lean toward system timeline data (RM/NM) being controlled by the cluster config, per cluster, while application data is controlled by the framework or even by per-application config. There could be a problem if users were able to change the former config: for example, a user could hide their application's information from the cluster admin.

bq. I have also incorporated the changes to support RMContainer metrics based on configuration (Junping's comments).

Do you mean we should keep {{yarn.resourcemanager.system-metrics-publisher.enabled}} to control the RM system metrics publisher (SMP), and create {{yarn.nodemanager.system-metrics-publisher.enabled}} to control the NM SMP?

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.