Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 8 Apr 2015 16:56:13 +0000 (UTC)
From: "Zhijie Shen (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12766989.1421110272000.28461.1428512173705@Atlassian.JIRA>
In-Reply-To: <JIRA.12766989.1421110272000@Atlassian.JIRA>
References: <JIRA.12766989.1421110272000@Atlassian.JIRA>
 <JIRA.12766989.1421110272111@arcas>
Subject: [jira] [Commented] (YARN-3044) [Event producers] Implement RM
 writing app lifecycle events to ATS
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485527#comment-14485527 ]

Zhijie Shen commented on YARN-3044:
-----------------------------------

Before reviewing the patch in detail, I have some high-level comments:

bq. IIUC you meant we will have RMContainerEntity having type as YARN_RM_CONTAINER and NMContainerEntity having type as YARN_NM_CONTAINER right ?

Can we use a single ContainerEntity instead? The events from the RM would be RM_XXXX_EVENT, and those from the NM would be NM_XXXX_EVENT.

bq. I'm very much concerned about the volume of writes that the RM collector would need to do,
bq. I fully understand the concern from Sangjin Lee that RM may not afford tens of thousands containers in large size cluster.

I also think publishing all container lifecycle events from NM is likely to be a big cost in total, but I'd like to offer another point of view. Say we have a big cluster that can run 5,000 concurrent containers. The RM has to maintain the lifecycle of these 5K containers, and I don't think a less powerful server can manage that, right? Assuming we have such a powerful server to run the RM of a big cluster, will publishing lifecycle events be a big deal for that server? I'm not sure, but I can provide some hints. Currently each container writes 2 events per lifecycle, and in the future we may want to record each state transition, resulting in ~10 events per lifecycle. That gives 10 * 5K = 50K lifecycle events, and they won't be written at the same moment because containers' lifecycles are usually asynchronous. If we assume each container runs for 1h and lifecycle events are uniformly distributed, there will be only about 14 writes per second (50K events / 3,600 s) for a powerful server to handle.
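
The estimate above can be reproduced with a short sketch. The inputs are the assumed figures from this comment (5,000 concurrent containers, 2 or ~10 events per lifecycle, a 1h container lifetime), not measurements, and the function name is only illustrative.

```python
# Back-of-the-envelope estimate of the RM's lifecycle-event write rate.
# All inputs are assumed figures taken from the comment, not measurements.

def events_per_second(concurrent_containers, events_per_lifecycle, lifetime_seconds):
    """Average write rate when events are spread uniformly over each container's lifetime."""
    total_events = concurrent_containers * events_per_lifecycle
    return total_events / lifetime_seconds

# 2 events per lifecycle, as written today
current = events_per_second(5_000, 2, 3_600)
# ~10 events per lifecycle, if every state transition were recorded
projected = events_per_second(5_000, 10, 3_600)

print(f"current:   {current:.1f} writes/s")
print(f"projected: {projected:.1f} writes/s")
```

The uniform-distribution assumption matters here: bursts (for example, many containers finishing together) would push the peak rate well above this average.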

I think we may be overestimating the performance impact of writing NM lifecycle events. Perhaps a more reasonable performance metric is {{cost of writing lifecycle events per container / cost of managing lifecycle per container * 100%}}. For example, if it is 2%, I guess it will probably be acceptable.
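
A minimal sketch of that ratio, assuming both costs are measured in the same unit (CPU time per container, say). The function name and the sample numbers are hypothetical and only illustrate the 2% case.

```python
def publish_overhead_percent(publish_cost, manage_cost):
    """Cost of writing lifecycle events relative to the cost of managing the
    container's lifecycle, expressed as a percentage."""
    if manage_cost <= 0:
        raise ValueError("manage_cost must be positive")
    return publish_cost / manage_cost * 100.0

# Hypothetical per-container costs in CPU-milliseconds, for illustration only.
overhead = publish_overhead_percent(publish_cost=0.4, manage_cost=20.0)
print(f"{overhead:.1f}%")
```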

bq. all configs will not be set as part of this so was there more planned for this from the framework side or each application needs to take care of this on their own to populate configuration information ?
bq. In that sense, how about letting frameworks (namely AMs) write the configuration instead of RM?

I'm not sure I understand this part correctly, but I'm inclined to think that system timeline data (RM/NM) should be controlled by cluster config and set per cluster, while application data is controlled by framework or even per-application config. It may cause problems if a user is able to change the former config. For example, they could hide their application's information from the cluster admin.

bq. I have also incorporated the changes to support RMContainer metrics based on configuration (Junping's comments).

Do you mean we should keep {{yarn.resourcemanager.system-metrics-publisher.enabled}} to control RM SMP, and create {{yarn.nodemanager.system-metrics-publisher.enabled}} to control NM SMP?
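
To make the question concrete, here is a rough sketch of two independent switches. The NM property is only the name proposed in this comment and is not confirmed to exist. The helper names and the default of "off" are assumptions, and a plain dict stands in for the real configuration object.

```python
# Sketch of independent on/off switches for the RM and NM publishers.
RM_SMP_ENABLED = "yarn.resourcemanager.system-metrics-publisher.enabled"
# Proposed in this comment; not confirmed to exist.
NM_SMP_ENABLED = "yarn.nodemanager.system-metrics-publisher.enabled"

def parse_bool(value, default=False):
    """Interpret a property string as a boolean; unset values use the default."""
    if value is None:
        return default
    return value.strip().lower() == "true"

def enabled_publishers(conf):
    """Return which publishers would run for the given property map."""
    return {
        "rm": parse_bool(conf.get(RM_SMP_ENABLED)),
        "nm": parse_bool(conf.get(NM_SMP_ENABLED)),
    }

# Only the RM key is set; the NM key falls back to the assumed default (off).
conf = {RM_SMP_ENABLED: "true"}
print(enabled_publishers(conf))
```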


> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)